[stringtemplate-interest] ST 4.0 planning

Sterling Clover s.clover at gmail.com
Mon Sep 8 20:37:18 PDT 2008


My HStringTemplate code is unfortunately probably pretty unreadable  
for folks that aren't Haskell coders, but the architecture that I use  
would translate pretty cleanly to Java.

In any case, the notion is that it compiles down to a "network of  
closures" (a la lisp's regex engine). A template is a collection of a  
context (which is scoped parameters for separator, etc and most  
importantly an an environment [which we treat as immutable, and  
which, naturally, in haskell, really is]), and a function which given  
the context produces a result. The entire template is polymorphic  
over different output types (in the Java world this would, I suppose,  
be writers, such as a pretty-printing writer and a normal one). The  
env is a map of strings to objects, all of which are capable of being  
transformed into either a submap of strings to objects, or a list of  
objects, or a simple string. Laziness lets me abstract this away more  
cleanly, but it can be done reasonably without it. Contexts also have  
a notion of an environment for template lookup as well -- i.e., a  
"group", which is a map of strings to templates. Groups can be  
composed by algebraic operations (they are monoidal) such that they  
have a left biased "sum" -- i.e. envA + envB = first try to look it  
up in A, and failing that, in B. This gives a notion of scopes for  
supergroups and soforth. When a template is invoked, it is looked up  
in the composed group of the current enclosing context, and there is  
logic for choosing what gets overriden and what does not (i.e. envs  
become localEnv + superEnv, encoders are *not* overridden, separators  
are, etc.) This keeps all the scope management stuff in essentially  
one place and easy to reason about because we're adhering to strict  
algebraic properties about how we use it.

The "compilation" step in HStringTemplate fuses parsing and  
compilation, which in retrospect was a mistake -- I would have much  
preferred to have a syntax tree to keep things easier to reason  
about. In any case, the most difficult bit to implement was the  
correct logic for fancy interleaving multiple traversals and such.  
But outside of that, everything else translates to one of a few very  
straightforward calls.

To model the network of closures idea in Java, instead of  
interpreting the tree directly, you would essentially be "staging"  
the execution, traversing it once to produce a new tree of function- 
objects whose .call() method would directly return the result, and  
which would explicitly thread a context object containing all scoped  
information between them. By using that context object in an  
immutable fashion (i.e., so to speak, copy-on-modify) then all the  
issues with corruption between different scopes would disappear "for  
free." While you won't get the same multiple backend ability (which I  
think would actually be very difficult and dodgy in any case), I tend  
to think that performance should actually be near that of directly  
generated code given the JIT abilities of the runtime and given that  
you're probably going to be writing fairly simple code in your target  
language rather than performing many fancy static optimization passes.

Incidentally, from what I've seen of bug reports on this list, the  
biggest problem seems to be that iterators get consumed in odd ways?  
So maybe that could be simplified as well by moving away from  
iterators to an explicit enumerator based model where advance() must  
be called as a distinct step? And, of course, by creating fresh  
iterators or enumerators from source objects as necessary rather than  
storing them.

I hope some of this note is comprehensible and useful.

Regards,
Sterl

P.S. in any case, for reference, the code is here: http:// 
hackage.haskell.org/packages/archive/HStringTemplate/0.4/doc/html/src/

The bulk of the heavy lifting is in the Base file, but the Classes  
and Groups files are also pretty important.

On Sep 8, 2008, at 1:48 PM, Terence Parr wrote:

> Dear ST-o-philes and related humans,
>
> I am starting the planning stages for ST 4.0. I begin my sabbatical
> next May, a few months after I finish this current book. I plan on
> writing software like crazy (for 15 months!). This will include
> optimizing ANTLR and hopefully converting ANTLR and ST to be ANTLR v3
> clean; no more ANTLR v2 requirement.
>
> As part of converting ST to v3 grammars, I took a look at updating
> things.  As I looked through all the complicated code that manages
> dynamic scopes, parameters, and nested templates I realized that there
> is a lot of stuff going on in there.  ST groove organically from a
> simple string with holes in it to a sophisticated tree-based
> interpreter. Tree-based Interpreters are much more difficult to build
> then, say, a byte code interpreter. Further, debugging ST stuff is
> quite difficult because all you have are objects and you have to chase
> a lot of pointers through hash tables and so on to figure out what is
> going on.  There is no code to step through related to your templates.
>
> I am contemplating moving to a JSP-like model where I generate Java
> (or C# or Python, ...) instead of doing an interpreter. There are a
> number of advantages:
>
> 1. In principle, we could use the rechargeable architecture pattern of
> ANTLR to generate whatever source code we want; C++ and so on. the
> only requirement would be some sort of reflection still because I
> don't want attributes to be typed in ST. That means that you'd need
> RTTI for C++, which it supposedly has now.
>
> 2.  I would suspect that the templates would go much faster when
> executing "natively" in Java.
>
> 3. You could debug templates by stepping through them just like you do
> ANTLR parsers. Templates would translate to Java methods. Groups would
> translate to objects.  Like JSP, we could automatically compile things
> in the background. This means that they would go slow the first time
> you ran the template. Also, I would have to investigate a custom class
> loader so that I could unload templates.
>
> I'm planning on breaking with absolute backward compatibility to fix a
> number of design flaws that came about because requirements changed
> during the last eight years.
>
> So, it is a bit premature, but I like to have things to think about
> while I'm waiting in line etc...
>
> The idea of generating Java code is growing on me. Note that it would
> only be generating Java or stay an interpreter. I would not do both.
> Those are two totally separate products almost in terms of
> implementation.
>
> Ter
> _______________________________________________
> stringtemplate-interest mailing list
> stringtemplate-interest at antlr.org
> http://www.antlr.org:8080/mailman/listinfo/stringtemplate-interest



More information about the stringtemplate-interest mailing list