[stringtemplate-interest] ST 4.0 planning
Sterling Clover
s.clover at gmail.com
Mon Sep 8 20:37:18 PDT 2008
My HStringTemplate code is unfortunately probably pretty unreadable
for folks that aren't Haskell coders, but the architecture that I use
would translate pretty cleanly to Java.
In any case, the notion is that it compiles down to a "network of
closures" (a la lisp's regex engine). A template is a collection of a
context (which is scoped parameters for separator, etc and most
importantly an an environment [which we treat as immutable, and
which, naturally, in haskell, really is]), and a function which given
the context produces a result. The entire template is polymorphic
over different output types (in the Java world this would, I suppose,
be writers, such as a pretty-printing writer and a normal one). The
env is a map of strings to objects, all of which are capable of being
transformed into either a submap of strings to objects, or a list of
objects, or a simple string. Laziness lets me abstract this away more
cleanly, but it can be done reasonably without it. Contexts also have
a notion of an environment for template lookup as well -- i.e., a
"group", which is a map of strings to templates. Groups can be
composed by algebraic operations (they are monoidal) such that they
have a left biased "sum" -- i.e. envA + envB = first try to look it
up in A, and failing that, in B. This gives a notion of scopes for
supergroups and soforth. When a template is invoked, it is looked up
in the composed group of the current enclosing context, and there is
logic for choosing what gets overriden and what does not (i.e. envs
become localEnv + superEnv, encoders are *not* overridden, separators
are, etc.) This keeps all the scope management stuff in essentially
one place and easy to reason about because we're adhering to strict
algebraic properties about how we use it.
The "compilation" step in HStringTemplate fuses parsing and
compilation, which in retrospect was a mistake -- I would have much
preferred to have a syntax tree to keep things easier to reason
about. In any case, the most difficult bit to implement was the
correct logic for fancy interleaving multiple traversals and such.
But outside of that, everything else translates to one of a few very
straightforward calls.
To model the network of closures idea in Java, instead of
interpreting the tree directly, you would essentially be "staging"
the execution, traversing it once to produce a new tree of function-
objects whose .call() method would directly return the result, and
which would explicitly thread a context object containing all scoped
information between them. By using that context object in an
immutable fashion (i.e., so to speak, copy-on-modify) then all the
issues with corruption between different scopes would disappear "for
free." While you won't get the same multiple backend ability (which I
think would actually be very difficult and dodgy in any case), I tend
to think that performance should actually be near that of directly
generated code given the JIT abilities of the runtime and given that
you're probably going to be writing fairly simple code in your target
language rather than performing many fancy static optimization passes.
Incidentally, from what I've seen of bug reports on this list, the
biggest problem seems to be that iterators get consumed in odd ways?
So maybe that could be simplified as well by moving away from
iterators to an explicit enumerator based model where advance() must
be called as a distinct step? And, of course, by creating fresh
iterators or enumerators from source objects as necessary rather than
storing them.
I hope some of this note is comprehensible and useful.
Regards,
Sterl
P.S. in any case, for reference, the code is here: http://
hackage.haskell.org/packages/archive/HStringTemplate/0.4/doc/html/src/
The bulk of the heavy lifting is in the Base file, but the Classes
and Groups files are also pretty important.
On Sep 8, 2008, at 1:48 PM, Terence Parr wrote:
> Dear ST-o-philes and related humans,
>
> I am starting the planning stages for ST 4.0. I begin my sabbatical
> next May, a few months after I finish this current book. I plan on
> writing software like crazy (for 15 months!). This will include
> optimizing ANTLR and hopefully converting ANTLR and ST to be ANTLR v3
> clean; no more ANTLR v2 requirement.
>
> As part of converting ST to v3 grammars, I took a look at updating
> things. As I looked through all the complicated code that manages
> dynamic scopes, parameters, and nested templates I realized that there
> is a lot of stuff going on in there. ST groove organically from a
> simple string with holes in it to a sophisticated tree-based
> interpreter. Tree-based Interpreters are much more difficult to build
> then, say, a byte code interpreter. Further, debugging ST stuff is
> quite difficult because all you have are objects and you have to chase
> a lot of pointers through hash tables and so on to figure out what is
> going on. There is no code to step through related to your templates.
>
> I am contemplating moving to a JSP-like model where I generate Java
> (or C# or Python, ...) instead of doing an interpreter. There are a
> number of advantages:
>
> 1. In principle, we could use the rechargeable architecture pattern of
> ANTLR to generate whatever source code we want; C++ and so on. the
> only requirement would be some sort of reflection still because I
> don't want attributes to be typed in ST. That means that you'd need
> RTTI for C++, which it supposedly has now.
>
> 2. I would suspect that the templates would go much faster when
> executing "natively" in Java.
>
> 3. You could debug templates by stepping through them just like you do
> ANTLR parsers. Templates would translate to Java methods. Groups would
> translate to objects. Like JSP, we could automatically compile things
> in the background. This means that they would go slow the first time
> you ran the template. Also, I would have to investigate a custom class
> loader so that I could unload templates.
>
> I'm planning on breaking with absolute backward compatibility to fix a
> number of design flaws that came about because requirements changed
> during the last eight years.
>
> So, it is a bit premature, but I like to have things to think about
> while I'm waiting in line etc...
>
> The idea of generating Java code is growing on me. Note that it would
> only be generating Java or stay an interpreter. I would not do both.
> Those are two totally separate products almost in terms of
> implementation.
>
> Ter
> _______________________________________________
> stringtemplate-interest mailing list
> stringtemplate-interest at antlr.org
> http://www.antlr.org:8080/mailman/listinfo/stringtemplate-interest
More information about the stringtemplate-interest
mailing list