Introduction
An interpreter executes the program without first translating the program down to executable machine code. Most interpreters these days (Java, C#, ruby, python, ...) have "compilers" that translate the programs to bytecodes (part of the way to machine code). The interpreters for these languages are bytecode interpreters as you saw in the previous lecture.
Tree-based interpreters, on the other hand, do not have a translation phase at all. The interpreter is blended into the parser and semantic analyzer. In a sense, these tree-based interpreters are more difficult to build because they must do more at once. They are not broken down into easier pieces.
The first phase of a tree-based interpreter is a parser that builds a suitable AST. Symbol table management can be done either in the parser or in a separate tree walking phase. Depending on the language semantics, you may also have to perform type analysis. After semantic analysis and symbol table management, execution occurs in another tree walk. As mentioned in the interpreter overview, the goal is to execute a function according to subtree patterns in the tree. For example, you might have a rule in your tree grammar that looks like:
assign
: ^('=' ID rhs=expr) {interp.assign($ID.text,$rhs.value);}
;
where interp points to a run-time object containing most of the interpreter's functionality. You want to keep the grammar as clean as possible by having your action simply invoke methods in the interpreter object rather than having large swaths of code in a grammar. Sometimes the actions are simple. For example, here is the start of a statement rule:
stat : ^('print' expr) {System.out.println($expr.value);}
| ...
;
Many statements in a programming language can be executed in this way: "match this, execute this." The tree walker proceeds in a straightforward depth first manner. But, what happens with loops? What about method calls? In other words, how do you implement functionality where the interpreter must branch to another part of the program?
Control-flow in a tree-base interpeter
At first, using a tree grammar to jump around the trees seems to contradict the idea of using a tree grammar because of its sequential walk of the tree. Fortunately, we can simply tell the tree parser to seek a particular location of the tree and continue parsing. For function calls, we can even tell the tree parser to pop back to the previous location the tree and continue parsing. Just as with a bytecode interpreter or even machine code, jumping around is simply a matter of resetting the "instruction pointer" to the next location to execute. Jumping around a two-dimensional tree is only slightly more complicated than jumping around a linear bytecode sequence.
The follow tree grammar rule recognizes an IF statement and executes the statement only if the condition expression is true. If it's false, the ELSE clause is executed if it exists.
ifstat
: ^('if' c=expr s=. e=.?) // ^('if' expr stat stat?)
{
int next = input.index();
if ( ((Boolean)$c.value).booleanValue() ) {
input.seek($s.start.streamIndex);
stat();
}
else if ( $e!=null ) {
input.seek($e.start.streamIndex);
stat();
}
input.seek(next);
}
;
Naturally, it's cleaner to put that action into the interpreter object and have the grammar make a method call instead of that big action.
What about a loop? It's not much harder. In this case, we need to repeatedly evaluate the condition, not just once like the IF statement. Here is how to implement the WHILE:
whilestat
: ^('while' c=. s=.) // ^('while' expr stat)
{
int next = input.index(); // index of node following WHILE
input.seek($c.start.streamIndex);
Object condition = expr();
while ( ((Boolean)condition).booleanValue() ) {
input.seek($s.start.streamIndex);
stat();
input.seek($c.start.streamIndex);
condition = expr();
}
input.seek(next);
}
;
And, finally, method calls are identical except that we need to push/pop the stream index. We also need to look up the target method's stream index from the symbol table since all we have is a name--we don't have the target method definition's AST node address.
Strategy
Expr.g parser that builds trees and populates symbol table. Makes tree nodes associated with scopes point at the symbol table scope created (class node, BLOCK for methods, block statements); this will be needed during the second pass to figure out what the current scope is. FieldSymbol.definition points at the AST because we will need to execute these assignments when we create an instance of the surrounding class. Automatically insert "this." into tree ^('.' this x) if x is a field reference. Differentiate between x=3 and o.x because we need to set the value of x not get its value (we get the value of x if it is an expression).
Eval.g tree grammar that executes code
Defining symbols
Start out with global scope as current scope. upon entry of '{' of class definition, create a new ClassSymbol and set that as the current scope. pop the scope after '}' of class definition.
Define a field for all assignments within class definitions.
6 Comments
Hide/Show CommentsMar 11, 2009
Anonymous
These examples are not working, no streamIndex member
Mar 26, 2009
Anonymous
What about the presence of "unconditional jumps" like 'goto label' statements inside then/else or cycle blocks? Is there any strategy how to correctly seek to any possible node inside the nodestream, without breaking rule/tree matching?
Thanks.
Giovanni
Jun 01, 2009
Anonymous
Its not working with 3.1.3 -> there is no streamIndex member. However, I was wondering why there is absolutely no usable information on how to process if-then-else statements within the AST. I mean it is one of the most common language constructs...
Jun 01, 2009
Frank Du
The streamIndex() method is gone. Please refer to the comment section at
http://www.antlr.org/wiki/display/ANTLR3/Simple tree-based interpeter
Please use CommonTreeNodeStream.index() method instead, plus antlr-v3.1.2.
Apr 13, 2010
Anonymous
I've worked with Antlr for a few days now and I think it's really beautiful. I've started using Antlr towards the goal of writing an interpreter. I've worked my way through all the basic concepts so that now I have an AST, which works fine. BUT: As has already been mentioned in one of the above posts, information about how to implement conditional and loop constructs is so sparse it's unbelievable. I've browsed the Antlr reference book, but there seems to be nothing there. I've browsed the internet for hours now and found almost nothing. Man, this is really annoying.
May 09, 2010
Anonymous
Try chapter 25 of the language implementations pattern book by parr.