Dashboard > ANTLR 3 > ... > Examples > Expression evaluator
  ANTLR 3 Log In | Sign Up   View a printable version of the current page.  
  Expression evaluator
Added by Terence Parr, last edited by Terence Parr on May 05, 2007  (view change) show comment
Labels: 
(None)

Here is a complete grammar that evaluates expressions containing +, -, * and variable assignments. A simple hash table is used to store variable values.

The tutorial requires the current depot version of ANTLR 3.0, i.e. the version must be later than 3.0 beta 5.

grammar Expr;

@header {
import java.util.HashMap;
}

@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

prog:   stat+ ;
                
stat:   expr NEWLINE {System.out.println($expr.value);}
    |   ID '=' expr NEWLINE
        {memory.put($ID.text, new Integer($expr.value));}
    |   NEWLINE
    ;

expr returns [int value]
    :   e=multExpr {$value = $e.value;}
        (   '+' e=multExpr {$value += $e.value;}
        |   '-' e=multExpr {$value -= $e.value;}
        )*
    ;

multExpr returns [int value]
    :   e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
    ; 

atom returns [int value]
    :   INT {$value = Integer.parseInt($INT.text);}
    |   ID
        {
        Integer v = (Integer)memory.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    |   '(' expr ')' {$value = $expr.value;}
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ {skip();} ;

And here is the main program to execute the parser:

import org.antlr.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        ExprLexer lexer = new ExprLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExprParser parser = new ExprParser(tokens);
        parser.prog();
    }
}

Run ANTLR on the grammar and compile:

java org.antlr.Tool Expr.g
javac Test.java ExprLexer.java ExprParser.java

Here is a sample run:

$java Test
x=1
y=2
3*(x+y)
<EOF>
9
$ 

What if you want to build trees instead? Here is a grammar that builds trees instead of evaluating the expressions:

grammar Expr;
options {
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog:   ( stat {System.out.println($stat.tree.toStringTree());} )+ ;

stat:   expr NEWLINE        -> expr
    |   ID '=' expr NEWLINE -> ^('=' ID expr)
    |   NEWLINE             ->
    ;

expr:   multExpr (('+'^|'-'^) multExpr)*
    ; 

multExpr
    :   atom ('*'^ atom)*
    ; 

atom:   INT 
    |   ID
    |   '('! expr ')'!
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ {skip();} ;

And here is the tree grammar that will read the trees and evaluate the expressions.

tree grammar Eval;

options {
    tokenVocab=Expr;
    ASTLabelType=CommonTree;
}

@header {
import java.util.HashMap;
}

@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

prog:   stat+ ;

stat:   expr
        {System.out.println($expr.value);}
    |   ^('=' ID expr)
        {memory.put($ID.text, new Integer($expr.value));}
    ;

expr returns [int value]
    :   ^('+' a=expr b=expr)  {$value = a+b;}
    |   ^('-' a=expr b=expr)  {$value = a-b;}   
    |   ^('*' a=expr b=expr)  {$value = a*b;}
    |   ID 
        {
        Integer v = (Integer)memory.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    |   INT                   {$value = Integer.parseInt($INT.text);}
    ;

Here is the main program:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Test {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        ExprLexer lexer = new ExprLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExprParser parser = new ExprParser(tokens);
        ExprParser.prog_return r = parser.prog();

        // walk resulting tree
        CommonTree t = (CommonTree)r.getTree();
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
        Eval walker = new Eval(nodes);
        walker.prog();
    }
}

and a sample run:

$ java Test
x=1
y=2
3*(x+y)
<EOF>
(= x 1)
(= y 2)
(* 3 (+ x y))
9
$ 

On generating the EvalTreeParser.java I get 

 ....  value = Integer.parseInt($INT.text);

 within the expr() method. Can you tell me what I'm doing wrong.

Thanks, a lot!

 Matthias

Both $INT.text and $expr.value need to use labels, like:

e=expr ... {$e.value}

and

i=INT ... {$i.text}

I updated the example accordingly.

The issue with $INT.text is fixed in the depot and will be part of the next beta release.
You still need the label for expr, though.

Disregard my comments. Both issues are fixed in the latest ANTLR depot version, it requires any version later than 3.0 beta 5 to run.

I tried the example given above.  Everything works as described but I find it confusing that I have to enter <EOF>
before I get the first line of output.  After all, that is not what an interpreter is about.  I would have expected that every time I enter an expression followed by a newline the corresponding value is printed.  I would appreciate any hint how this can be done.

Hi. CommonTokenStream buffers all tokens up first...need an unbuffered version. I don't think we have an official one yet...er...maybe we do.

I wrote a version of the grammar that is smart enough to know that line breaks after operators don't end the line, but I had a problem treating the EOF the same way as a line break, since it's seen as repeating endlessly. Just replacing NEWLINE with NEWLINE|EOF gives you an infinite number of empty statements.

I do have a solution, but it's a cludge... Here is the file, can you see a better way? Note, it also has a few other additions I put in to help me learn more about java and ANTLR. I appreciate any comments about style or ways of implementing.

The way I handled the EOF problem was to create different separators to end lines, one set using (possibly) repeated NEWLINES and another using EOF.

So the grammar starts with:

prog : stat (eb stat)* ef;

where 'eb' is the NEWLINE based separator and ef is the EOF based separator.

Another question I have relates to the canonical comment eater:

bcomment: '#' ( options{greedy=false;}: . )* NEWLINE+ ;

Is there is any problem with breaking out the stop character out of the comment eater? Does it slow the parser down with to do this:

eof_separator: comment ? EOF;

separator: comment ? NEWLINE+;

comment: '#' ( options{greedy=false;}: . )* ;
Josh

Any, here is a slightly more complicated calculator:

...
How do I get that nice box around the code?
@header {
  package test;
  import java.util.HashMap;
}

@lexer::header {package test;}

@members {
  public enum Types { UNDEFINED,INT, DOUBLE}
   public class Value {
   Types type;
  int intval;
   double doubleval;
   void Set(int i){type=Types.INT; intval=i;}
   void Set(double d){type=Types.DOUBLE; doubleval=d;}
   void Set(Value v){
        if (v.type==Types.INT) Set(v.intval);
        else if (v.type==Types.DOUBLE) Set(v.doubleval);
        else type=v.type;
   }
   boolean IsInexact() { return type == Types.DOUBLE; }

   Value(int i){Set(i);}
   Value(double d){Set(d);}
   Value(Value v){Set(v);}
   Value(){type=Types.UNDEFINED;}
   double GetDouble(){
        if (type==Types.DOUBLE) return doubleval;
        else if (type==Types.INT) return (double)intval;
        System.err.println("dereferenced uninitialized value");
        return -1.0;
   }
   int GetInt(){
        if (type==Types.DOUBLE) return (int)doubleval;
        else if (type==Types.INT) return intval;
        System.err.println("dereferenced uninitialized value");
        return -1;
   }
   void MulSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()*o.GetDouble()); else Set(GetInt()*o.GetInt()); }
   void AddSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()+o.GetDouble()); else Set(GetInt()+o.GetInt()); }
   void SubSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()-o.GetDouble()); else Set(GetInt()-o.GetInt()); }
   void Negate() {
        if (type==Types.DOUBLE) doubleval=-doubleval;
        else if (type==Types.INT) intval=-intval;
   }
   void Display()
   {
        if (type==Types.DOUBLE) System.out.println(doubleval);
        else if (type==Types.INT) System.out.println(intval);
        else System.out.println("undefined type");
   }
 }
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

prog : stat (eb stat)* ef;

stat: expr {$expr.value.Display();}
    | ID '=' b expr
        {memory.put($ID.text, $expr.value);}
    |
    ;

expr returns [Value value]
    : v=multExpr {$value = $v.value;}
        ( '+' b v=multExpr {$value.AddSet($v.value);}
        | '-' b v=multExpr {$value.SubSet($v.value);}
        )*
    ;

multExpr returns [Value value]
    : v=negExpr {$value = $v.value;} ('*' b v=negExpr {$value.MulSet($v.value);})*
    ;

negExpr returns [Value value]
    : negParity atom {$value = $atom.value; if ($negParity.neg) value.Negate(); }
    ;

negParity returns [boolean neg]
    : { $neg=false;}
    | '-' b p=negParity { $neg = !$p.neg;}
    | '+' b p=negParity {$neg =$p.neg;}
    ;

atom returns [Value value]
    : INT {$value = new Value(Integer.parseInt($INT.text));}
    | DOUBLE {$value = new Value(Double.parseDouble($DOUBLE.text));}
    | ID
    | '(' v=expr ')' {$value = $v.value;}
    ;

eb : bnl
    | ';' b ;

ef : fnl
    | ';' EOF ;

b : bnl ?;

fnl : EOF
    | fcomment ;

bnl :NEWLINE+
    | bcomment ;

fcomment: '#' ( options {greedy=false;} : . )* EOF ;

bcomment: '#' ( options {greedy=false;} : . )* NEWLINE+ ;

ID : ('a'..'z'|'A'..'Z'|'')('a'..'z'|'A'..'Z'|'0'..'9'|'')* ;
DOUBLE : '0'..'9'* '.' '0'..'9'(('e'|'E')(''|'-')?'0'..'9'+)?
| '0'..'9'('e'|'E')(''|'-')?'0'..'9'+ ;
INT : '0'..'9'+ ;
SKIPNL :( '\\\\' (' '|'\t')* '\r'? '\n') {skip(); };
WS : (' '|'\t')+ {skip();} ;
NEWLINE:'\r'? '\n' ;
SYMBOL : ':' ('a'..'z'|'A'..'Z')+;

Posted by Joshua Scholar at Aug 06, 2007 00:04; last updated at Aug 06, 2007 00:43

Oops the DOUBLE got messed up, and I can't edit it (all kinds of formatting and escapes seem to be lost on editing).

I'll but escapes in front of every character.  That may fix it.

DOUBLE : '0'..'9'
* '.'  '0'..'9'+(('e'|'E')('+'|'-')?'0'..'9'+)?
    | '0'..'9'+('e'|'E')('+'|'-')?'0'..'9'
+ ;

It is possible to make Eval.g more flexible? If we change order of expression input it raised exception "undefined variable y":

$ java Test
x=y+2
y=1
x+y
<EOF>
(= x (+ y 2))
(= y 1)
(+ x y)
undefined variable y
3

I think it will be useful if such case processing correctly. If not so, what kind of compiler part can help with this?

You would need to do lazy evaluation (google keyword) and compute the data dependencies to ensure values needed by computations are done before they are needed.

Site powered by a free Open Source Project / Non-profit License (more) of Confluence - the Enterprise wiki.
Learn more or evaluate Confluence for your organisation.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.1 Build:#806 May 06, 2007) - Bug/feature request - Contact Administrators