Expression evaluator

Skip to end of metadata
Go to start of metadata

Here is a complete grammar that evaluates expressions containing +, -, * and variable assignments. A simple hash table is used to store variable values.

The tutorial requires the current depot version of ANTLR 3.0, i.e. the version must be later than 3.0 beta 5.

grammar Expr;

@header {
import java.util.HashMap;
}

@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

prog:   stat+ ;
                
stat:   expr NEWLINE {System.out.println($expr.value);}
    |   ID '=' expr NEWLINE
        {memory.put($ID.text, new Integer($expr.value));}
    |   NEWLINE
    ;

expr returns [int value]
    :   e=multExpr {$value = $e.value;}
        (   '+' e=multExpr {$value += $e.value;}
        |   '-' e=multExpr {$value -= $e.value;}
        )*
    ;

multExpr returns [int value]
    :   e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
    ; 

atom returns [int value]
    :   INT {$value = Integer.parseInt($INT.text);}
    |   ID
        {
        Integer v = (Integer)memory.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    |   '(' expr ')' {$value = $expr.value;}
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ {skip();} ;

And here is the main program to execute the parser:

Run ANTLR on the grammar and compile:

Here is a sample run:

$java Test
x=1
y=2
3*(x+y)
<EOF>
9
$ 

What if you want to build trees instead? Here is a grammar that builds trees instead of evaluating the expressions:

grammar Expr;
options {
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog:   ( stat {System.out.println($stat.tree.toStringTree());} )+ ;

stat:   expr NEWLINE        -> expr
    |   ID '=' expr NEWLINE -> ^('=' ID expr)
    |   NEWLINE             ->
    ;

expr:   multExpr (('+'^|'-'^) multExpr)*
    ; 

multExpr
    :   atom ('*'^ atom)*
    ; 

atom:   INT 
    |   ID
    |   '('! expr ')'!
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
INT :   '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS  :   (' '|'\t')+ {skip();} ;

And here is the tree grammar that will read the trees and evaluate the expressions.

tree grammar Eval;

options {
    tokenVocab=Expr;
    ASTLabelType=CommonTree;
}

@header {
import java.util.HashMap;
}

@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

prog:   stat+ ;

stat:   expr
        {System.out.println($expr.value);}
    |   ^('=' ID expr)
        {memory.put($ID.text, new Integer($expr.value));}
    ;

expr returns [int value]
    :   ^('+' a=expr b=expr)  {$value = a+b;}
    |   ^('-' a=expr b=expr)  {$value = a-b;}   
    |   ^('*' a=expr b=expr)  {$value = a*b;}
    |   ID 
        {
        Integer v = (Integer)memory.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    |   INT                   {$value = Integer.parseInt($INT.text);}
    ;

Here is the main program:

and a sample run:

$ java Test
x=1
y=2
3*(x+y)
<EOF>
(= x 1)
(= y 2)
(* 3 (+ x y))
9
$ 
Labels:
  1. Nov 26, 2006

    On generating the EvalTreeParser.java I get 

     ....  value = Integer.parseInt($INT.text);

     within the expr() method. Can you tell me what I'm doing wrong.

    Thanks, a lot!

     Matthias

  2. Nov 29, 2006

    Both $INT.text and $expr.value need to use labels, like:

    e=expr ... {$e.value}

    and

    i=INT ... {$i.text}

    I updated the example accordingly.

  3. Nov 29, 2006

    The issue with $INT.text is fixed in the depot and will be part of the next beta release.
    You still need the label for expr, though.

  4. Nov 29, 2006

    Disregard my comments. Both issues are fixed in the latest ANTLR depot version, it requires any version later than 3.0 beta 5 to run.

  5. Aug 03, 2007

    I tried the example given above.  Everything works as described but I find it confusing that I have to enter <EOF>
    before I get the first line of output.  After all, that is not what an interpreter is about.  I would have expected that every time I enter an expression followed by a newline the corresponding value is printed.  I would appreciate any hint how this can be done.

  6. Aug 03, 2007

    Hi. CommonTokenStream buffers all tokens up first...need an unbuffered version. I don't think we have an official one yet...er...maybe we do.

  7. Aug 06, 2007

    I wrote a version of the grammar that is smart enough to know that line breaks after operators don't end the line, but I had a problem treating the EOF the same way as a line break, since it's seen as repeating endlessly. Just replacing NEWLINE with NEWLINE|EOF gives you an infinite number of empty statements.

    I do have a solution, but it's a cludge... Here is the file, can you see a better way? Note, it also has a few other additions I put in to help me learn more about java and ANTLR. I appreciate any comments about style or ways of implementing.

    The way I handled the EOF problem was to create different separators to end lines, one set using (possibly) repeated NEWLINES and another using EOF.

    So the grammar starts with:

    prog : stat (eb stat)* ef;

    where 'eb' is the NEWLINE based separator and ef is the EOF based separator.

    Another question I have relates to the canonical comment eater:

    bcomment: '#' ( options{greedy=false;}: . )* NEWLINE+ ;

    Is there is any problem with breaking out the stop character out of the comment eater? Does it slow the parser down with to do this:

    eof_separator: comment ? EOF;

    separator: comment ? NEWLINE+;

    comment: '#' ( options{greedy=false;}: . )* ;
    Josh

    Any, here is a slightly more complicated calculator:

    ...
    How do I get that nice box around the code?
    @header {
      package test;
      import java.util.HashMap;
    }

    @lexer::header {package test;}

    @members {
      public enum Types { UNDEFINED,INT, DOUBLE}
       public class Value {
       Types type;
      int intval;
       double doubleval;
       void Set(int i){type=Types.INT; intval=i;}
       void Set(double d){type=Types.DOUBLE; doubleval=d;}
       void Set(Value v){
            if (v.type==Types.INT) Set(v.intval);
            else if (v.type==Types.DOUBLE) Set(v.doubleval);
            else type=v.type;
       }
       boolean IsInexact() { return type == Types.DOUBLE; }

       Value(int i){Set(i);}
       Value(double d){Set(d);}
       Value(Value v){Set(v);}
       Value(){type=Types.UNDEFINED;}
       double GetDouble(){
            if (type==Types.DOUBLE) return doubleval;
            else if (type==Types.INT) return (double)intval;
            System.err.println("dereferenced uninitialized value");
            return -1.0;
       }
       int GetInt(){
            if (type==Types.DOUBLE) return (int)doubleval;
            else if (type==Types.INT) return intval;
            System.err.println("dereferenced uninitialized value");
            return -1;
       }
       void MulSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()*o.GetDouble()); else Set(GetInt()*o.GetInt()); }
       void AddSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()+o.GetDouble()); else Set(GetInt()+o.GetInt()); }
       void SubSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()-o.GetDouble()); else Set(GetInt()-o.GetInt()); }
       void Negate() {
            if (type==Types.DOUBLE) doubleval=-doubleval;
            else if (type==Types.INT) intval=-intval;
       }
       void Display()
       {
            if (type==Types.DOUBLE) System.out.println(doubleval);
            else if (type==Types.INT) System.out.println(intval);
            else System.out.println("undefined type");
       }
     }
    /** Map variable name to Integer object holding value */
    HashMap memory = new HashMap();
    }

    prog : stat (eb stat)* ef;

    stat: expr {$expr.value.Display();}
        | ID '=' b expr
            {memory.put($ID.text, $expr.value);}
        |
        ;

    expr returns [Value value]
        : v=multExpr {$value = $v.value;}
            ( '+' b v=multExpr {$value.AddSet($v.value);}
            | '-' b v=multExpr {$value.SubSet($v.value);}
            )*
        ;

    multExpr returns [Value value]
        : v=negExpr {$value = $v.value;} ('*' b v=negExpr {$value.MulSet($v.value);})*
        ;

    negExpr returns [Value value]
        : negParity atom {$value = $atom.value; if ($negParity.neg) value.Negate(); }
        ;

    negParity returns [boolean neg]
        : { $neg=false;}
        | '-' b p=negParity { $neg = !$p.neg;}
        | '+' b p=negParity {$neg =$p.neg;}
        ;

    atom returns [Value value]
        : INT {$value = new Value(Integer.parseInt($INT.text));}
        | DOUBLE {$value = new Value(Double.parseDouble($DOUBLE.text));}
        | ID
        | '(' v=expr ')' {$value = $v.value;}
        ;

    eb : bnl
        | ';' b ;

    ef : fnl
        | ';' EOF ;

    b : bnl ?;

    fnl : EOF
        | fcomment ;

    bnl :NEWLINE+
        | bcomment ;

    fcomment: '#' ( options {greedy=false;} : . )* EOF ;

    bcomment: '#' ( options {greedy=false;} : . )* NEWLINE+ ;

    ID : ('a'..'z'|'A'..'Z'|'')('a'..'z'|'A'..'Z'|'0'..'9'|'')* ;
    DOUBLE : '0'..'9'* '.' '0'..'9'(('e'|'E')(''|'-')?'0'..'9'+)?
    | '0'..'9'('e'|'E')(''|'-')?'0'..'9'+ ;
    INT : '0'..'9'+ ;
    SKIPNL :( '\\\\' (' '|'\t')* '\r'? '\n') {skip(); };
    WS : (' '|'\t')+ {skip();} ;
    NEWLINE:'\r'? '\n' ;
    SYMBOL : ':' ('a'..'z'|'A'..'Z')+;

  8. Aug 06, 2007

    Oops the DOUBLE got messed up, and I can't edit it (all kinds of formatting and escapes seem to be lost on editing).

    I'll but escapes in front of every character.  That may fix it.

    DOUBLE : '0'..'9'
    * '.'  '0'..'9'+(('e'|'E')('+'|'-')?'0'..'9'+)?
        | '0'..'9'+('e'|'E')('+'|'-')?'0'..'9'
    + ;

  9. Jan 01, 2008

    It is possible to make Eval.g more flexible? If we change order of expression input it raised exception "undefined variable y":

    I think it will be useful if such case processing correctly. If not so, what kind of compiler part can help with this?

  10. Jan 01, 2008

    You would need to do lazy evaluation (google keyword) and compute the data dependencies to ensure values needed by computations are done before they are needed.

  11. Dec 30, 2008

    Hi, Im struggling even to get this example working (sad)

    Im on Ubuntu Linux and running the antlr works 1.2.2 jar from the command line, java 1.6.0_07 and Antlr 3.1.1

    I paste the example grammar into the window, then set the options such as 'prog' etc

    When I click the black play button, i get an error saying this feature isnt designed forsuch and such (I asked it to stop showing the error, so I dont have the exact message).

    In the console window, there is a java error

    [22:02:16] java.lang.NullPointerException
        at org.antlr.xjlib.foundation.XJUtils.concatPath(Unknown Source)
        at org.antlr.xjlib.foundation.XJUtils.concatPath(Unknown Source)
        at org.antlr.works.components.editor.ComponentEditorGrammar.getOutputPath(Unknown Source)
        at org.antlr.works.components.editor.ComponentEditorGrammar.getANTLRTool(Unknown Source)
        at org.antlr.works.grammar.engine.GrammarEngineImpl.getANTLRTool(Unknown Source)
        at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createNewGrammar(Unknown Source)
        at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createParserGrammar(Unknown Source)
        at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createCombinedGrammar(Unknown Source)
        at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createGrammars(Unknown Source)
        at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.analyze(Unknown Source)
        at org.antlr.works.grammar.engine.GrammarEngineImpl.analyze(Unknown Source)
        at org.antlr.works.interpreter.EditorInterpreter.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:619)

    Do you have any advice please?

    Thanks

    Paul

  12. Feb 20, 2009

    jd

    The NullPointerException problem can arise if ANTLRWorks is unable to find the java compiler (javac.exe on windows).

    You can fix this by going to File > Preferences > Compiler and entering the path for the executable.

  13. Feb 20, 2009

    Hi,

    I am still trying to understand how actions and return types are handled within ANTLR3, specifically the C implementation.

    Looking at the example above, there are two main things I don't fully understand.

    Question 1) How does ANTLR handle variables, if at all? If it doesn't, is this the whole purpose of the hashmap in the above example?

    Let me illustrate my question below. Let's say we wish to implement a simple calculator.g lex/parser. The example above might not be exactly the same implementation but let's pretend that it understands the following input syntax.

    a=5;

    b=7;

    c=a+b+3;

    result=c*10;

    When performing the addition on line 3, how does the ANTLR grammar know that a = 5 and b=7 from several lines up in our input file? Do we need to implement some type of stack or does ANTLR inherently provide this mechanism as part of the parser?

    Question 2) How can I return more complex objects using the ANTLR3 C implementation?

    In the following parser snippet, we see that the action for the parser rule is able to return a variable of type 'integer'. This variable can then be passed up the parser call stack. What if I wanted to return a C++ object instead of a simple integer type?

    Instead of this simple integer return type

    multExpr returns [int value]
        :   e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
        ;

    Let's say we want to implement a grammer rule that can return an object

    { TimeSeries myTS(..); }

    addExpr return[TimeSeries retval]

        :  'add' '(' expr ',' NUMBER ')'   { $retval = myTS.add($NUMBER.text->chars) }

    Where I have a C++ Class the defines some like

    class TimeSeries
    {
    public:
       TimeSeries(...);   
       TimeSeries::add(int number);
    private:  
       RWCString name;  
       std::vector<int> values;;  
       std::vector<RWDate> dates;;
    }

    Is this possible? If so, how?
    Thanks

    -Tom

  14. Feb 24, 2009

    I'm new to the Java world and was pleasantly surprised to discover ANTLR which I'm hoping will save me a lot of time. However, I was very discouraged after downloading ANTLRWorks, pasting in the first example (the Expr at the top of this page) and then getting the error message

       Cannot display rule "prog" because start state not found

    I get this message (with a different rule name) no matter which rule I click on.

    (Update --- I quit ANTLRWorks, restarted it and loaded in the Expr.g file again, this time after clicking on a rule, the syntax diagram was displayed fine. So there's some funky initialization stuff that's not being handled properly)

  15. Mar 04, 2009

    The evaluator works with positive numbers only, correct? How would you change the grammar to be able to enter negative numbers too? I'm new to ANTLR, and my streightforward attempt to add optional minus to the INT definition has failed -- an expression like "6-2" is now considered "6" and "-2" tokens, not -(6,2) as I want.

    P.S. Found the solution here: http://www.antlr.org/pipermail/antlr-interest/2001-October/000106.html

  16. Mar 14, 2009

    Terence, did you ever confirm whether there exists an unbuffered TokenStream?  I'd like to adapt this example to parse the output from an embedded system. The data coming from the embedded side would always be composed of single lines.

    (I'm planning to use the C# runtime, so hopefully, if an unbuffered TokenStream exists, it has been ported to the C# runtime.)

    Thanks!

    Greg

  17. Mar 14, 2009

    Hi! Well, i made it work for tree node streams by created a LookaheadStream in new 3.1.2 ANTLR. I haven't gotten around to making CommonTokenStream nonbuffering by default. It's a matter of subclassing LookaheadStream etc...

  18. Mar 14, 2009

    Good deal.  I did a cursory search through the source tar for 3.1.2 and couldn't find a LookaheadStream.  Am I looking in the right place?  I thought I'd see what I could do implement the changes needed to the C# version of CommonTokenStream.

    One thing occurred to me earlier:   An intermediate, and possibly bettter implementation would be to simply include a 'flush' method.  This lets the client decide when it's "right" to parse the input stream, while leaving most, if not all the other framework intact.  This lets everything that may be set up by CommonTokenStream in place, but without having to immediately throw every token at the parser.  (I apologize if this is a naive suggestion -- I still need to take a look at the source)

    Thanks again!

    Greg

  19. Mar 15, 2009

    I tried looking in 3.1.3-SNAPSHOT/antlr-3.1.3-20090315.194241-14-sources.jar, but only found LookaheadSet.

  20. Mar 15, 2009

    That's the ticket!  Thanks!

  21. Mar 24, 2009

    Hi Terence,

    Well, I was able to successfully create a class called CommonTokenLookaheadStream based on the LookaheadStream and FastQueue.  (I used excerpts from the CSharp3 runtime source (3.1.3) compiled against the CSharp2 runtime (using 3.1.2).  Evidently, the CSharp3 runtime and templates are bit out sync right now.

    Unfortunately, from my survey, it appears that nearly everything in the runtime is written to completely fill buffers at initialization -- something I can't use for my application.  I need to monitor an on-going byte stream and pick off recognized elements as they pass by.  From my reading of the (Reference) book, the site, and the code, it appears I need a fuzzy- / filter-grammer operating on an infinite stream.  Of course the problem is the infinite stream.

    I did try writing a class that implements ICharStream using an adaptation of LookaheadStream (that can handle C# value types), but I don't think I have a clear enough understanding of the ICharStream functional requirements.  Everything "works" until a token completes.  At that point something fails while trying to generate the token text for processing in a production.  e.g. I can type an 'INT' (using the console as a character source), but as soon as the parser tries to Match() it, things fail in FastQueue or I get an empty string.

    Before I seriously dig into making my ICharStream class work, I thought I'd better ask a couple of questions:

    1) Is there a non-buffering character stream already available?

    2) Is there any descriptive documentation of the functional requirements imposed on ICharStream.  There are definately hints in the doxygen docs, but to me, these seem to more like implementation notes, since I just don't have a good feel for client usage.

    Any advice would be greatly appreciated!

    Thanks!

    Greg

  22. Mar 24, 2009

    No nonbuffering char streams yet...yeah, that sucks. it's the reason I didn't just "fix" it for 3.1.3. CharStream in Java side should explain functionality. sorry for the hassle.

  23. Mar 24, 2009

    Okay.  Time to go digging.

    Given that no alternative exists, I think making the LookaheadStream (value type variant) and FastQueue are probably a good approach.  Of course, if I get something going, I'll be happy to share!  I just can't believe I'm the first to want to do this.  Antlr is perfect for what I need to do.

    I may need some help understanding how FastQueue infers it can bounce items.

    [Edit]:  Is there documentation about when Antlr uses markers?  I'm going to go see if there is anything in the Reference.  This seems key to memory management for an infinite stream.

  24. Mar 24, 2009

    I'll help; no problem. People have wanted unbuffered IO, they just cuss my name (wink)

  25. Mar 24, 2009

    I just looked at CharStream in Hudson.  Sam propagated the comments from CharStream to C#:ICharStream.  The comments for method substring() indicate it's not going to be used for infinite streams, but in fact, that seems to be the very place where I am having trouble.  I need to dig into Match() to see how this is happening. (Being lazy...) Are the 3.1.2 CSharp2 runtime sources available?

    Any thoughts on the use of substring()?

  26. May 24, 2009

    Warning: the version of the expression grammar that appears in "The Definitive ANTLR Reference" has an incorrect WS rule.

  27. Aug 28, 2009

    Hi,i am new to all this and have successfully installed the plug in for ANTLR the ANTLR IDE tool.I create the grammar and the Test.java and when i run it the console expects me the input where i put the above you have in the first case but instead of showing something at the end it insists on waiting input?whats going wrong?can anyone help,i would be gratefull...thanks!

  28. Aug 28, 2009

    Did you hit "end of file" which is either control-D or ctrl-Z depending on your platform?

  29. Aug 29, 2009

    thanks very much for your answer but i have just exceeded my problem with these lines instead:
    String input = "Some string";
    CharStream cs = new ANTLRStringStream(input);
    ExprLexer lexer = new ExprLexer(cs);

    control-D or ctrl-Z did nothing and thats was the problem for me..

  30. Feb 09, 2011

    I would like to convert this example to the C target. What is the best way to handle the memory HashMap with C the target equivalent in the @members section? 

  31. Feb 17, 2011

    Would be nice to have this added to the example. Without this line I failed to compile it.

  32. Feb 17, 2011

    Why package? There's no package mentioned above is there?

  33. Feb 17, 2011

    Right, my fault. But maybe we should mention it? Otherwise it takes some to find a way to add a header to lexer. Well, at least I had to spend some time to understand that @lexer:header is the way to do it. Maybe we can extend the example and mention some com.acme package everywhere?

  34. Sep 06, 2011

    I agree with Yegor. I made just a small change to the example - moved it to a subfolder named "grammars" - and it stopped working.

    I solved this problem with a simple change to Expr.g:

    grammar Expr;

    @header {
    package grammars;
    import java.util.HashMap;
    }

    @lexer::header {
    package grammars;
    }

    ....

  35. Sep 21, 2011

    This grammer (Expr.g) gives an error when the input is 1-1<NEWLINE> (no spaces anywhere) The error is MismatchedTokenException(12!=6).

    The input 1+1<NEWLINE> works (no errors). I run this grammer in AntlrWorks, but when I run it in debug mode, then the exprtession 1-1<NEWLINE> gives NO error.

    (AntlrWorks 1.4.3, Antlr 3.4, StringTemplate 3.2.1, Java 1.6.0_26 (Sun Microsystems Inc.))

    Update: Antlr 3.1.1 gives no errors!

    Thanks