Here is a complete grammar that evaluates expressions containing +, -, * and variable assignments. A simple hash table is used to store variable values.
The tutorial requires the current depot version of ANTLR 3.0, i.e. the version must be later than 3.0 beta 5.
grammar Expr;
@header {
import java.util.HashMap;
}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr NEWLINE {System.out.println($expr.value);}
| ID '=' expr NEWLINE
{memory.put($ID.text, new Integer($expr.value));}
| NEWLINE
;
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*
;
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
atom returns [int value]
: INT {$value = Integer.parseInt($INT.text);}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| '(' expr ')' {$value = $expr.value;}
;
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
And here is the main program to execute the parser:
Run ANTLR on the grammar and compile:
Here is a sample run:
$java Test x=1 y=2 3*(x+y) <EOF> 9 $
What if you want to build trees instead? Here is a grammar that builds trees instead of evaluating the expressions:
grammar Expr;
options {
output=AST;
ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}
prog: ( stat {System.out.println($stat.tree.toStringTree());} )+ ;
stat: expr NEWLINE -> expr
| ID '=' expr NEWLINE -> ^('=' ID expr)
| NEWLINE ->
;
expr: multExpr (('+'^|'-'^) multExpr)*
;
multExpr
: atom ('*'^ atom)*
;
atom: INT
| ID
| '('! expr ')'!
;
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
And here is the tree grammar that will read the trees and evaluate the expressions.
tree grammar Eval;
options {
tokenVocab=Expr;
ASTLabelType=CommonTree;
}
@header {
import java.util.HashMap;
}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr
{System.out.println($expr.value);}
| ^('=' ID expr)
{memory.put($ID.text, new Integer($expr.value));}
;
expr returns [int value]
: ^('+' a=expr b=expr) {$value = a+b;}
| ^('-' a=expr b=expr) {$value = a-b;}
| ^('*' a=expr b=expr) {$value = a*b;}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| INT {$value = Integer.parseInt($INT.text);}
;
Here is the main program:
and a sample run:
$ java Test x=1 y=2 3*(x+y) <EOF> (= x 1) (= y 2) (* 3 (+ x y)) 9 $
37 Comments
Hide/Show CommentsNov 26, 2006
Matthias Unverzagt
On generating the EvalTreeParser.java I get
.... value = Integer.parseInt($INT.text);
within the expr() method. Can you tell me what I'm doing wrong.
Thanks, a lot!
Matthias
Nov 29, 2006
Kay Röpke
Both $INT.text and $expr.value need to use labels, like:
e=expr ... {$e.value}
and
i=INT ... {$i.text}
I updated the example accordingly.
Nov 29, 2006
Kay Röpke
The issue with $INT.text is fixed in the depot and will be part of the next beta release.
You still need the label for expr, though.
Nov 29, 2006
Kay Röpke
Disregard my comments. Both issues are fixed in the latest ANTLR depot version, it requires any version later than 3.0 beta 5 to run.
Aug 03, 2007
Karl Stroetmann
I tried the example given above. Everything works as described but I find it confusing that I have to enter <EOF>
before I get the first line of output. After all, that is not what an interpreter is about. I would have expected that every time I enter an expression followed by a newline the corresponding value is printed. I would appreciate any hint how this can be done.
Aug 03, 2007
Terence Parr
Hi. CommonTokenStream buffers all tokens up first...need an unbuffered version. I don't think we have an official one yet...er...maybe we do.
Aug 06, 2007
Joshua Scholar
I wrote a version of the grammar that is smart enough to know that line breaks after operators don't end the line, but I had a problem treating the EOF the same way as a line break, since it's seen as repeating endlessly. Just replacing NEWLINE with NEWLINE|EOF gives you an infinite number of empty statements.
I do have a solution, but it's a cludge... Here is the file, can you see a better way? Note, it also has a few other additions I put in to help me learn more about java and ANTLR. I appreciate any comments about style or ways of implementing.
The way I handled the EOF problem was to create different separators to end lines, one set using (possibly) repeated NEWLINES and another using EOF.
So the grammar starts with:
prog : stat (eb stat)* ef;
where 'eb' is the NEWLINE based separator and ef is the EOF based separator.
Another question I have relates to the canonical comment eater:
bcomment: '#' ( options{greedy=false;}: . )* NEWLINE+ ;
Is there is any problem with breaking out the stop character out of the comment eater? Does it slow the parser down with to do this:
eof_separator: comment ? EOF;
separator: comment ? NEWLINE+;
comment: '#' ( options{greedy=false;}: . )* ;
Josh
Any, here is a slightly more complicated calculator:
...
How do I get that nice box around the code?
@header {
package test;
import java.util.HashMap;
}
@lexer::header {package test;}
@members {
public enum Types { UNDEFINED,INT, DOUBLE}
public class Value {
Types type;
int intval;
double doubleval;
void Set(int i){type=Types.INT; intval=i;}
void Set(double d){type=Types.DOUBLE; doubleval=d;}
void Set(Value v){
if (v.type==Types.INT) Set(v.intval);
else if (v.type==Types.DOUBLE) Set(v.doubleval);
else type=v.type;
}
boolean IsInexact() { return type == Types.DOUBLE; }
Value(int i){Set(i);}
Value(double d){Set(d);}
Value(Value v){Set(v);}
Value(){type=Types.UNDEFINED;}
double GetDouble(){
if (type==Types.DOUBLE) return doubleval;
else if (type==Types.INT) return (double)intval;
System.err.println("dereferenced uninitialized value");
return -1.0;
}
int GetInt(){
if (type==Types.DOUBLE) return (int)doubleval;
else if (type==Types.INT) return intval;
System.err.println("dereferenced uninitialized value");
return -1;
}
void MulSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()*o.GetDouble()); else Set(GetInt()*o.GetInt()); }
void AddSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()+o.GetDouble()); else Set(GetInt()+o.GetInt()); }
void SubSet(Value o){ if (IsInexact() || o.IsInexact()) Set(GetDouble()-o.GetDouble()); else Set(GetInt()-o.GetInt()); }
void Negate() {
if (type==Types.DOUBLE) doubleval=-doubleval;
else if (type==Types.INT) intval=-intval;
}
void Display()
{
if (type==Types.DOUBLE) System.out.println(doubleval);
else if (type==Types.INT) System.out.println(intval);
else System.out.println("undefined type");
}
}
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog : stat (eb stat)* ef;
stat: expr {$expr.value.Display();}
| ID '=' b expr
{memory.put($ID.text, $expr.value);}
|
;
expr returns [Value value]
: v=multExpr {$value = $v.value;}
( '+' b v=multExpr {$value.AddSet($v.value);}
| '-' b v=multExpr {$value.SubSet($v.value);}
)*
;
multExpr returns [Value value]
: v=negExpr {$value = $v.value;} ('*' b v=negExpr {$value.MulSet($v.value);})*
;
negExpr returns [Value value]
: negParity atom {$value = $atom.value; if ($negParity.neg) value.Negate(); }
;
negParity returns [boolean neg]
: { $neg=false;}
| '-' b p=negParity { $neg = !$p.neg;}
| '+' b p=negParity {$neg =$p.neg;}
;
atom returns [Value value]
: INT {$value = new Value(Integer.parseInt($INT.text));}
| DOUBLE {$value = new Value(Double.parseDouble($DOUBLE.text));}
| ID
| '(' v=expr ')' {$value = $v.value;}
;
eb : bnl
| ';' b ;
ef : fnl
| ';' EOF ;
b : bnl ?;
fnl : EOF
| fcomment ;
bnl :NEWLINE+
| bcomment ;
fcomment: '#' ( options {greedy=false;} : . )* EOF ;
bcomment: '#' ( options {greedy=false;} : . )* NEWLINE+ ;
ID : ('a'..'z'|'A'..'Z'|'')('a'..'z'|'A'..'Z'|'0'..'9'|'')* ;
DOUBLE : '0'..'9'* '.' '0'..'9'(('e'|'E')(''|'-')?'0'..'9'+)?
| '0'..'9'('e'|'E')(''|'-')?'0'..'9'+ ;
INT : '0'..'9'+ ;
SKIPNL :( '\\\\' (' '|'\t')* '\r'? '\n') {skip(); };
WS : (' '|'\t')+ {skip();} ;
NEWLINE:'\r'? '\n' ;
SYMBOL : ':' ('a'..'z'|'A'..'Z')+;
Aug 06, 2007
Joshua Scholar
Oops the DOUBLE got messed up, and I can't edit it (all kinds of formatting and escapes seem to be lost on editing).
I'll but escapes in front of every character. That may fix it.
DOUBLE : '0'..'9'
* '.' '0'..'9'+(('e'|'E')('+'|'-')?'0'..'9'+)?
| '0'..'9'+('e'|'E')('+'|'-')?'0'..'9'
+ ;
Jan 01, 2008
Ivan Titov
It is possible to make Eval.g more flexible? If we change order of expression input it raised exception "undefined variable y":
I think it will be useful if such case processing correctly. If not so, what kind of compiler part can help with this?
Jan 01, 2008
Terence Parr
You would need to do lazy evaluation (google keyword) and compute the data dependencies to ensure values needed by computations are done before they are needed.
Dec 30, 2008
Paul
Hi, Im struggling even to get this example working
Im on Ubuntu Linux and running the antlr works 1.2.2 jar from the command line, java 1.6.0_07 and Antlr 3.1.1
I paste the example grammar into the window, then set the options such as 'prog' etc
When I click the black play button, i get an error saying this feature isnt designed forsuch and such (I asked it to stop showing the error, so I dont have the exact message).
In the console window, there is a java error
[22:02:16] java.lang.NullPointerException
at org.antlr.xjlib.foundation.XJUtils.concatPath(Unknown Source)
at org.antlr.xjlib.foundation.XJUtils.concatPath(Unknown Source)
at org.antlr.works.components.editor.ComponentEditorGrammar.getOutputPath(Unknown Source)
at org.antlr.works.components.editor.ComponentEditorGrammar.getANTLRTool(Unknown Source)
at org.antlr.works.grammar.engine.GrammarEngineImpl.getANTLRTool(Unknown Source)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createNewGrammar(Unknown Source)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createParserGrammar(Unknown Source)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createCombinedGrammar(Unknown Source)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createGrammars(Unknown Source)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.analyze(Unknown Source)
at org.antlr.works.grammar.engine.GrammarEngineImpl.analyze(Unknown Source)
at org.antlr.works.interpreter.EditorInterpreter.run(Unknown Source)
at java.lang.Thread.run(Thread.java:619)
Do you have any advice please?
Thanks
Paul
Feb 20, 2009
jd
The NullPointerException problem can arise if ANTLRWorks is unable to find the java compiler (javac.exe on windows).
You can fix this by going to File > Preferences > Compiler and entering the path for the executable.
Feb 20, 2009
tlochner
Hi,
I am still trying to understand how actions and return types are handled within ANTLR3, specifically the C implementation.
Looking at the example above, there are two main things I don't fully understand.
Question 1) How does ANTLR handle variables, if at all? If it doesn't, is this the whole purpose of the hashmap in the above example?
Let me illustrate my question below. Let's say we wish to implement a simple calculator.g lex/parser. The example above might not be exactly the same implementation but let's pretend that it understands the following input syntax.
a=5;
b=7;
c=a+b+3;
result=c*10;
When performing the addition on line 3, how does the ANTLR grammar know that a = 5 and b=7 from several lines up in our input file? Do we need to implement some type of stack or does ANTLR inherently provide this mechanism as part of the parser?
Question 2) How can I return more complex objects using the ANTLR3 C implementation?
In the following parser snippet, we see that the action for the parser rule is able to return a variable of type 'integer'. This variable can then be passed up the parser call stack. What if I wanted to return a C++ object instead of a simple integer type?
Instead of this simple integer return type
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
Let's say we want to implement a grammer rule that can return an object
{ TimeSeries myTS(..); }
addExpr return[TimeSeries retval]
: 'add' '(' expr ',' NUMBER ')' { $retval = myTS.add($NUMBER.text->chars) }
Where I have a C++ Class the defines some like
class TimeSeries
{
public:
TimeSeries(...);
TimeSeries::add(int number);
private:
RWCString name;
std::vector<int> values;;
std::vector<RWDate> dates;;
}
Is this possible? If so, how?
Thanks
-Tom
Feb 24, 2009
David Jameson
I'm new to the Java world and was pleasantly surprised to discover ANTLR which I'm hoping will save me a lot of time. However, I was very discouraged after downloading ANTLRWorks, pasting in the first example (the Expr at the top of this page) and then getting the error message
Cannot display rule "prog" because start state not found
I get this message (with a different rule name) no matter which rule I click on.
(Update --- I quit ANTLRWorks, restarted it and loaded in the Expr.g file again, this time after clicking on a rule, the syntax diagram was displayed fine. So there's some funky initialization stuff that's not being handled properly)
Mar 04, 2009
Vladimir Dyuzhev
The evaluator works with positive numbers only, correct? How would you change the grammar to be able to enter negative numbers too? I'm new to ANTLR, and my streightforward attempt to add optional minus to the INT definition has failed -- an expression like "6-2" is now considered "6" and "-2" tokens, not -(6,2) as I want.
P.S. Found the solution here: http://www.antlr.org/pipermail/antlr-interest/2001-October/000106.html
Mar 14, 2009
Greg Krekelberg
Terence, did you ever confirm whether there exists an unbuffered TokenStream? I'd like to adapt this example to parse the output from an embedded system. The data coming from the embedded side would always be composed of single lines.
(I'm planning to use the C# runtime, so hopefully, if an unbuffered TokenStream exists, it has been ported to the C# runtime.)
Thanks!
Greg
Mar 14, 2009
Terence Parr
Hi! Well, i made it work for tree node streams by created a LookaheadStream in new 3.1.2 ANTLR. I haven't gotten around to making CommonTokenStream nonbuffering by default. It's a matter of subclassing LookaheadStream etc...
Mar 14, 2009
Greg Krekelberg
Good deal. I did a cursory search through the source tar for 3.1.2 and couldn't find a LookaheadStream. Am I looking in the right place? I thought I'd see what I could do implement the changes needed to the C# version of CommonTokenStream.
One thing occurred to me earlier: An intermediate, and possibly bettter implementation would be to simply include a 'flush' method. This lets the client decide when it's "right" to parse the input stream, while leaving most, if not all the other framework intact. This lets everything that may be set up by CommonTokenStream in place, but without having to immediately throw every token at the parser. (I apologize if this is a naive suggestion -- I still need to take a look at the source)
Thanks again!
Greg
Mar 14, 2009
Terence Parr
Ooops. 3.1.3-SNAPSHOT. Try http://www.antlr.org/antlr-snapshot/org/antlr/antlr/3.1.3-SNAPSHOT
Mar 15, 2009
Greg Krekelberg
I tried looking in 3.1.3-SNAPSHOT/antlr-3.1.3-20090315.194241-14-sources.jar, but only found LookaheadSet.
Mar 15, 2009
Terence Parr
Use Hudson then
http://www.antlr.org/hudson/job/ANTLR_Runtime/ws/src/main/java/org/antlr/runtime/misc/LookaheadStream.java
Mar 15, 2009
Greg Krekelberg
That's the ticket! Thanks!
Mar 24, 2009
Greg Krekelberg
Hi Terence,
Well, I was able to successfully create a class called CommonTokenLookaheadStream based on the LookaheadStream and FastQueue. (I used excerpts from the CSharp3 runtime source (3.1.3) compiled against the CSharp2 runtime (using 3.1.2). Evidently, the CSharp3 runtime and templates are bit out sync right now.
Unfortunately, from my survey, it appears that nearly everything in the runtime is written to completely fill buffers at initialization -- something I can't use for my application. I need to monitor an on-going byte stream and pick off recognized elements as they pass by. From my reading of the (Reference) book, the site, and the code, it appears I need a fuzzy- / filter-grammer operating on an infinite stream. Of course the problem is the infinite stream.
I did try writing a class that implements ICharStream using an adaptation of LookaheadStream (that can handle C# value types), but I don't think I have a clear enough understanding of the ICharStream functional requirements. Everything "works" until a token completes. At that point something fails while trying to generate the token text for processing in a production. e.g. I can type an 'INT' (using the console as a character source), but as soon as the parser tries to Match() it, things fail in FastQueue or I get an empty string.
Before I seriously dig into making my ICharStream class work, I thought I'd better ask a couple of questions:
1) Is there a non-buffering character stream already available?
2) Is there any descriptive documentation of the functional requirements imposed on ICharStream. There are definately hints in the doxygen docs, but to me, these seem to more like implementation notes, since I just don't have a good feel for client usage.
Any advice would be greatly appreciated!
Thanks!
Greg
Mar 24, 2009
Terence Parr
No nonbuffering char streams yet...yeah, that sucks. it's the reason I didn't just "fix" it for 3.1.3. CharStream in Java side should explain functionality. sorry for the hassle.
Mar 24, 2009
Greg Krekelberg
Okay. Time to go digging.
Given that no alternative exists, I think making the LookaheadStream (value type variant) and FastQueue are probably a good approach. Of course, if I get something going, I'll be happy to share! I just can't believe I'm the first to want to do this. Antlr is perfect for what I need to do.
I may need some help understanding how FastQueue infers it can bounce items.
[Edit]: Is there documentation about when Antlr uses markers? I'm going to go see if there is anything in the Reference. This seems key to memory management for an infinite stream.
Mar 24, 2009
Terence Parr
I'll help; no problem. People have wanted unbuffered IO, they just cuss my name
Mar 24, 2009
Greg Krekelberg
I just looked at CharStream in Hudson. Sam propagated the comments from CharStream to C#:ICharStream. The comments for method substring() indicate it's not going to be used for infinite streams, but in fact, that seems to be the very place where I am having trouble. I need to dig into Match() to see how this is happening. (Being lazy...) Are the 3.1.2 CSharp2 runtime sources available?
Any thoughts on the use of substring()?
May 24, 2009
Greg Lee
Warning: the version of the expression grammar that appears in "The Definitive ANTLR Reference" has an incorrect WS rule.
Aug 28, 2009
stauros mpa
Hi,i am new to all this and have successfully installed the plug in for ANTLR the ANTLR IDE tool.I create the grammar and the Test.java and when i run it the console expects me the input where i put the above you have in the first case but instead of showing something at the end it insists on waiting input?whats going wrong?can anyone help,i would be gratefull...thanks!
Aug 28, 2009
Terence Parr
Did you hit "end of file" which is either control-D or ctrl-Z depending on your platform?
Aug 29, 2009
stauros mpa
thanks very much for your answer but i have just exceeded my problem with these lines instead:
String input = "Some string";
CharStream cs = new ANTLRStringStream(input);
ExprLexer lexer = new ExprLexer(cs);
control-D or ctrl-Z did nothing and thats was the problem for me..
Feb 09, 2011
Shane MacPhillamy
I would like to convert this example to the C target. What is the best way to handle the memory HashMap with C the target equivalent in the @members section?
Feb 17, 2011
Yegor Bugayenko
Would be nice to have this added to the example. Without this line I failed to compile it.
Feb 17, 2011
Terence Parr
Why package? There's no package mentioned above is there?
Feb 17, 2011
Yegor Bugayenko
Right, my fault. But maybe we should mention it? Otherwise it takes some to find a way to add a header to lexer. Well, at least I had to spend some time to understand that @lexer:header is the way to do it. Maybe we can extend the example and mention some com.acme package everywhere?
Sep 06, 2011
Erel Segal Halevi
I agree with Yegor. I made just a small change to the example - moved it to a subfolder named "grammars" - and it stopped working.
I solved this problem with a simple change to Expr.g:
grammar Expr;
@header {
package grammars;
import java.util.HashMap;
}
@lexer::header {
package grammars;
}
....
Sep 21, 2011
remco
This grammer (Expr.g) gives an error when the input is 1-1<NEWLINE> (no spaces anywhere) The error is MismatchedTokenException(12!=6).
The input 1+1<NEWLINE> works (no errors). I run this grammer in AntlrWorks, but when I run it in debug mode, then the exprtession 1-1<NEWLINE> gives NO error.
(AntlrWorks 1.4.3, Antlr 3.4, StringTemplate 3.2.1, Java 1.6.0_26 (Sun Microsystems Inc.))
Update: Antlr 3.1.1 gives no errors!
Thanks