Here is a rule to match floats:
Now, if you want to add a '..' range operator so that 1..10 makes sense, ANTLR has trouble distinguishing 1. (the start of the range) from 1. (the float) without backtracking. So, match '1..' in NUM_FLOAT and just emit two non-float tokens:
By default, Lexer objects emit only one token at a time. Make a buffer by overriding a few methods:
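As a rough, self-contained illustration of the buffering idea, here is a sketch in plain Java that does not use the ANTLR runtime. OneTokenLexer, RangeLexer, tokenize and the string-based token labels are hypothetical stand-ins invented for this example. It assumes the only input characters are digits and '.', and it shows emit() queuing tokens so that one rule can produce both an INT and a RANGE token for the '1..' case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a lexer runtime; NOT the ANTLR API. */
abstract class OneTokenLexer {
    protected final String input;
    protected int pos = 0;
    protected String token;   // the single slot a rule can fill

    OneTokenLexer(String input) { this.input = input; }

    /** Runs one rule; the rule reports its result through emit(). */
    protected abstract void matchOneRule();

    public void emit(String tok) { token = tok; }

    /** Returns the next token, or null at end of input. */
    public String nextToken() {
        token = null;
        if (pos < input.length()) matchOneRule();
        return token;
    }
}

/** Buffers emitted tokens so a single rule may emit several. */
class RangeLexer extends OneTokenLexer {
    private final Queue<String> pending = new ArrayDeque<>();

    RangeLexer(String input) { super(input); }

    @Override
    public void emit(String tok) {
        super.emit(tok);      // keep the base class's slot up to date
        pending.add(tok);     // and remember every token in order
    }

    @Override
    public String nextToken() {
        // Only run another rule once the buffer has been drained.
        if (pending.isEmpty()) super.nextToken();
        return pending.poll(); // null when nothing was emitted (end of input)
    }

    @Override
    protected void matchOneRule() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("unexpected char at " + pos);
        String digits = input.substring(start, pos);
        if (input.startsWith("..", pos)) {
            // "1.." is the start of a range: emit two non-float tokens.
            pos += 2;
            emit("INT:" + digits);
            emit("RANGE");
        } else if (input.startsWith(".", pos)) {
            // A single '.' makes this a float such as "1." or "1.5".
            pos++;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            emit("FLOAT:" + input.substring(start, pos));
        } else {
            emit("INT:" + digits);
        }
    }

    /** Convenience helper for this sketch: lex the whole input. */
    static List<String> tokenize(String text) {
        RangeLexer lexer = new RangeLexer(text);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) out.add(t);
        return out;
    }
}
```

With this sketch, RangeLexer.tokenize("1..10") yields [INT:1, RANGE, INT:10], while RangeLexer.tokenize("1.5") yields [FLOAT:1.5]. Note that this version runs another rule only when the buffer is empty, which is a design choice of the sketch rather than something taken from the original listing.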
8 Comments
Jan 10, 2008
Eric Lindahl
by
I am assuming he means
How this interacts with the lookahead is somewhat confusing. It would seem that semantic predicates are needed for this to work correctly.
Jan 10, 2008
Terence Parr
Oops, sorry. For v3.0.1 it's this.token. For v3.1, it's state.token.
Jan 10, 2008
Eric Lindahl
My particular problem is that inputs may not always be separated by whitespace. E.g.
Should someone be using v3.1 for dealing with this?
Jan 10, 2008
Terence Parr
Just add ' '* in the ID rule, which should work.
Jan 10, 2008
Eric Lindahl
Once I added
ID : (IDL|IDA) ' '*;
I had to start adding ' '* to all my other tokens. E.g.
VERSION : 'version' ' '* ;
Which isn't ideal. I think I'll go back to emitting the extra tokens.
BTW, this Confluence editor has problems preserving formats b/w edits.
Nov 13, 2008
Matthias Troffaes
Here is how I managed to emit multiple tokens per lexer rule for the python target (ANTLR 3.1):
The non-trivial difference from the Java target is that the emit member must call Lexer.emit to cover the token=None case as well (I found it instructive to check the Lexer base class in the runtime, recognizers.py).
Given the title of this article, most people clicking here are probably only interested in the implementation. Of course, it is also good to go the extra mile and have examples demonstrating cases where emitting multiple tokens is useful, for those who are interested in more than just a quick implementation. Can I therefore simply suggest moving the Java code, which effectively answers this FAQ, to the first paragraph of this wiki page (perhaps along with implementations for other targets)?
Jul 21, 2009
Andrew Bradnan
The C# translation is as follows.
Nov 04, 2012
Peter S. May
Java (inline)
For Java 1.6, the following is a bit more idiomatic (backport of the previous C# snippet), with the inclusion of generics and annotations:
Java (separate class)
For any number of reasons, one might prefer not to have more target platform code inline than necessary.
It's possible to leave the multi-emit overrides out of the grammar itself by squirreling them away into an abstract base class, which must be named Lexer but may appear in an arbitrary package. (This is possible because the generated lexer class extends Lexer without specifying its package.) Import the new class in the lexer header:
Then, using that name, supply a functioning subclass of the original Lexer with at least the nullary and two-argument constructors passed through.
Scala via Java target
If you're using the Java target with Scala, it's possible to put these overrides in a Scala class instead. (The same changes to @lexer::header apply.)