<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
In general, order doesn't matter: ANTLR will match the longest possible token.<br>
<br>
One case where order does matter is when a rule listed lower in the grammar
can never match any token, in spite of the longest-token matching mechanism.<br>
<br>
Example:<br>
<br>
ID : ('a'..'z')+ ;<br>
<br>
SOME_KEYWORD : 'key' ;<br>
<br>
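In this case ANTLR will report an error because SOME_KEYWORD can never be
matched: every input it accepts is also matched by ID, which is listed first.
Disambiguating by longest token will not work here, since both rules match
'key' with the same length.<br>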
<br>
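A minimal sketch of the usual fix, assuming ANTLR 3 behaviour in which the rule
listed first wins when two rules match the same text: put the keyword rule above
the general rule, so that the input 'key' becomes SOME_KEYWORD while a longer run
such as 'keys' should still match ID.<br>
<br>
SOME_KEYWORD : 'key' ;<br>
<br>
ID : ('a'..'z')+ ;<br>
<br>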
Cheers, Indhu<br>
<br>
Avid Trober wrote:
<blockquote cite="mid:CE6483E6340C4ABCB3A38B6622AB731D@homelaptop"
type="cite">
<pre wrap="">thanks.
org.antlr.Tool is happy with these two, regardless of which one is
above/below the other.
But, won't the DFA's care about the order???
DQUOTE : '"' ;
DQUOTE_STRING : DQUOTE ( ~('"') )* DQUOTE ;
----- Original Message -----
From: "Gavin Lambert" <a class="moz-txt-link-rfc2396E" href="mailto:antlr@mirality.co.nz">&lt;antlr@mirality.co.nz&gt;</a>
To: "Avid Trober" <a class="moz-txt-link-rfc2396E" href="mailto:avidtrober@gmail.com">&lt;avidtrober@gmail.com&gt;</a>; <a class="moz-txt-link-rfc2396E" href="mailto:antlr-interest@antlr.org">&lt;antlr-interest@antlr.org&gt;</a>
Sent: Tuesday, April 21, 2009 6:53 AM
Subject: Re: [antlr-interest] Lexing 7-bit ASCII stream
</pre>
<blockquote type="cite">
<pre wrap="">At 21:59 21/04/2009, Avid Trober wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I'm parsing a 7-bit ASCII stream ... 2 questions
Question 1: can't I just fall through the lexer rules, where lexer rules
are ordered specific-to-general, and avoid indeterminisms at run-time?
</pre>
</blockquote>
<pre wrap="">[...]
</pre>
<blockquote type="cite">
<pre wrap="">... // (AND IF NOTHING ABOVE MATCHES, AT LEAST WE'RE MATCHING HERE ... )
CHAR : '\u0000'..'\u007F' // any 7-bit US-ASCII character
;
</pre>
</blockquote>
<pre wrap="">You can specify a catch-all match like so:
CHAR : .;
If this is the last lexer rule, then it will behave as you're expecting.
</pre>
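<pre wrap="">A minimal sketch of that layout, assuming ANTLR 3. DIGIT and LETTER are
hypothetical stand-ins for the more specific rules elided above; only the
position of CHAR at the end is the point:
DIGIT  : '0'..'9' ;
LETTER : 'a'..'z' | 'A'..'Z' ;
CHAR   : . ;   // catch-all, listed last so the specific rules are tried first
</pre>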
<blockquote type="cite">
<pre wrap="">Question 2: I'm at a loss as to how to match the notation in the spec I'm
writing a grammar for, where binary digits are '0' or '1' and digits are
'0'..'9'. (ABNF-ish) It is preferred to make the grammar rule names match
that (whether lexer or parser, it doesn't matter)
</pre>
</blockquote>
<pre wrap="">Generally, it's best to have the lexer match as wide as possible (i.e. have
DIGIT, not BINARY_DIGIT) and sort it out in the parser, where you can use
the context to give better error messages if you encounter something
invalid.
</pre>
<blockquote type="cite">
<pre wrap="">Can I write a binary_digit parser rule that works with DIGIT above
somehow?
</pre>
</blockquote>
<pre wrap="">Yep. Depending on the context, you may want to either use a
lookahead-based entry predicate to avoid entering the rule if the DIGITs
aren't binary-safe, or an exit predicate that raises an error if it turns
out that the sequence wasn't valid binary.
</pre>
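<pre wrap="">A rough sketch of the entry-predicate option, assuming ANTLR 3 with the
Java target. The rule names are illustrative, and the predicate is only one
way to write the check; it inspects the text of the next token before
DIGIT is consumed:
DIGIT : '0'..'9' ;

binary_digit
    : { "0".equals(input.LT(1).getText()) || "1".equals(input.LT(1).getText()) }? DIGIT
    ;

When the predicate is false, the generated parser would normally raise a
FailedPredicateException, unless ANTLR hoists the predicate to choose
between alternatives in the calling rule.
</pre>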
</blockquote>
</blockquote>
<br>
</body>
</html>