Key: ANTLR-209
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Terence Parr
Reporter: Terence Parr
Votes: 0
Watchers: 0
ANTLR v3

Lexer consuming two characters instead of one upon error

Created: 10/Jan/08 11:57 AM   Updated: 10/Jan/08 01:22 PM
Component/s: ANTLR Core
Affects Version/s: 3.0.1
Fix Version/s: 3.1


Description
Curtis Clauson:

grammar SingleCharacter;

@header {
import static java.lang.System.out;
}


/* Parser Rules */
singleCharacter
returns [boolean succeeded = false]
     : Character {
             out.println(
                 "Parsed token Character '" + $Character.text + "'"
             );
         }
         EOF {
             out.println("Parsed token EOF");
         }
         {
             out.println("Parsed singleCharacter");
             $succeeded = true;
         }
     ;

/* Lexer Rules */
Character: 'a';
// Invalid added for problem #2
Invalid : . {$type = Token.INVALID_TOKEN_TYPE;};

There is a serious bug in the lexer that causes it, during error
recovery, to skip two characters instead of just the unexpected
character. When a character is not matched, match() creates an exception
object 'mte', calls recover(mte), which consumes the unexpected
character, and then throws the exception. However, nextToken() catches
that exception, reports the error, and then calls recover(mte) again,
erroneously consuming the character after the already consumed
unexpected character.

In my simple example, the source "ba" produces the following results:
Note: The Invalid token did not exist at this point.
<<
Parsing: "ba"
Token stream
     <No tokens>

Parser output
line 1:0 mismatched character 'b' expecting 'a'
BR.recoverFromMismatchedToken
line 0:-1 mismatched input '<EOF>' expecting Character

Returned false
 >>

The lexer does not provide any tokens to the token stream.
This is what actually happens in the lexer:
   1 A call to match('a') is made by mCharacter()
   2 The first character seen is an invalid 'b'
   3 A MismatchedTokenException object is created
   4 The recover(mte) method is called, which consumes the current invalid
character 'b'
   5 The exception object is thrown
   6 The exception is caught by nextToken()
   7 A call to reportError(re) is made that displays the lexer error
   8 Another call to recover(re) is made that consumes the next
character, the valid character 'a'
   9 nextToken() loops back to try to get a token again, sees EOF, and
returns Token.EOF_TOKEN
   10 The parser has no tokens in the stream and reports it saw <EOF>
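
The flow above can be reproduced with a small stand-alone model. This is a sketch only, not the ANTLR runtime source: the class DoubleRecoverSketch, its nested Lexer and MismatchException, and the tokenize() helper are hypothetical names, and the lexer is hard-wired to the single rule Character: 'a'.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleRecoverSketch {
    /** Simplified stand-in for the runtime's mismatch exception (assumption). */
    static class MismatchException extends Exception {
        MismatchException(int index, char found, char expected) {
            super("mismatched character '" + found + "' expecting '"
                  + expected + "' at index " + index);
        }
    }

    static class Lexer {
        private final String input;
        private int p = 0;                        // index of the next character
        final List<String> errors = new ArrayList<>();

        Lexer(String input) { this.input = input; }

        private boolean atEof() { return p >= input.length(); }

        /** Error recovery in this model: skip exactly one character. */
        void recover() { if (!atEof()) p++; }

        /** On mismatch: create the exception, recover, then throw. */
        void match(char expected) throws MismatchException {
            char c = input.charAt(p);
            if (c != expected) {
                MismatchException mte = new MismatchException(p, c, expected);
                recover();                        // first consume: the bad character
                throw mte;
            }
            p++;
        }

        /** Returns the next token text, or null at end of input. */
        String nextToken() {
            while (!atEof()) {
                int start = p;
                try {
                    match('a');                   // models mCharacter()
                    return input.substring(start, p);
                } catch (MismatchException re) {
                    errors.add(re.getMessage());  // models reportError(re)
                    recover();                    // second consume: the reported bug
                }
            }
            return null;
        }
    }

    /** Collects every token the model lexer produces for the given input. */
    static List<String> tokenize(String input) {
        Lexer lexer = new Lexer(input);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ba"));       // prints []
    }
}
```

In this model the valid 'a' in "ba" is swallowed by the second recover() call, so the token list is empty, which matches the "<No tokens>" output shown above.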

Given the flow of the code and the use of nextToken(), it seems the
solution is to eliminate the call to recover(re) in the exception
handler of nextToken(). It works fine for my simple example, but I'm not
sure if this is consistent with the intended design of ANTLR.

Comments
Terence Parr - 10/Jan/08 01:22 PM
ANTLR should only recover upon no viable alt as the match() routines recover already.
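
One way to read the comment above is that nextToken() should tell the two failure kinds apart: consume a character itself only when no alternative was viable, and merely report when match() has already recovered. The following is a minimal sketch under that reading, not the actual 3.1 change. The class RecoverOnceSketch, its exception stand-ins, the recoverOnEveryError flag (kept only to compare old and new behaviour), and the two-character rule AB: 'a' 'b' are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverOnceSketch {
    /** Simplified stand-ins for the runtime's exception types (assumption). */
    static class RecognitionError extends Exception {
        RecognitionError(String msg) { super(msg); }
    }
    static class NoViableAlt extends RecognitionError {
        NoViableAlt(int index) { super("no viable alternative at index " + index); }
    }
    static class Mismatch extends RecognitionError {
        Mismatch(int index) { super("mismatched character at index " + index); }
    }

    static class Lexer {
        private final String input;
        private final boolean recoverOnEveryError; // true models the old behaviour
        private int p = 0;
        final List<String> errors = new ArrayList<>();

        Lexer(String input, boolean recoverOnEveryError) {
            this.input = input;
            this.recoverOnEveryError = recoverOnEveryError;
        }

        private boolean atEof() { return p >= input.length(); }

        void recover() { if (!atEof()) p++; }

        /** match() recovers by itself before throwing. */
        void match(char expected) throws Mismatch {
            if (atEof() || input.charAt(p) != expected) {
                Mismatch m = new Mismatch(p);
                recover();
                throw m;
            }
            p++;
        }

        /** Prediction step plus the hypothetical rule AB : 'a' 'b' ; */
        void mTokens() throws RecognitionError {
            if (input.charAt(p) != 'a') {
                throw new NoViableAlt(p);          // nothing consumed here
            }
            match('a');
            match('b');
        }

        String nextToken() {
            while (!atEof()) {
                int start = p;
                try {
                    mTokens();
                    return input.substring(start, p);
                } catch (NoViableAlt nva) {
                    errors.add(nva.getMessage());
                    recover();                     // nobody consumed the bad character yet
                } catch (RecognitionError re) {
                    errors.add(re.getMessage());
                    if (recoverOnEveryError) {
                        recover();                 // old behaviour: second consume
                    }
                }
            }
            return null;
        }
    }

    static List<String> tokenize(String input, boolean recoverOnEveryError) {
        Lexer lexer = new Lexer(input, recoverOnEveryError);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("acab", true));  // prints []
        System.out.println(tokenize("acab", false)); // prints [ab]
        System.out.println(tokenize("xab", false));  // prints [ab]
    }
}
```

With the flag off, each bad character is skipped exactly once whichever path detects it, so the valid "ab" after the stray 'c' or 'x' still reaches the token stream.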