
ANTLR v3
Created: 10/Jan/08 11:57 AM
Updated: 10/Jan/08 01:22 PM
Component/s: ANTLR Core
Affects Version/s: 3.0.1
Fix Version/s: 3.1

Curtis Clauson:
grammar SingleCharacter;

@header {
    import static java.lang.System.out;
}

/* Parser Rules */
singleCharacter
returns [boolean succeeded = false]
    : Character {
          out.println(
              "Parsed token Character '" + $Character.text + "'"
          );
      }
      EOF {
          out.println("Parsed token EOF");
      }
      {
          out.println("Parsed singleCharacter");
          $succeeded = true;
      }
    ;

/* Lexer Rules */
Character: 'a';

// Invalid added for problem #2
Invalid : . {$type = Token.INVALID_TOKEN_TYPE;};

There is a serious bug in the lexer that causes it, during error
recovery, to skip two characters instead of just the unexpected
one. When a character is not matched, match() creates an exception
object 'mte', calls recover(mte), which consumes the unexpected
character, and then throws the exception. However, nextToken() catches
that exception, reports the error, and then calls recover() on it again,
erroneously consuming the character after the already consumed
unexpected character.
In my simple example, the source "ba" produces the results below.
Note: The Invalid rule had not yet been added to the grammar at this point.
<<
Parsing: "ba"
Token stream
<No tokens>
Parser output
line 1:0 mismatched character 'b' expecting 'a'
BR.recoverFromMismatchedToken
line 0:-1 mismatched input '<EOF>' expecting Character
Returned false
>>
The lexer does not provide any tokens to the token stream.
This is what actually happens in the lexer:
1 A call to match('a') is made by mCharacter()
2 The first character seen is an invalid 'b'
3 A MismatchedTokenException object is created
4 The recover(mte) method is called, which consumes the current invalid
character 'b'
5 The exception object is thrown
6 The exception is caught by nextToken()
7 A call to reportError(re) is made that displays the lexer error
8 Another call to recover(re) is made that consumes the next
character, the valid character 'a'
9 nextToken() loops back to try to get a token again, sees EOF, and
returns Token.EOF_TOKEN
10 The parser has no tokens in the stream and reports it saw <EOF>
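
The steps above can be reproduced without the ANTLR runtime. The following is only a minimal sketch of the described control flow, not the actual ANTLR source: the names DoubleRecoverDemo, ToyLexer, and Mismatch are hypothetical, and recovery is assumed to mean "consume exactly one character".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reported control flow. This is NOT the ANTLR runtime;
// every name in this file is a hypothetical stand-in.
class DoubleRecoverDemo {
    static final int EOF = -1;

    /** Stand-in for the exception object created in match(). */
    static class Mismatch extends RuntimeException {
        Mismatch(int found, char expected) {
            super("mismatched character '" + (char) found
                    + "' expecting '" + expected + "'");
        }
    }

    static class ToyLexer {
        private final String input;
        private int pos = 0;
        final List<String> log = new ArrayList<>();

        ToyLexer(String input) { this.input = input; }

        int lookahead() {
            return pos < input.length() ? input.charAt(pos) : EOF;
        }

        /** Assumed recovery strategy: consume exactly one character. */
        void recover() {
            if (pos < input.length()) {
                log.add("recover consumed '" + input.charAt(pos) + "'");
                pos++;
            }
        }

        void match(char expected) {
            if (lookahead() != expected) {
                Mismatch mte = new Mismatch(lookahead(), expected);
                recover();   // first recovery: consumes the unexpected character
                throw mte;
            }
            pos++;
        }

        String nextToken() {
            while (true) {
                if (lookahead() == EOF) return "<EOF>";
                try {
                    match('a');          // plays the role of mCharacter()
                    return "Character";
                } catch (Mismatch e) {
                    log.add(e.getMessage());
                    recover();   // second recovery: consumes the following character
                }
            }
        }
    }

    /** Collects every token up to and including <EOF>. */
    static List<String> tokenize(ToyLexer lexer) {
        List<String> tokens = new ArrayList<>();
        String token;
        do {
            token = lexer.nextToken();
            tokens.add(token);
        } while (!token.equals("<EOF>"));
        return tokens;
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer("ba");
        System.out.println("tokens: " + tokenize(lexer));
        lexer.log.forEach(System.out::println);
    }
}
```

For the input "ba" the model yields only the end-of-file token, and its log shows one recover for 'b' followed by a second recover for 'a', matching the sequence listed above.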
Given the flow of the code and the use of nextToken(), it seems the
solution is to eliminate the call to recover(re) in the exception
handler of nextToken(). It works fine for my simple example, but I'm not
sure if this is consistent with the intended design of ANTLR.
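
To illustrate the effect of the proposed change, here is a small sketch that toggles the second recovery call. It is a toy model under stated assumptions, not the ANTLR runtime: the class RecoverFixSketch and the flag recoverInNextToken are hypothetical, and match() is assumed to be the only place an error can originate.

```java
import java.util.ArrayList;
import java.util.List;

// Toy comparison of the reported behaviour and the proposed change.
// Hypothetical names throughout; this is not the ANTLR runtime source.
class RecoverFixSketch {
    static final int EOF = -1;

    static class Mismatch extends RuntimeException {}

    private final String input;
    private final boolean recoverInNextToken;
    private int pos = 0;

    RecoverFixSketch(String input, boolean recoverInNextToken) {
        this.input = input;
        this.recoverInNextToken = recoverInNextToken;
    }

    private int lookahead() {
        return pos < input.length() ? input.charAt(pos) : EOF;
    }

    /** Assumed recovery strategy: consume exactly one character. */
    private void recover() {
        if (pos < input.length()) pos++;
    }

    private void match(char expected) {
        if (lookahead() != expected) {
            recover();               // match() already skips the bad character
            throw new Mismatch();
        }
        pos++;
    }

    String nextToken() {
        while (true) {
            if (lookahead() == EOF) return "<EOF>";
            try {
                match('a');
                return "Character";
            } catch (Mismatch e) {
                // This is the call the report proposes removing.
                if (recoverInNextToken) recover();
            }
        }
    }

    /** Collects every token up to and including <EOF>. */
    static List<String> tokenize(String source, boolean recoverInNextToken) {
        RecoverFixSketch lexer = new RecoverFixSketch(source, recoverInNextToken);
        List<String> tokens = new ArrayList<>();
        String token;
        do {
            token = lexer.nextToken();
            tokens.add(token);
        } while (!token.equals("<EOF>"));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("with second recover:    " + tokenize("ba", true));
        System.out.println("without second recover: " + tokenize("ba", false));
    }
}
```

In this model, "ba" produces [<EOF>] with the second call and [Character, <EOF>] without it. The sketch only covers errors raised by match(); if some other error path throws without consuming input, removing the call unconditionally could leave the lexer stuck on the same character, which may be relevant to the design question raised above.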