...
Token names and lexer rule names always start with an uppercase letter, as defined by Java’s Character.isUpperCase method. Parser rule names always start with a lowercase letter (one for which Character.isUpperCase returns false). The initial character can be followed by uppercase and lowercase letters, digits, and underscores. Here are some sample names:
| Code Block |
|---|
ID, LPAREN, RIGHT_CURLY // token names/rules |
expr, simpleDeclarator, d2, header_file // rule names |
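The naming convention described above can be checked mechanically. The following is a minimal Java sketch, not ANTLR's own implementation: the class `NameCheck`, its methods, and the ASCII-only pattern are hypothetical names introduced for illustration, and the pattern deliberately ignores non-ASCII letters.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: checks the ASCII naming convention.
class NameCheck {
    // A letter followed by any mix of letters, digits, and underscores.
    private static final Pattern ASCII_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // True if the name has the shape described in the text (ASCII only).
    static boolean isValidName(String name) {
        return ASCII_NAME.matcher(name).matches();
    }

    // Returns "token" for token/lexer rule names and "rule" for parser rule names.
    static String kind(String name) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException("not a valid name: " + name);
        }
        return Character.isUpperCase(name.charAt(0)) ? "token" : "rule";
    }

    public static void main(String[] args) {
        String[] samples = {
            "ID", "LPAREN", "RIGHT_CURLY",
            "expr", "simpleDeclarator", "d2", "header_file"
        };
        for (String n : samples) {
            System.out.println(n + " -> " + kind(n));
        }
    }
}
```

In this sketch a name such as `2d` or `_x` is rejected, because the first character must be a letter.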
Like Java, ANTLR accepts Unicode characters in ANTLR names:
...
To support Unicode parser and lexer rule names, ANTLR uses the following rule:
| Code Block |
|---|
ID : a=NameStartChar NameChar* |
{ if ( Character.isUpperCase(getText().charAt(0)) ) setType(TOKEN_REF); |
else setType(RULE_REF); |
} ; |
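The action in this rule is a single decision on the first character of the matched text. As a rough illustration in plain Java, outside any generated lexer, the sketch below mirrors that test. `NameKindDemo`, `Kind`, and `classify` are hypothetical names, and the sample identifiers are made up. One consequence worth noting: characters that have no case, such as CJK ideographs, fail `Character.isUpperCase` and therefore fall into the `RULE_REF` branch.

```java
// Hypothetical stand-in for the action in the ID rule; not generated by ANTLR.
class NameKindDemo {
    enum Kind { TOKEN_REF, RULE_REF }

    // Same test as the action: an uppercase first character means a token reference.
    static Kind classify(String name) {
        return Character.isUpperCase(name.charAt(0)) ? Kind.TOKEN_REF : Kind.RULE_REF;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII:
        // an accented uppercase E, an accented lowercase e, and two CJK ideographs.
        String[] samples = {"ID", "expr", "\u00C9tat", "\u00E9tat", "\u65E5\u4ED8"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```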
NameChar identifies the valid identifier characters:
| Code Block |
|---|
fragment |
NameChar : NameStartChar | '0'..'9' |
| '_' |
| '\u00B7' |
| '\u0300'..'\u036F' |
| '\u203F'..'\u2040' |
; |
NameStartChar is the list of characters that can start an identifier (rule, token, or label name):
| Code Block |
|---|
fragment |
NameStartChar : 'A'..'Z' | 'a'..'z' |
| '\u00C0'..'\u00D6' |
| '\u00D8'..'\u00F6' |
| '\u00F8'..'\u02FF' |
| '\u0370'..'\u037D' |
| '\u037F'..'\u1FFF' |
| '\u200C'..'\u200D' |
| '\u2070'..'\u218F' |
| '\u2C00'..'\u2FEF' |
| '\u3001'..'\uD7FF' |
| '\uF900'..'\uFDCF' |
| '\uFDF0'..'\uFFFD' |
; |
NameChar and NameStartChar correspond roughly to isJavaIdentifierPart and isJavaIdentifierStart in Java’s Character class. If your grammar file is not encoded in UTF-8, pass the -encoding option to the ANTLR tool so that ANTLR reads the characters properly.
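To make the comparison with Java's Character class concrete, here is a small sketch that transcribes the two fragment rules into range tables and probes a few characters. It is an illustration under stated assumptions, not ANTLR's code: the class `NameChars` and its methods are hypothetical, and the tables are copied by hand from the ranges listed above.

```java
// Hypothetical Java transcription of the two fragment rules; not ANTLR's own code.
class NameChars {
    // Inclusive [low, high] pairs copied from NameStartChar.
    private static final int[][] START_RANGES = {
        {'A', 'Z'}, {'a', 'z'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x02FF},
        {0x0370, 0x037D}, {0x037F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD},
    };

    // Extra characters that NameChar allows after the first position.
    private static final int[][] EXTRA_PART_RANGES = {
        {'0', '9'}, {'_', '_'}, {0x00B7, 0x00B7},
        {0x0300, 0x036F}, {0x203F, 0x2040},
    };

    private static boolean inRanges(int c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNameStartChar(char c) {
        return inRanges(c, START_RANGES);
    }

    static boolean isNameChar(char c) {
        return isNameStartChar(c) || inRanges(c, EXTRA_PART_RANGES);
    }

    public static void main(String[] args) {
        // Probe a few characters against both the grammar ranges and Java's Character class.
        char[] probes = {'$', '_', 'a', '7'};
        for (char c : probes) {
            System.out.println("'" + c + "'"
                + " grammar start=" + isNameStartChar(c)
                + " java start=" + Character.isJavaIdentifierStart(c)
                + " grammar part=" + isNameChar(c)
                + " java part=" + Character.isJavaIdentifierPart(c));
        }
    }
}
```

Two differences show up in this sketch: `$` is accepted by `Character.isJavaIdentifierStart` but by neither fragment rule, and `_` can start a Java identifier but can only continue a name under these ranges.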
...