Grammar Lexicon

Token names and lexer rule names always start with an uppercase letter, as defined by Java’s Character.isUpperCase method. Parser rule names always start with a lowercase letter (a character for which Character.isUpperCase returns false). The initial character can be followed by uppercase and lowercase letters, digits, and underscores. Here are some sample names:

Code Block
ID, LPAREN, RIGHT_CURLY // token names/rules
expr, simpleDeclarator, d2, header_file // rule names
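The naming convention can be checked mechanically. The following is a minimal Java sketch, not ANTLR's own code: the class name NameKind and the method classify are hypothetical and introduced only for illustration. It applies Character.isUpperCase to the first character of a name, as described above.

```java
public class NameKind {
    /**
     * Hypothetical helper: classifies a grammar name by its first character.
     * Returns "token" for token/lexer rule names and "rule" for parser rule names.
     */
    public static String classify(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        // An uppercase first character marks a token (lexer rule) name;
        // anything else is treated as a parser rule name.
        return Character.isUpperCase(name.charAt(0)) ? "token" : "rule";
    }

    public static void main(String[] args) {
        String[] samples = {
            "ID", "LPAREN", "RIGHT_CURLY",                  // token names/rules
            "expr", "simpleDeclarator", "d2", "header_file" // rule names
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The sketch looks only at the first character; it does not check that the remaining characters are legal in a name.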


Like Java, ANTLR accepts Unicode characters in ANTLR names:

...

To support Unicode parser and lexer rule names, ANTLR uses the following rule:

Code Block
ID : a=NameStartChar NameChar*
     {
     if ( Character.isUpperCase(getText().charAt(0)) ) setType(TOKEN_REF);
     else setType(RULE_REF);
     }
   ;

NameChar identifies the valid identifier characters:

Code Block
fragment
NameChar
   : NameStartChar
   | '0'..'9'
   | '_'
   | '\u00B7'
   | '\u0300'..'\u036F'
   | '\u203F'..'\u2040'
   ;

NameStartChar is the list of characters that can start an identifier (rule, token, or label name):

Code Block
fragment
NameStartChar
   : 'A'..'Z' | 'a'..'z'
   | '\u00C0'..'\u00D6'
   | '\u00D8'..'\u00F6'
   | '\u00F8'..'\u02FF'
   | '\u0370'..'\u037D'
   | '\u037F'..'\u1FFF'
   | '\u200C'..'\u200D'
   | '\u2070'..'\u218F'
   | '\u2C00'..'\u2FEF'
   | '\u3001'..'\uD7FF'
   | '\uF900'..'\uFDCF'
   | '\uFDF0'..'\uFFFD'
   ;

NameChar and NameStartChar correspond roughly to isJavaIdentifierPart and isJavaIdentifierStart, respectively, in Java’s Character class. If your grammar file is not encoded in UTF-8, pass the -encoding option to the ANTLR tool so that ANTLR reads the characters correctly.
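To make the "roughly" concrete, here is a minimal Java sketch that transcribes the NameStartChar and NameChar ranges into predicates and compares them with Character.isJavaIdentifierStart. The class AntlrNameChars and its methods are hypothetical names chosen for illustration; this is not how ANTLR itself implements the check.

```java
public class AntlrNameChars {
    // Inclusive ranges transcribed from the NameStartChar fragment.
    private static final int[][] START_RANGES = {
        {'A', 'Z'}, {'a', 'z'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x02FF},
        {0x0370, 0x037D}, {0x037F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD},
    };

    // Characters NameChar allows in addition to NameStartChar.
    private static final int[][] EXTRA_RANGES = {
        {'0', '9'}, {'_', '_'}, {0x00B7, 0x00B7},
        {0x0300, 0x036F}, {0x203F, 0x2040},
    };

    private static boolean inRanges(char c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    public static boolean isNameStartChar(char c) {
        return inRanges(c, START_RANGES);
    }

    public static boolean isNameChar(char c) {
        return isNameStartChar(c) || inRanges(c, EXTRA_RANGES);
    }

    /** Mirrors the shape of the ID rule: NameStartChar NameChar*. */
    public static boolean isValidName(String s) {
        if (s == null || s.isEmpty() || !isNameStartChar(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("header_file valid: " + isValidName("header_file"));
        System.out.println("_tmp valid: " + isValidName("_tmp"));
        // Two places where the Java predicate and the grammar fragment differ.
        for (char c : new char[] {'_', '$'}) {
            System.out.println("'" + c + "' Java start: " + Character.isJavaIdentifierStart(c)
                    + ", ANTLR start: " + isNameStartChar(c));
        }
    }
}
```

Under these ranges an underscore may appear inside a name but cannot begin one, and a dollar sign is not allowed at all, whereas Java's isJavaIdentifierStart accepts both.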
