Grammar syntax
There are four kinds of grammars: lexer, parser, tree, and combined (no modifier), but all grammars are of the form:
/** This is a grammar doc comment */
grammar-type grammar name;
options { name1 = value; name2 = value2; ... }
import delegateName1=grammar1, ..., delegateNameN=grammarN; // can omit delegateName
tokens { token-name1; token-name2 = value; ... }
scope global-scope-name-1 { «attribute-definitions» }
scope global-scope-name-2 { «attribute-definitions» }
...
@header {...}
@lexer::header {...}
@members {...}
«rules»
To set the superclass of the generated class, use the superClass option.
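As a minimal sketch of that option, the grammar below asks ANTLR to derive the generated parser from a custom base class. The grammar name `T`, the rule `a`, and the class `MyBaseParser` are hypothetical; the base class is assumed to exist on the classpath and to extend the runtime's parser class.

```antlr
parser grammar T;

options {
    // MyBaseParser is a hypothetical user-written class (assumption);
    // the generated class TParser would extend it.
    superClass = MyBaseParser;
}

a : ID ;
```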
Rule syntax
/** rule comment */
access-modifier rule-name[«arguments»] returns [«return-values»] throws name1, name2, ...
options {...}
scope {...}
scope global-scope-name, ..., global-scope-nameN;
@init {...}
@after {...}
: «alternative-1» -> «rewrite-rule-1»
| «alternative-2» -> «rewrite-rule-2»
...
| «alternative-n» -> «rewrite-rule-n»
;
catch [«exception-arg-1»] {...}
catch [«exception-arg-2»] {...}
finally {...}
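To make the template above concrete, here is a small sketch of a rule that uses several of the optional parts: a doc comment, an access modifier, a return value, `@init` and `@after` actions, and `catch`/`finally` clauses. It assumes the Java target; the grammar name `Decl` and the rule and token names are illustrative only, and rewrite rules are left out to keep the example short.

```antlr
grammar Decl;

/** Match a declaration such as "int x;" and return the declared name. */
public decl returns [String name]
@init  { $name = null; }                              // runs before the rule body
@after { System.out.println("declared " + $name); }   // runs after a successful match
    :   'int' ID ';' { $name = $ID.text; }
    ;
    catch [RecognitionException re] {
        // Same recovery calls the Java target generates by default
        reportError(re);
        recover(input, re);
    }
    finally { /* runs whether or not an exception occurred */ }

ID  :   ('a'..'z'|'A'..'Z')+ ;
WS  :   (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```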
Lexer rules
Rules in a lexical grammar are token names:
Here are some common lexical rules for programming languages:
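The rules below are a sketch of the kind of lexical rules meant here, assuming the Java target. The grammar name `CommonLexer`, the rule names, and the exact character sets are illustrative assumptions rather than text recovered from the original page.

```antlr
lexer grammar CommonLexer;

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

INT :   '0'..'9'+ ;

// Whitespace and comments are kept in the token stream but on the hidden channel
WS  :   (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;

COMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; }
    ;

LINE_COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' { $channel = HIDDEN; }
    ;
```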
The $channel=HIDDEN; action places those tokens on a hidden channel. The tokens are still sent to the parser, but the parser ignores them when matching. Actions, however, can ask for the hidden-channel tokens. To throw tokens out entirely, use the action skip(); instead.
Sometimes you will need helper rules to make your lexer grammar more readable. Use the fragment modifier in front of such a rule:
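A minimal sketch, using the HexLiteral and HexDigit names mentioned in this section; the rule bodies are assumptions about what such rules typically look like:

```antlr
// A token in its own right: the lexer can emit HexLiteral tokens
HexLiteral : '0' ('x'|'X') HexDigit+ ;

// Helper rule: never emitted as a token, only referenced by other lexer rules
fragment
HexDigit   : ('0'..'9'|'a'..'f'|'A'..'F') ;
```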
In this case, HexDigit is not a token in its own right; it can only be called from HexLiteral.
Tree grammar rules
Rules in tree grammars are identical to rules in parser grammars except that they can also specify a tree element to match. The syntax is:
^( root child1 child2 ... childn )
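For illustration, a tree grammar rule that walks a simple addition tree might look like the sketch below. It assumes the Java target and a hypothetical parser grammar named `Expr` that built trees with `'+'` roots and `INT` leaves; `ExprWalker` and the labels `a` and `b` are likewise made up for the example.

```antlr
tree grammar ExprWalker;

options {
    tokenVocab   = Expr;        // hypothetical grammar that produced the tree (assumption)
    ASTLabelType = CommonTree;
}

expr returns [int value]
    :   ^('+' a=expr b=expr) { $value = $a.value + $b.value; }   // root '+', two children
    |   INT                  { $value = Integer.parseInt($INT.text); }
    ;
```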
Attribute scope syntax
An attribute scope is a set of attribute definitions of the form:
scope name {
type1 attribute-name1;
type2 attribute-name2;
}
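As a sketch of how such a scope might be used, the grammar below declares a global scope and has a rule push a fresh instance of it on entry; nested rules then reach the attribute with the `$scope-name::attribute-name` notation. It assumes the Java target, and the names `Scopes`, `Symbols`, `names`, `block`, and `decl` are all hypothetical.

```antlr
grammar Scopes;

// Global scope: each rule that lists "scope Symbols;" pushes a new instance on entry
scope Symbols {
    Set<String> names;
}

@header {
import java.util.Set;
import java.util.HashSet;
}

block
scope Symbols;
@init { $Symbols::names = new HashSet<String>(); }
    :   '{' (decl | block)* '}'
    ;

decl
    :   'int' ID ';' { $Symbols::names.add($ID.text); }   // adds to the innermost block's set
    ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
WS  :   (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```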
Grammar action syntax
Actions are, in general, of the form:
@action-name { ... }
@scope-name::action-name { ... }
For example, @header {...} is the same as @parser::header {...}. The action scope names differ depending on the target, but targets should support @parser and @lexer. Another common action name is members, which inserts that action into the generated class definition.
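The following sketch shows these actions in a combined grammar, assuming the Java target. The package name `com.example.parser`, the field `declCount`, and the grammar and rule names are hypothetical.

```antlr
grammar Actions;

@header        { package com.example.parser; }   // same as @parser::header
@lexer::header { package com.example.parser; }   // goes into the generated lexer file

@members {
    // Inserted into the generated parser class definition
    int declCount = 0;
}

prog : decl+ { System.out.println(declCount + " declarations"); } ;
decl : 'int' ID ';' { declCount++; } ;

ID  :   ('a'..'z'|'A'..'Z')+ ;
WS  :   (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```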
Rule elements
Rules may reference:
| Element | Description |
|---|---|
| T | Token reference. An uppercase identifier; lexer grammars may use optional arguments for fragment token rules. |
| T<node=V> or T<V> | Token reference with the optional token option node to indicate the tree construction node type; can be followed by arguments on the right-hand side of a -> rewrite rule. |
| T[«args»] | Lexer rule (token rule) reference. Lexer grammars may use optional arguments for fragment token rules. |
| r [«args»] | Rule reference. A lowercase identifier with optional arguments. |
| '«one-or-more-char»' | String or char literal in single quotes. In a parser, a token reference; in a lexer, match that string. |
| {«action»} | An action written in the target language. Executed right after the previous element and right before the next element. |
| {«action»}? | Semantic predicate. |
| {«action»}?=> | Gated semantic predicate. |
| («subrule»)=> | Syntactic predicate. |
| («x»\|«y»\|«z») | Subrule. Like a call to a rule with no name. |
| («x»\|«y»\|«z»)? | Optional subrule. |
| («x»\|«y»\|«z»)* | Zero-or-more subrule. |
| («x»\|«y»\|«z»)+ | One-or-more subrule. |
| «x»? | Optional element. |
| «x»* | Zero-or-more element. |
| «x»+ | One-or-more element. |
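A short sketch that combines several of these elements in one grammar may help. It assumes the Java target; the grammar name `Elements`, the flag `enumIsKeyword`, and all rule names are hypothetical, and the syntactic predicate is shown for its form even though this small grammar could be parsed without it.

```antlr
grammar Elements;

@members {
    // Hypothetical flag (assumption); a real grammar might set it from a language option
    boolean enumIsKeyword = false;
}

stat
    :   (ID '(')=> call ';'              // syntactic predicate, then a rule reference
    |   ID '=' expr ';'                  // token references and a literal
    |   {enumIsKeyword}? 'enum' ID ';'   // semantic predicate guarding the alternative
    ;

call:   ID '(' (expr (',' expr)*)? ')' ; // optional subrule wrapping a zero-or-more subrule
expr:   INT ('+' INT)* ;                 // zero-or-more subrule

ID  :   ('a'..'z'|'A'..'Z')+ ;           // one-or-more subrule
INT :   '0'..'9'+ ;                      // one-or-more element
WS  :   (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;   // action in the target language
```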