Can I see a more complete example?

Sure you can! This is a real example from Tapestry 5, which includes a simple property expression grammar. In Tapestry 5.0, the parsing is ad hoc, based on regular expressions; in 5.1 it will be ANTLR-based and much more extensive. My first step was to reproduce the ad hoc parsing using ANTLR.

The grammar supports a few case-insensitive keywords (true, false, null and this). It supports string literals in single quotes, as well as integer and decimal literals. It also has a range literal (e.g., "1..10") that really gets in the way of parsing decimals. Identifiers are either property names or method names (marked by a '()' suffix), and they can be strung together with "." or "?." (the latter is a "safe dereference" that won't try to invoke methods on nulls).

Here's what I've come up with:



The tricky part is NUMBER_OR_RANGEOP. This is a lexer rule, but it never emits a NUMBER_OR_RANGEOP token; instead it uses lookahead to decide among INTEGER, DECIMAL, RANGEOP and DEREF tokens.

Let's take it apart:

SIGN? DIGIT+

This begins something that may be an integer or the start of a decimal. We parse through the digits, and the code block in curly braces executes just after them. At that point LA(1) is the "current" character (the next one to be consumed) and LA(2) is the character after it. Normally, a decimal point at this location indicates a DECIMAL, but we disable that alternative entirely if the character after the '.' is also a '.': two dots in a row form the range operator. When we think there's a range operator, we drop down to the other alternative and force the token to be an INTEGER, stopping on the last digit.
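The lookahead decision described above can be sketched in plain Java, outside of ANTLR. This is only an illustration under stated assumptions: the class `DigitsBranch`, the helper `at` (standing in for LA(k)) and the "TYPE:text" return format are hypothetical names invented for this sketch, not part of Tapestry or the ANTLR runtime, and the input is assumed to begin with an optional sign followed by at least one digit.

```java
// Hypothetical hand-written equivalent of the "SIGN? DIGIT+" alternative.
final class DigitsBranch {

    // Character at index i, or '\0' past the end (stands in for ANTLR's LA(k)).
    static char at(String s, int i) {
        return i < s.length() ? s.charAt(i) : '\0';
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scans from index 0 and returns "TYPE:text" for the first token.
    // Assumes the input starts with an optional sign and at least one digit.
    static String scan(String s) {
        int i = 0;
        if (at(s, i) == '+' || at(s, i) == '-') {
            i++;                                   // SIGN?
        }
        while (isDigit(at(s, i))) {
            i++;                                   // DIGIT+
        }

        // Here at(s, i) plays the role of LA(1) and at(s, i + 1) of LA(2).
        if (at(s, i) == '.' && at(s, i + 1) != '.') {
            i++;                                   // consume the decimal point
            while (isDigit(at(s, i))) {
                i++;                               // fractional digits, if any
            }
            return "DECIMAL:" + s.substring(0, i);
        }

        // No '.', or two dots in a row: stop on the last digit.
        return "INTEGER:" + s.substring(0, i);
    }
}
```

With this sketch, `DigitsBranch.scan("1..10")` stops after the `1` and reports an INTEGER, leaving the two dots for the next token, while `DigitsBranch.scan("3.14")` consumes the whole input as a DECIMAL.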

SIGN '.' DIGIT+

This is straightforward: another form for a decimal. We separate this out because we don't want the next alternative to start with SIGN? '.', as +.. and -.. are nonsensical.

'.'

Finally we get to a token that starts with a '.' (and no sign) and may be a DECIMAL, a RANGEOP or a DEREF. It reads pretty well: if we can match digits, it's an (unsigned) decimal. If we can match a second '.', it's a RANGEOP. Otherwise it's a '.' followed by something else, so it's just a DEREF.
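The three-way choice after a leading '.' can be sketched the same way in plain Java. As before, this is a rough illustration rather than the grammar itself: the class `DotBranch` and its "TYPE:text" return format are hypothetical, and the input is assumed to start with a '.'.

```java
// Hypothetical hand-written equivalent of the alternative that starts with '.'.
final class DotBranch {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns "TYPE:text" for the token that begins at index 0.
    static String scan(String s) {
        if (s.isEmpty() || s.charAt(0) != '.') {
            throw new IllegalArgumentException("input must start with '.'");
        }
        int i = 1;

        // '.' DIGIT+ : an unsigned decimal such as ".5"
        if (i < s.length() && isDigit(s.charAt(i))) {
            while (i < s.length() && isDigit(s.charAt(i))) {
                i++;
            }
            return "DECIMAL:" + s.substring(0, i);
        }

        // '.' '.' : the range operator
        if (i < s.length() && s.charAt(i) == '.') {
            return "RANGEOP:..";
        }

        // A lone '.' followed by anything else (or nothing): a dereference
        return "DEREF:.";
    }
}
```

The order of the checks mirrors the prose: digits first, then a second dot, and DEREF as the fallback when neither matches.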

The example also demonstrates a few other ideas: a clumsy way to accomplish case-insensitive identifiers, and a way to handle quoted string literals (by matching the enclosing quotes and then stripping them out in action code).
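To make the intended behaviour of those two ideas concrete, here is a small Java sketch. It does not reproduce the grammar's own technique: the keyword check simply lower-cases the text, which is a substitute for whatever the lexer rules do, and the class `LiteralHelpers` with its two methods is a hypothetical name used only for illustration. The quote stripping is the kind of work the action code would perform on a matched string literal.

```java
import java.util.Locale;

// Hypothetical helpers showing the behaviour, not the grammar's implementation.
final class LiteralHelpers {

    // Case-insensitive keyword test: "TRUE", "True" and "true" all match.
    static boolean isKeyword(String text) {
        switch (text.toLowerCase(Locale.ROOT)) {
            case "true":
            case "false":
            case "null":
            case "this":
                return true;
            default:
                return false;
        }
    }

    // Removes the enclosing single quotes from a matched string literal.
    static String stripQuotes(String literal) {
        int last = literal.length() - 1;
        if (last < 1 || literal.charAt(0) != '\'' || literal.charAt(last) != '\'') {
            throw new IllegalArgumentException("not a single-quoted literal: " + literal);
        }
        return literal.substring(1, last);
    }
}
```

For instance, `LiteralHelpers.stripQuotes("'hello'")` yields `hello`, and `LiteralHelpers.isKeyword("NULL")` is true while `LiteralHelpers.isKeyword("thisOne")` is false.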