|
Home |
Download |
ANTLRWorks |
Wiki |
About ANTLR |
Feedback |
Support |
Bugs |
v2
|
|
|
Latest version is 3.2 Download now! » |
|
|
Grammar Reuse
May 15, 2007I'm starting to think about how to support reusing grammars. ANTLR v2 had an inheritance model, but it had to fundamental flaws:
Both of these made it very difficult to debug grammars. In many cases the generated parsers were too big to debug. For example, one user told me that there generated Oracle 10g parser was 129k lines! The primary advantages of reusing grammars are the same advantages obtained from reusing code. You get single point of change and you can describe a new problem only as it differs from the previous solution. Using the inheritance model of ANTLR v2, one company gave me the following information:
There is a great deal of rule sharing as you can see. Inheritance works great when you are adding new constructs. You just can either add to or override rules from the supergrammar. The gets much harder when you try to alter alternatives within a rule or even the middle of an alternative. Depending on how you structure your super grammar initially, you may have to modify the super grammar extensively. Doing this, risks breaking a lot of things in a complicated grammar. Sometimes it is better to cut and paste rather than munge supergrammar. Tree grammar versus parser grammar reuse. The company only used inheritance for the parser, not the tree parser. They had one tree parser per vendor. All versions of the SQL were normalized by the parser into a single tree structure per vendor, hence, one tree grammar per vendor. Actions. The company did not have many actions in the parser, preferring to simply build a tree. All of the actions are in the tree parsers. Fixing inheritance model. One of the most obvious solutions is
simply to make grammar inheritance rely on class inheritance so that
some grammars result in generated code commensurate with their size.
This is in contrast to the previous solution that treated inheritance
like a glorified include. Another obvious thing to add is the ability
to add an alternative to an existing rule without having to cut and
paste the entire rule:
or, perhaps:
Sometimesyou need to replace a single alternative from the
superclass. Rather than altering the superclass to reference a rule
so that you can override that rule, perhaps naming the alternatives is
a good idea like Rats! does.
Then a some grammar can override that alternative:
Hmm...that ':' as the inheritance operator is bad because it overloads
the rule definition operator. Anyway, and you could also remove
alternatives
Delegation model. Perhaps a better and more general solution is to use delegation. This would work like multiple inheritance really. Any rule not defined in the current grammar would be looked up in the delegates. The delegation model is nice because it allows you to break up a big
SQL supergrammar into chunks so you can share the common stuff among
more vendors and override what you need. This is more like the grammar
composition from tools like Rats!. For example, in SQL there are
logical chunks. You could delegate on expr, predicates, conditions,
data manipulation language statements, data definition language
statements, etc.
Hmm...I like the word delegate better than import, but import
makes a little more sense. The import statement makes all of the rules
within the specified grammar visible to the rules within the current
grammar. Looking up symbols would be such that ambiguous rule
references would be illegal. You would need to fully qualify them.
Note that dynamic binding would ensure that things work as you expect:
Here, if you call rule 'a' in grammar B, it will grab the definition from A which we'll call rule 'b'. Because you invoked rule 'a' from a B grammar, rule 'a' will invoke B.b not A.b. Lexer grammar reuse. lexers can be shared 90% or more. Macros. What about having macros for expressions and so on? Nah, seems and unnecessary complication at this point. Misc. Having a nice way to make changes other than at the course rule level granularity, would be great. Apparently the changes between Oracle SQL 8, 9, 10 are not additions. They are changes, about 15% between versions. We need to be careful that we do not mark up the subgrammars so much that they are impossible to read. |
|||||||||||||||||||||||||||||||||||||||||||||||