4.0
- 3.5 release
- api doc for 4.0
LL(1) Optimization
It's not correct to avoid full LL on a k=1 SLL conflict; we found a counterexample. It has something to do with the fact that, while the grammar classes LL(1) and SLL(1) are the same, that doesn't mean the parser decisions are equally powerful. Anyway, Sam reports that it's a big optimization to put this back in. Then, if we get a syntax error, we fail over to full LL. That leaves us with 3 stages: SLL -> LL+k=1 -> LL.
I said "LL(1) == SLL(1)"; Sam says this results in a faulty reportAmbiguity on the else in "{ if a then foo else bar }", when in fact it's only a conflict for "{ if a then if a then foo else bar }".
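A minimal sketch of the three-stage failover described above, assuming each stage can be modeled as a function that either returns an alternative or gives up. StagedPredictor, Result and the stage labels are hypothetical names for illustration only; they are not part of the ANTLR runtime, and the real stages would be the SLL, LL+k=1 and full LL prediction passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the ANTLR runtime. */
final class StagedPredictor<I> {
    /** The chosen alternative plus the name of the stage that produced it. */
    static final class Result {
        final int alt;
        final String stage;
        Result(int alt, String stage) { this.alt = alt; this.stage = stage; }
    }

    private final List<String> names = new ArrayList<>();
    private final List<Function<I, Optional<Integer>>> stages = new ArrayList<>();

    /** Registers a stage; an empty Optional means "conflict or syntax error, escalate". */
    StagedPredictor<I> stage(String name, Function<I, Optional<Integer>> f) {
        names.add(name);
        stages.add(f);
        return this;
    }

    /** Tries the stages in registration order (cheapest first) and stops at the first answer. */
    Result predict(I input) {
        for (int i = 0; i < stages.size(); i++) {
            Optional<Integer> alt = stages.get(i).apply(input);
            if (alt.isPresent()) {
                return new Result(alt.get(), names.get(i));
            }
        }
        throw new IllegalStateException("no stage produced a prediction");
    }
}
```

Registering the stages in the order SLL, LL+k=1, LL means the expensive full LL pass only runs when both cheaper passes give up.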
Semantics and error checking
- warn about non-unique refs to elements from headers like $ID
- warn if $e used for rule ref in rule e. e : ... | '(' e ')' {print $e.v;} // doesn't translate to rule ref
parse trees
- xpath- or jquery-like feature to find nodes
- tree pattern matching? e.g., find all (e 1 1) trees.
- Create method to create new parse tree with concrete syntax.
return parse("stat", "while (i>3) {...}");
- Find a way to have some nodes not appear in parse tree or at least in listener. Just skip creating new _localctx.
- (I believe this is corrected now)
Take this example: e : ID | e '.' ID;
With the input "a.a.a" you get (select (select a a) a). The .stop value for the outer ctx is correct, but for the inner (select a a) it's null. Correction: I'm not sure it's correct for the outer due to a syntax error occurring in mine, but it's definitely null for the inner. Yep, _localctx.stop is set at the very end of the rule, but it needs to be inside the postfix expr loop. Right after "_prevctx = _localctx;" add "_prevctx.stop = _input.LT(-1);" Should be last real token; not conjured.
- need to know if an error occurred in rule.
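The .stop fix for e : ID | e '.' ID can be illustrated with a toy version of the postfix loop. This is only a sketch under stated assumptions: Ctx and PostfixLoopSketch are hypothetical stand-ins, tokens are plain strings, token indexes stand in for token objects, and the input is assumed to be well formed. It is not the code ANTLR generates.

```java
import java.util.List;

/** Toy context with start/stop token indexes; not the ANTLR runtime class. */
final class Ctx {
    final Ctx left;   // nested e for "e '.' ID", null for a plain ID
    final int start;  // index of the first token
    int stop = -1;    // index of the last real token, -1 while unset

    Ctx(Ctx left, int start) {
        this.left = left;
        this.start = start;
    }
}

final class PostfixLoopSketch {
    /** Parses e : ID | e '.' ID over tokens such as ["a", ".", "a"]. */
    static Ctx parseE(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("expected at least one ID token");
        }
        int p = 0;
        Ctx localctx = new Ctx(null, p);
        p++; // match the primary ID
        while (p + 1 < tokens.size() && tokens.get(p).equals(".")) {
            Ctx prevctx = localctx;
            // The fix: set stop on the context being wrapped, inside the loop.
            // p - 1 plays the role of _input.LT(-1), the last consumed token.
            prevctx.stop = p - 1;
            localctx = new Ctx(prevctx, prevctx.start);
            p += 2; // match '.' ID
        }
        localctx.stop = p - 1; // end of rule: only the outermost ctx is set here
        return localctx;
    }
}
```

Without the assignment inside the loop, only the outermost context would get a stop value, which matches the null seen on the inner (select a a).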
syntax
- ID*[','] comma-separated list of ID
- &foo syn preds like pegs; 0 width. where are they allowed?
Code generation
- move all ctx objects to bottom?
- be able to split big grammar into chunks using inheritance
- split serialized atn
analysis
- Sam's LL(1) optimization
- Sam's optimized ATN transition thingie. optimize during deserialization.
tail call optimization. x : a b; closure can jump to b without pushing a frame since, if we ever fall out of the entry rule, x, it'll compute FOLLOW in SLL mode. Slightly weaker since we might do FOLLOW when we could have a specific call stack, but drops mem like 50%. This grammar would try to compute FOLLOW at end of x (stack is s calls a) since we didn't push the call site in a.
s calls a, then a pretends to make a decision but doesn't push the call. At end of x, let's say we look further but have no stack in SLL mode, so we do FOLLOW. It'll see ID in there, but in real SLL we'd see the call site in a on the stack.
Sam now reports that you can't skip the push when calling a rule from the decision entry rule in the lexer, since we stop at the stop state when the stack is empty in match(). This gives the wrong token type. Pushing the tail call frame from the decision entry rule works to resolve the weakness in SLL, since it doesn't fall back to FOLLOW too early. Tail call elimination has a big impact.
- DFAs mostly have one edge. optimize to avoid array for this case
testNotSetRuleRootInLoop. ~set in LL1Analyzer doesn't compute ~
- big expressions still are slow due to full LL
turn on predctx cache for lexer
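The single-edge DFA item above could be sketched as follows: a state stores its one edge in two fields and only allocates an edge array when a second distinct symbol shows up. DfaState, its fields and the maxSymbol parameter are hypothetical and do not reflect the actual runtime's DFA state class.

```java
/** Hypothetical DFA state that avoids an edge array while it has at most one edge. */
final class DfaState {
    private final int maxSymbol;     // largest symbol value allowed on an edge
    private int singleSymbol = -1;   // symbol of the only edge, if any
    private DfaState singleTarget;   // target of the only edge, null if none
    private DfaState[] edges;        // allocated lazily on the second distinct edge

    DfaState(int maxSymbol) {
        this.maxSymbol = maxSymbol;
    }

    void addEdge(int symbol, DfaState target) {
        if (symbol < 0 || symbol > maxSymbol) {
            throw new IllegalArgumentException("symbol out of range: " + symbol);
        }
        if (edges != null) {
            edges[symbol] = target;
            return;
        }
        if (singleTarget == null || singleSymbol == symbol) {
            // Common case: first edge, or an update of the only edge.
            singleSymbol = symbol;
            singleTarget = target;
            return;
        }
        // Second distinct symbol: fall back to the array representation.
        edges = new DfaState[maxSymbol + 1];
        edges[singleSymbol] = singleTarget;
        edges[symbol] = target;
        singleTarget = null;
    }

    /** Returns the target state for symbol, or null if there is no such edge. */
    DfaState target(int symbol) {
        if (edges != null) {
            return (symbol >= 0 && symbol < edges.length) ? edges[symbol] : null;
        }
        return (singleTarget != null && symbol == singleSymbol) ? singleTarget : null;
    }

    boolean usesArray() {
        return edges != null;
    }
}
```

The trade-off is one extra branch per lookup in exchange for not allocating an array for the many states that only ever have a single outgoing edge.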
Errors
- add sync()-like functionality to prediction so that, even during prediction, we can do single token insertion or deletion.
Visitors/ event listeners
Options
Runtime
lexers
- [\]] should be just ] on the inside, not \\ and ]
- tokenVocab '++'=33 imported to lexer doesn't define a literal; should it? nah
- don't allow actions in fragment rules; they don't exec
- don't allow labels or parameters or return values on lexer elements
- should we allow same token name in multiple modes? seems useful.
- Using the first ANY_GENERAL rule, it consumes everything. Swapping for the second ANY_GENERAL rule, it works as intended.
I don't understand why they are not doing the same thing?
- import mode pulls rules into another mode; shares common stuff like WS, ID, etc...
Misc
- @ANTLR(...) to compile grammars in package
- @api to signify stuff to use from antlr api vs public; Sam points out that this really should be a Java interface.
- Consider this example:
In this example $expr should bind to the sub-expression in my opinion.
However, it does not. Since the rule is also named expr, $expr refers to
the rule context instead of the context of the sub-expression. I think
most of the time this is not what the user wants.
2 Comments
Dec 01, 2011
Ruben Laguna
Can I ask why the "no ast output from tree grammar" bullet? Is there something fundamentally wrong with generating ast from tree grammars?
Dec 11, 2011
Terence Parr
Hi Ruben, turns out I'm de-emphasizing tree grammars and it's a rarely used feature and hard to implement.
Ter