New blog entries
For all the entries, see Terence's blog RSS
In "null vs missing vs empty vs nonexistent in ST v4" a few years ago, I tried to resolve in my head the difference between a missing attribute, a null value, an array with no elements, and a string with no characters. I don't think I got it completely thought through and ST v4 might have some weird inconsistencies. This page is an attempt to finally write down all the cases and resolve exactly how things should work.…
Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing and extracting information and then translations. For compilers, we need to convert everything into operations and operands--that means ASTs are easier. For example, 3+4 should become a tree with + at the root and 3 and 4 as operands/children: (+ 3 4). The parse tree in contrast is probably (expr 3 + 4) where rule reference expr is the root node and the other elements are children.…
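To make the difference between the two tree shapes concrete, here is a minimal sketch. The Node and TreeShapes classes are hypothetical, written only for this illustration; they are not ANTLR's tree classes. The sketch builds both trees for 3+4 by hand and prints them in the LISP-style notation used above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal tree node; not ANTLR's own tree classes. */
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) children.add(k);
    }

    /** LISP-style text: a leaf prints as its label, an interior node as (label child ...). */
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class TreeShapes {
    /** AST for 3+4: the operator is the root, the operands are its children. */
    static Node ast() {
        return new Node("+", new Node("3"), new Node("4"));
    }

    /** Parse tree for 3+4: the rule name is the root, every matched element is a child. */
    static Node parseTree() {
        return new Node("expr", new Node("3"), new Node("+"), new Node("4"));
    }

    public static void main(String[] args) {
        System.out.println(ast().toStringTree());       // (+ 3 4)
        System.out.println(parseTree().toStringTree()); // (expr 3 + 4)
    }
}
```

The AST drops the rule name and promotes the operator, which is what a compiler wants; the parse tree keeps a record of which rule matched which input, which is usually enough for extraction and translation tasks.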
Had a nice lunch with Mihai Surdeanu today at Stanford. Mihai does natural language processing research and has used ANTLR in the past to process English text. He asked for 2 things: tokens that can be in more than one token class (token type) at the same time and the ability to get all interpretations of ambiguous input. Sam Harwell is also interested in getting all interpretations.…
The ANTLR project is moving to GitHub within a few days. Thanks to user anatol for setting up the ANTLR organization and pulling in the Perforce (p4) repositories that we've been using. Everything is now set up for us to seamlessly start using Git/GitHub. The purpose of this blog post is to announce this move and to outline how I think workflow should go.
I enjoyed reading "A successful Git branching model" and I think a lot of it makes sense for my small ANTLR team.…
Update Oct 2012: resolved with correct definition of when to terminate prediction lookahead. Turns out we don't need to push this extra stack element.
Over the Christmas holidays, I've been busy building example grammars for ANTLR v4. The thing I noticed immediately is that grammars just work. There are no error messages from ANTLR when generating code and all we can get are true ambiguity errors at runtime. E.g., if you can recognize T(i) as both a function call and a constructor call.…
ramblings about design as stream of consciousness
I built a quick mockup with NetBeans just so that I have all of the windows in front of me with a couple of faked images. The basic design is easy because NetBeans allows us to move windows around as we want. I happen to lay the windows out like this:
(Sam already has the navigator and the editor windows filled in, but I was too lazy to incorporate them.)
Every window has content, publishes events, and listens for events.…
I just finished attending a three-day workshop on developing standalone GUI applications with this awesome Java applications framework that you've never heard of. Actually, that's not true. You've heard of it but thought it was an IDE--NetBeans. Unfortunately, the amazing applications framework has been hitched to the NetBeans IDE wagon which, for better or worse, has much less market share than Eclipse. (I know how NetBeans users feel because I use IntelliJ,…
I have a prototype working for the automatic parse tree construction and automatic visitor generation. Imagine we have the following simple grammar:
s : i=ifstat ;
ifstat : 'if' '(' INT ')' ID '=' ID ';' ;
The usual startup code looks like:
TLexer t = new TLexer(new ANTLRFileStream(args[0])); // args[0] is the input file name
CommonTokenStream tokens = new CommonTokenStream(t);
TParser p = new TParser(tokens);
p.s(); // invoke the start rule, s
To make it create a parse tree,…
Ok, been doing some thinking and playing around and also talking to Sam Harwell / Oliver Zeigermann.
The first modification I've made is to turn parse tree construction on or off with a simple Boolean, rather than having to regenerate the parser with -debug. Also, the parsers fire methods enterMethod/exitMethod with the rule index all the time now since it is so convenient to have these. No more needing -trace and regenerating to get debug output.…
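As a rough sketch of the idea, assuming nothing about the real generated code: the SketchParser class below, its buildParseTrees field, and the rule index constants are all hypothetical names made up for this illustration. It shows rule methods that always fire enterMethod/exitMethod with a rule index, and a Boolean checked at runtime to decide whether to record tree nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the names are illustrative, not the ANTLR v4 API. */
class SketchParser {
    static final int RULE_s = 0;
    static final int RULE_ifstat = 1;

    /** Flip at runtime; no need to regenerate the parser. */
    boolean buildParseTrees = false;

    /** Every enter/exit event, in the order it fired. */
    final List<String> trace = new ArrayList<>();

    /** Flattened record of the rule nodes created when tree building is on. */
    final List<String> tree = new ArrayList<>();

    void enterMethod(int ruleIndex) {
        trace.add("enter " + ruleIndex);
        if (buildParseTrees) tree.add("rule " + ruleIndex);
    }

    void exitMethod(int ruleIndex) {
        trace.add("exit " + ruleIndex);
    }

    /** s : ifstat ; */
    void s() {
        enterMethod(RULE_s);
        try {
            ifstat();
        } finally {
            exitMethod(RULE_s); // fires even if a rule throws
        }
    }

    /** ifstat : ... ; (token matching omitted in this sketch) */
    void ifstat() {
        enterMethod(RULE_ifstat);
        try {
            // match tokens here
        } finally {
            exitMethod(RULE_ifstat);
        }
    }
}
```

Calling s() with buildParseTrees left false fills trace but leaves tree empty; setting the flag to true on the same class records one node per rule invocation.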
Summarizing discussion from people on the interest list.
the editor works pretty well to help with auto-indenting, etc., to make things look pretty and provide easy-to-read formatting.
editor is quirky
forward and backward arrows don't always work
undo is character by character
a number of people pointed out the inefficient and sluggish error checking and syntax highlighting. there is little user benefit to keystroke-by-keystroke checking while the user is typing,…
After a few weeks away from ANTLR v4 coding, I'm back to thinking about tree grammars and the automated generation of tree visitors. I recently replaced a number of tree grammars in ANTLR v4 itself with much simpler visitor implementations. Doesn't require a separate specification and is much easier to debug. I made an ubervisitor that actually matches patterns in the tree rather than nodes (using a single prototype tree grammar) and then calls listener functions.…
I'm abandoning this post mid-stream...seems that regular alternatives can match erroneous input just as easily as so-called error alternatives. Because of adaptive LL(*), it shouldn't affect production speed at all once it gets warmed up.
ANTLR has a built-in mechanism to detect, report, and recover from syntax errors. It seems to do a pretty good job. Certainly it's better than PEG, which can't detect errors until EOF.…
At long last, I'm back on the ANTLR v4 rebuild after a 9-month hiatus to write an academic LL(*) paper with Kathleen Fisher and release StringTemplate v4. Woot!
Ok, so what does all that title nonsense have to do with ANTLR v4? Well, v4 will use all those things at some point, either in analysis or in the generated code. I'm proposing something a little different for v4: Along with a recursive-descent parser,…
After reading more about whitespace handling in scannerless parser generators (e.g., GLR, PEG), it looks like you have to manually insert references to whitespace rules after every "token rule" and one at the beginning of the parse. So apparently, ANTLR is a scannerless parser generator if you simply use characters as tokens. This page shows not only how to build a real scannerless parser in ANTLR but also shows how to build abstract syntax trees (i.e., not parse trees)!…
Scannerless parser generators have an advantage over separate lexers and parsers: it's much easier to create island grammars, combine components of grammars, and deal with context-sensitive lexical constructs. I still think I prefer tokenizing the input, but thought I would run an experiment to see what a scannerless ANTLR grammar would look like.
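The whitespace convention described above can be sketched with a hand-written, character-level recognizer. This is an illustration only, not ANTLR output: the Scannerless class and the tiny assign rule it recognizes (ID '=' INT ';') are hypothetical. Each "token rule" consumes its characters and then calls the whitespace rule, and the start rule calls it once up front.

```java
/** Hypothetical character-level recognizer for: assign : ID '=' INT ';' */
class Scannerless {
    private final String in;
    private int p = 0; // index of the next character to consume

    Scannerless(String input) {
        this.in = input;
    }

    /** Whitespace rule: skip any run of whitespace characters. */
    private void ws() {
        while (p < in.length() && Character.isWhitespace(in.charAt(p))) p++;
    }

    /** Single-character token rule, followed by whitespace. */
    private void match(char c) {
        if (p >= in.length() || in.charAt(p) != c)
            throw new IllegalStateException("expected '" + c + "' at " + p);
        p++;
        ws();
    }

    /** ID token rule: one or more letters, followed by whitespace. */
    private String id() {
        int start = p;
        while (p < in.length() && Character.isLetter(in.charAt(p))) p++;
        if (p == start) throw new IllegalStateException("expected ID at " + p);
        String text = in.substring(start, p);
        ws();
        return text;
    }

    /** INT token rule: one or more digits, followed by whitespace. */
    private String integer() {
        int start = p;
        while (p < in.length() && Character.isDigit(in.charAt(p))) p++;
        if (p == start) throw new IllegalStateException("expected INT at " + p);
        String text = in.substring(start, p);
        ws();
        return text;
    }

    /** Start rule: one ws() at the beginning, then the token rules handle the rest. */
    String assign() {
        ws();
        String name = id();
        match('=');
        String value = integer();
        match(';');
        if (p != in.length()) throw new IllegalStateException("extra input at " + p);
        return name + "=" + value;
    }

    static String parse(String input) {
        return new Scannerless(input).assign();
    }
}
```

With this layout, Scannerless.parse("  x =  42 ;\n") and Scannerless.parse("x=42;") both return "x=42", while input such as "x 42;" fails at the missing '='.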
I started out with the grammar that contained an LL(*) but non-LL(k) rule (stat). Because we're looking at characters as tokens,…
Links to Terence Notes on Antlr3.
These are the old non-wiki-based entries.
*lexers, parser integration
*tree grammars, parsing
*semantic predicate hoisting
*error reporting, recovery
*ASTs, parse trees, transformation
*Aspects, Actions, Rewriting, Attributes