Welcome to my personal wiki space. Mainly it organizes Terence's blog
ANTLR 1.0 (PCCTS) used a traditional DFA-based lexer. As I moved to ANTLR v2, I realized that we could tokenize input using the same recursive descent mechanism used for parsers. Unfortunately, with only fixed lookahead, ANTLR v2 lexers were sort of difficult to build because he had to do a lot of your own left factoring. ANTLR v3 introduced arbitrary regular look ahead with LL(*), which makes the lexer rules much easier to write. …
I've done a lot of thinking about this and I believe I made a mistake in the final ST 3.2.1 version before my current rebuild (v4). It's too confusing, and makes the code too complex, to distinguish between missing and present but null. There is huge history with ST too suggests that it seems to work okay treating a missing attribute and a null attribute as the same thing (i.e., not there). We have the null option that lets us say what to replace null with. …
Part 1: Null-valued attributes
Let's consider values inside arrays. If names={"Tom", null, null, "Ter"}, what should we get here:
<names>
or here
<names; separator=", ">
My preference would be: TomTer and Tom, Ter. That is what v3 does now. We recently introduced the null option so we can say:
<names; null="foo">
to get "foo" instead of an missing element when names[i] is null.
HOWEVER, you cannot set an attribute to null. So, if instead of passing the list,</names></names;></names;> …
So, let's do some rewriting using the pattern matching filter=true mode. Again, the VecMath.g parser will build trees but we'll avoid building an entire tree grammar. We'll focus on some patterns we want to rewrite.
Here's the grammar to build trees. …
Can't resist showing off new filter mode for tree grammars (this is working in my dev branch). Imagine I built some trees with Cymbal.g and want to define symbols and push/pop scopes. Previously you had to give full tree grammar even though we'd only have actions in a few spots and don't care about structure (we trust tree builder). By doing tree pattern matching, we get to focus only on those subtrees we care about. …
Filter mode for Lexers
So, filter mode for Lexers is incredibly useful. For example, I use it for my wiki to HTML translation. you just specify some rules in the lexer with filter=true and it tries all of the rules against the input stream. Precedence is given to rules specified first. In other words, the lexer tries to match the first rule against the current input location. If it fails, it moves to the next rule in tries it. It tries rules until it finds one that matches or it fails. …
Spent hours and hours on airplanes recently bouncing around Europe giving talks. had some time to think about increasing the recognition strength of LL by increasing the prediction mechanism from a DFA to a pushed down machine, either LL or LR. I've scanned my notes and diagrams; see the attachments on this news item. I tried all sorts of things trying to approximate non-regular lookahead languages but didn't really come up with anything great. One thing of note, …
As Java gets more and more complicated with generics, closures, etc... I keep looking for simplicity. I have the Mantra prototype, but even that subset is kind of complicated. I decided to look at Smalltalk again. Coincidentally, Nik Boyd, who built Bistro years ago contacted me; he's moved to SF Bay area, which means we can chat in person. …
The Default error handling mechanisms that does single token insertion and deletion works really well in many cases. The one area where I think ANTLR need some work is during no viable alternative exceptions and loops around rule references. For example, what happens if you have a fragment like:
d : decl+ ;
decl: 'foo'
| 'bar'
;
If you have input ")) foo bar foo", which is valid except for the crazy "))" at the front, …
Currently syntax errors cause invalid trees and possibly even runtime exceptions when building ASTs. What we really need I believe is to have rules that encounter syntax errors return an ERROR node of some sort that records where the error occurred and, with luck, the tokens consumed during recovery. I started an improvement request:
http://www.antlr.org:8888/browse/ANTLR-193
The basic idea is that ERROR nodes get used in place of ASTs that would normally be produced by rule indications. …
just had some cool ideas about a semantic rule specification language...i should write that up too... (also think about multi-threaded rewriting; no side-effects required
Some language translation problems can be described with a few rewrite rules that are predicated upon their context and potentially an arbitrary Boolean expression. For example, consider a few identity transformations such as
expr: // in context of expr, match left side, …
Currently ANTLR does not create templates for you automatically when you use output=template option. This is because, when I first implemented it, I had no idea what the right answer was here. I did not know how to deal with whitespace and so on. I think I have the answer now. First, let me remind you that output=AST builds a completely flat tree given no instructions to the contrary. Similarly, the template output should reproduce the input given no instructions.
…
Ok, Kay Roepke is in town and we've been discussing the faster expression parsing, among other things. Look for another entry on default StringTemplate generation or a parsing and tree parsing.
Found a ref to Keith Clarke's original recursive-descent precedence work
ANTLR v3.2 will allow special rules for specifying expressions that are particularly efficient both in speed and space. …
I should be working on something else but got to thinking about how annoying it is specifying expressions in recursive descent parsers. You have to have a new rule for each precedence level. This is also very slow. Just to match 34 it has to descend about 15 method calls. I built a prototype single-rule (plus primary and suffix) operator matching thingie which I enclose below. I should be able to generate it from some metameta syntax in antlr. For example, …
