I've been pretty sick this last week so I decided to work on something that was fun. After talking with Don Bradshaw at http://www.quipoz.com in Sydney and after trying to convert ANTLR v3's grammar (written in v2) into v3, I decided to build TreeWizard. ANTLR has a nice facility for creating trees from input symbols like:
But, sometimes outside of the grammar file (or in severe cases in a grammar action) you need to build a tree in raw code. Further, there are many cases, as Andy Tripp will tell you, that building a tree grammar is a overkill just to examine a few subtrees. You might want to simply print out all of the declarations defined in the tree. You don't care about the tree structure except for the declaration itself. A grammar for the full tree is a lot of unnecessary work.
TreeWizard is my first attempt at a utility class that lets you create and navigate trees without relying on an ANTLR grammar. You will find it attached with about 30 unit tests. This is only my development branch so we are free to update, change, and otherwise mangle the definition. I just want to make sure that we have something out there that people can try out to make sure I'm going in the right direction. See the unit tests for more examples than I give here. Also note that I think I changed CommonTree's getText to not call toString(), but call getText():
Creating a tree is a simple matter of supplying a pattern for the tree using the token names from an ANTLR grammar:
where the tokens from your parser look something like:
Once you create a TreeWizard, you can keep calling create as it will know what kind of nodes to make, how to hook them up, and what the token types are for the token names. The wizard really only has write-once data in it...just a convenience so you don't have to keep passing stuff to each of the methods.
Oh, if you want to set the text of a node, use the argument syntax with no quotes:
If you want to compare two trees, you'd have to write your own method to do so right now. The equals() method in Java is not necessarily the right way to implement either because that is probably just matching to nodes not trees. The TreeWizard provides a method to check the equality two trees (one static and one instance method):
If you would only like to match the structure of a tree, but not the text associated with each node (so that, say, any ID node will match), use the parse method:
Wildcards work too:
Sometimes you want some of the nodes to match the text exactly, but most of the time match just the structure. You can do that too:
Getting pointers to subtree nodes
Matching trees is great, but most of the time you want to get a pointer to some of the children not just determine if the tree structure matches. To accomplish this, label the elements in the tree pattern and pass in a hash table that the parse() method can fill in:
labels.get("a") points to the root etc... Cool, eh?
Sometimes you want a list of all nodes with a certain type like VARDECL or METHOD:
Sometimes you want all lists for each token type:
In order to visit all nodes a particular type for all subtree that match a particular pattern, use the visit() method. For each node or subtree that matches, TreeWizard invokes the method and the following interface:
Notice that it passes you a lot of interesting contact information including "who's your daddy?" and "what child am I?" The labels printer is not set unless you're matching against a tree pattern (i.e., won't work for matching against a token type).
Here is some code that visits every B node and puts them into a list, which is of course how I implemented the find() that returns a list.
where Visitor is just a convenience class that makes it easier for you to implement if all you want to do is visit without context information:
Now, let's say that you want to match a bunch of subtrees for declarations and pick out the identifier in each. No problem: