Introduction
In this project, you will learn to parse DOT
files and also to build trees representing that input. You will build everything by hand, without the use of language tools such as ANTLR, so that you can learn about the infrastructure required to build a language application. The project may seem large at first, but I have provided a framework for you to fill in; there is more thinking than coding. You have to see the pattern, but once you get it, you can solve the problem easily.
Let's begin with a series of small DOT files and the resulting visual representation as built by DOT/Graphviz.
Tasks
You have three subprojects you need to build:
- a scanner or lexical analyzer for DOT that breaks up the input stream of characters into vocabulary symbols called tokens: DOTScanner.java
- a DOT parser that recognizes the grammatical structure, producing errors upon syntactically invalid input: DOTParser.java
- an extension of the DOT parser that builds an abstract syntax tree (AST) and returns it to the calling program: DOTTree.java.
In the attachments to this page, you'll find AST.java
, TokenStream.java
, and Token.java
support code as well as the framework
for each of the above mentioned files.
To test each of the tasks, notice that there is a main() in each of them so you can simply run Java on that class. For example here is the meat from DOTScanner's main:
DOTScanner scanner = new DOTScanner(new InputStreamReader(System.in));
Token t = scanner.nextToken();
while ( t.type!=Token.EOF_TYPE ) {
System.out.println(t);
t = scanner.nextToken();
}
When run on a.dot above, you should get the following output:
Running the DOTParser should simply print OK if there are were no syntax errors:
On the other hand, if you type in some erroneous input, you should see an exception:
Note: CTRL-D sends the end of file signal to the standard input; on a PC this is CTRL-Z.
Your the final task involves building ASTs for the input files as shown in the section below "Building Abstract Syntax Trees":
As you modify the framework I provide, you should spend time thinking about how it is constructed. The code represents a quintessential recursive descent parser and handbuilt lexer. The code is very clean and easy to understand. From this, you should try to identify the relationship between the grammar provided in this assignment and code the you build. For example, you will note that for every rule in the grammar there is a method in the parser.
Simplified DOT syntax
| digraph |
 |
| cluster |
 |
| subgraph |
 |
| property |
 |
| edge |
 |
| nodelabel |
 |
| value |
 |
| optionlist |
 |
Simplified DOT grammar
Parse tree
This section shows the relationship between an input file and how the DOT grammar interprets it. The interior nodes of a parse tree are rule names and the leaves are input symbols.
Building Abstract Syntax Trees
Building ASTs is a matter of having each rule return an appropriate structure for the tokens matched and the rules referenced within. The AST returned from the starting role is the completed history. (Ignore the nil nodes on the ASTs displayed below).
| AST |
toStringTree() |
 |
thumbnail! |
 |
|
 |
|
 |
|
There are 3 basic operations you will need when building a tree:
- creating a root node from a token (the token is considered the payload)
AST root = new AST(tokens[p]);
- adding a child created from the token
root.addChild(tokens[p]); match(DOTScanner.ID);
- adding a subtree return from a rule reference
AST t = value();
root.addChild(t);
Then, each method in your parser returns root.
The only unusual thing is the use of two imaginary tokens. An imaginary token is one that does not correspond to any real input symbol. In this case I create two imaginary tokens, CLUSTER and PROPS, to act as roots for a collection of nodes. CLUSTER is a better node name than '{', so I use that. I use the PROPS imaginary token as a root to hold the list of properties matched as part of an option list. Imaginary nodes are created like this:
AST root = new AST(new Token(PROPS,"PROPS"));
There are samples in the framework I provide.
Submission
You will create a jar file called dot.jar containing source and *.class files and place in your lib directory:
You can use the svn account for development of the software to if you would like, but I will only be looking at your dot.jar file in the lib directory.
For more information, see svn in CS601
. Naturally you will have to substitute cs652 for cs601.
Grading
You will be graded quickly in class interactively. We will move the class to the CS labs before the end of class in order to do the grading. Generally the grading will occur on the day the project is due.