Introduction

In this project, you will learn to parse DOT files. You will build everything by hand, without the use of language tools such as ANTLR, so that you can learn about the infrastructure required to build a language application. You are free to use snippets of code from LIP patterns 2, 3, and 4. Your goal is to learn how to apply lexer/parser patterns to the problem at hand.

Let's begin with a series of small DOT files and the resulting visual representation as built by DOT/Graphviz.

digraph tree {
0 -> "1" [label = "A"];
0 -> "2" [label = "B"];
}

digraph tree {
rankdir=LR;
0 -> "1" [label = "A"];
0 -> "2" [label = "B"];
}

digraph trees {
  subgraph t {
    0 -> "1" [label = "A"];
    0 -> "2" [label = "B"];
  }
  subgraph u {
    Animal -> Cat [label = "feline"];
    Animal -> Dot [label = "canine"];
  }
}

Tasks

You have two subprojects you need to build:

  1. a scanner or lexical analyzer for DOT that breaks up the input stream of characters into vocabulary symbols called tokens: DOTScanner.java
  2. a DOT parser that recognizes the grammatical structure, producing errors upon syntactically invalid input: DOTParser.java

In the attachments to this page, you'll find TokenStream.java and Token.java support code.

To test each of the tasks, notice that there is a main() in each of them so you can simply run Java on that class. For example here is the meat from DOTScanner's main:

DOTScanner scanner = new DOTScanner(new InputStreamReader(System.in));
Token t = scanner.nextToken();
while ( t.type!=Token.EOF_TYPE ) {
	System.out.println(t);
	t = scanner.nextToken();
}

and the parsers:

        public static void main(String[] args) throws Exception {
                DOTScanner scanner = new DOTScanner(new InputStreamReader(System.in));
                DOTParser parser = new DOTParser(scanner);
                parser.digraph();
                System.out.println("OK");  // prints if no exception
        }

When run on a.dot above, you should get the following output (similar to the last project):

$ java DOTScanner < a.dot
[digraph:11]
[NFA:1]
[{:4]
[0:2]
[->:9]
["1":3]
[[:7]
[label:1]
[=:10]
["A":3]
[]:8]
[;:6]
[0:2]
[->:9]
["2":3]
[[:7]
[label:1]
[=:10]
["B":3]
[]:8]
[;:6]
[}:5]
$

Running the DOTParser should simply print OK if there are were no syntax errors:

$ java DOTParser < a.dot
OK
$

On the other hand, if you type in some erroneous input, you should see an exception:

$ java DOTParser
I like peanut butter
<CTRL-D>
Exception in thread "main" java.lang.Exception: token mismatch: expecting token type 11; found I
        at DOTParser.match(DOTParser.java:40)
        at DOTParser.digraph(DOTParser.java:48)
        at DOTParser.main(DOTParser.java:128)
$

Note: The program should read from standard input not a file argument. CTRL-D sends the end of file signal to the standard input; on a PC this is CTRL-Z.

As you build a parser and lexer, you should spend time thinking about how they are constructed. You should try to identify the relationship between the grammar provided in this assignment and code the you build. For example, you will note that for every rule in the grammar there is a method in the parser.

Simplified DOT syntax

digraph

cluster

subgraph

property

edge

nodelabel

value

optionlist

Simplified DOT grammar

digraph
	:	'digraph' ID cluster
	;

cluster
	:	'{' ( property ';' | edge ';' | subgraph )* '}'
	;

subgraph
	:	'subgraph' ID cluster
	;

property
	:	ID '=' value
	;

edge:	nodelabel '->' nodelabel optionlist?
	;

nodelabel
	:	value
	;

value
	:	ID
	|	STRING
	|	INT
	;

optionlist
	:	'[' property (',' property)* ']'
	;

ID	:	('a'..'z'|'A'..'Z')+
	;

INT	:	'0'..'9'+
	;

STRING
	:	'"' .* '"'
	;

WS	:	(' '|'\t'|'\r'|'\n')+ {skip();}
	;

Parse tree

This section shows the relationship between an input file and how the DOT grammar interprets it. The interior nodes of a parse tree are rule names and the leaves are input symbols.

digraph tree {
0 -> "1" [label = "A"];
0 -> "2" [label = "B"];
}

Submission

Part of your grade will be how well you follow the recursive descent parsing patter and how clean/simple your lexer is.  You may not use ANTLR for any part of this project.

You will create a jar file called dot.jar containing source and *.class files and place in your build directory:

https://www/svn/userid/cs652/dot/build/dot.jar

I will run your code by executing the following:

$java -cp "dot.jar:$CLASSPATH" DOTParser < test.dot

You can use the svn account for development of the software too if you would like, but I will only be looking at your jar file in the build directory.

For more information, see svn in CS601. Naturally you will have to substitute cs652 for cs601.