In lab Lexer for XML we read in some simple XML for a catalog of books and tokenized it. Now, we want to build a tree of the input and check to see if it's well-formed. So, for example, for input:
<catalog> <book> <id>101</id> <genre>Computer</genre> <author>Jim Cortez</author> <title>XML for dummies</title> <price>44.95</price> <description>An in-depth look at creating mashed potatoes with XML.</description> </book> <book> <id>102</id> <author>George Bush</author> <title>I'm the decider</title> <genre>Fantasy</genre> <price>0.95</price> <description>I like milk and cookies.</description> </book> </catalog>
We want to create a tree looks like:

The output of your test rig should be the text equivalent with OK indicating well-formedness:
$ java XMLAST < catalog.xml
OK
( <catalog>
( <book> (<id> 101) (<genre> Computer) (<author> Jim Cortez)
(<title> XML for dummies) (<price> 44.95)
(<description> An in-depth look at creating mashed potatoes
with XML.)
)
( <book> (<id> 102) (<author> George Bush) (<title> I'm the decider)
(<genre> Fantasy) (<price> 0.95) (<description> I like milk and cookies.)
)
)
Upon invalid input, the program should indicate what the problem is:
$ java XMLAST
<catalog>
<book>
<id>101</id>
</catalog>
error: expecting </book>, found </catalog>
error: missing end tag for <catalog>
(<catalog> (<book> (<id> 101)))
$
Support code
You'll need your solution to the previous XML lab as a foundation for this lab: TagScanner.java, Token.java, TokenStream.java. Now, you need to build XMLAST.java, which has the main method that will have your tree building and well formedness check. For your convenience, here is your homogeneous AST node that you will use for both tag and text nodes:
import java.util.ArrayList; import java.util.List; public class XMLNode { XMLNode parent = null; // if null, we're the root Token token; // the payload; if null, we're a list List<XMLNode> children; public XMLNode(XMLNode parent) { this.parent = parent; } public XMLNode(XMLNode parent, Token payload) { this.parent = parent; this.token = payload; } public void addChild(XMLNode node) { if ( children==null ) children = new ArrayList<XMLNode>(); children.add(node); } boolean isNil() { return token==null; } public String toString() { return token.getText(); } public String toStringTree() { if ( children==null || children.size()==0 ) { return this.toString(); } StringBuffer buf = new StringBuffer(); if ( !isNil() ) { buf.append("("); buf.append(this.toString()); buf.append(' '); } for (int i = 0; children!=null && i < children.size(); i++) { XMLNode t = children.get(i); if ( i>0 ) { buf.append(' '); } buf.append(t.toStringTree()); } if ( !isNil() ) { buf.append(")"); } return buf.toString(); } }
Here is the infrastructure for your main program:
import java.io.*; import java.util.Stack; public class XMLAST { public static void main(String[] args) throws IOException { XMLNode root = null; XMLNode p = null; // ptr used to create tree Stack<Token> tags = new Stack<Token>(); TagScanner scanner = new TagScanner(new InputStreamReader(System.in)); Token t = scanner.nextToken(); boolean error = false; while not EOF token { switch ( t.getType() ) { case TagScanner.BEGIN_TAG_TYPE : // create new child node for this tag // set root if null // else add new node to p as child // p points to the new node // push token onto stack break; case TagScanner.END_TAG_TYPE : String justName = t.getText(); // remove "</" on front and ">" on back justName = justName.substring(2,justName.length()-1); // move p back up a level in tree // pop token off stack // compare its text minus <...> to justName // if they are not the same, emit: // "error: expecting </tagname>, found current token" error = true; } break; case TagScanner.TEXT_TYPE : // add new text child to p break; } t = scanner.nextToken(); } if ( tags.size()>0 ) { // if we still have stuff on the stack, print the tokens // one at a time, emitting error. Walk tags stack from // size-1 .. 0, print // "error: missing end tag for tagname" } } // if no error print "OK" System.out.println(root.toStringTree()); // print tree } }
Add Comment