How Antlr Parsers are incorporated into Microfocus Revolve
James O'Connor
Before Antlr, Revolve was a production COBOL analysis tool written in C that used Abraxas’s LEX/YACC. Revolve was born in 1989 and grew with LEX/YACC until 2002. Through those years, Revolve parsed Assembler, PL/I, JCL (Job Control Language), SQL, ECL, and other mainframe formats. Tools such as “Impact Analysis” and “Code Flow” were based on a database of tuples created by a one-pass parser.
Where possible, we patch the LEX/YACC code to correct bugs and accommodate new requests. For ambitious projects, and for cases where the old grammars can no longer be patched, we have turned to Antlr. Three of the more interesting uses are described below.
Case I: Revolve’s SQL grammar needed to be rewritten.
Q: If you need to restart anyway, why not restart with Antlr?
A: Because your base grammar (COBOL, PL/I, Assembler) is still in LEX/YACC. This mere formality did not stop us.
Q: How does the process work?
1. The COBOL or PL/I grammar in LEX/YACC sees “EXEC SQL” embedded in the native language.
2. Strangle the LEX lexer so that it leaves the next token alone? Not necessary. We use the LEX lexer as a preprocessor and as the basis for the input stream.
3. The LEX lexer acts as the InputBuffer to the AntlrLexer, which feeds the Antlr SQL Parser.
4. The “END EXEC” or semicolon that ends the statement needs special consideration.
It is a rather amusing case of the old Revolve LEX/YACC world meeting Antlr’s C/C++ library.
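The handoff in the steps above can be sketched in a few lines. This is only an illustration in Java, not the Revolve code (which is LEX/YACC plus Antlr's C/C++ library) and not Antlr's API: the class EmbeddedSqlHandoff, its Result type, and the scan method are hypothetical names. Whitespace splitting stands in for the host lexer, and the sketch assumes a statement ends at "END-EXEC", at "END EXEC" written as two words, or at a semicolon.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names do not come from Revolve or from Antlr.
class EmbeddedSqlHandoff {

    /** Outcome of one scan: host-language tokens and embedded SQL kept apart. */
    static final class Result {
        final List<String> hostTokens = new ArrayList<>();
        final List<String> sqlStatements = new ArrayList<>();
    }

    /**
     * Splits the source on whitespace (standing in for the host lexer) and
     * routes everything between "EXEC SQL" and its terminator to the SQL side.
     */
    static Result scan(String source) {
        String[] tokens = source.trim().split("\\s+");
        Result result = new Result();
        int i = 0;
        while (i < tokens.length) {
            if (tokens[i].isEmpty()) { // only happens for blank input
                i++;
                continue;
            }
            boolean startsSql = tokens[i].equalsIgnoreCase("EXEC")
                    && i + 1 < tokens.length
                    && tokens[i + 1].equalsIgnoreCase("SQL");
            if (!startsSql) {
                result.hostTokens.add(tokens[i++]);
                continue;
            }
            i += 2; // skip "EXEC SQL"
            List<String> sql = new ArrayList<>();
            while (i < tokens.length) {
                String t = tokens[i];
                if (t.equalsIgnoreCase("END-EXEC")) { // one-word terminator
                    i++;
                    break;
                }
                if (t.equalsIgnoreCase("END") && i + 1 < tokens.length
                        && tokens[i + 1].equalsIgnoreCase("EXEC")) { // two-word terminator
                    i += 2;
                    break;
                }
                if (t.endsWith(";")) { // semicolon terminator
                    if (t.length() > 1) {
                        sql.add(t.substring(0, t.length() - 1));
                    }
                    i++;
                    break;
                }
                sql.add(t);
                i++;
            }
            // In a real tool this text would feed the SQL lexer and parser.
            result.sqlStatements.add(String.join(" ", sql));
        }
        return result;
    }
}
```

The point of the design is that the host scanner never has to be told to stop tokenizing: it keeps producing tokens, and the embedded-language side simply consumes them until the terminator appears.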
Case II: Job Control Language needed native parsing in a new Java Tool.
JCL has constructs known as symbolics. This example should be clear.
LABEL1 SET VAR1=DOG
LABEL2 SET VAR2=CAT
LABEL3 DD TYPE=&VAR1 ANY TEXT AFTER A SPACE IS CONSIDERED COMMENT
LABEL4 DD TYPE=&VAR1,ANOTHERTYPE=&VAR2 ANY TEXT AFTER A SPACE IS CONSIDERED COMMENT
The intent is obvious: each symbolic such as &VAR1 is replaced by the value assigned to it in a SET statement. We have a strange convention in JCL of setting symbolics to eliminate parameters. Consider what happens when VAR1 contains a space.
LABEL1 SET VAR1='DOG Make everything a comment'
LABEL4 DD TYPE=&VAR1,ANOTHERTYPE=&VAR2
The resultant text becomes
LABEL4 DD TYPE=DOG Make everything a comment,ANOTHERTYPE=&VAR2
ANOTHERTYPE=&VAR2 is masked by the substituted text before it: everything after the first space, including this parameter, is now treated as a comment. What a mess.
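The masking effect is easy to reproduce. The sketch below is a simplified illustration in Java, not the Revolve implementation: the class JclSymbolics and its methods are hypothetical, symbolic names are assumed to be an ampersand followed by letters and digits, and lines are assumed to have the label, operation, operands, comment layout used in the examples above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a simplified model of symbolic substitution.
class JclSymbolics {

    // Assumption: a symbolic is '&' followed by a letter and then letters or digits.
    private static final Pattern SYMBOLIC = Pattern.compile("&([A-Z][A-Z0-9]*)");

    /** Replaces each &NAME with its value; undefined names are left as written. */
    static String substitute(String line, Map<String, String> symbols) {
        Matcher m = SYMBOLIC.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = symbols.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Returns the operand field: the third blank-delimited field of the line.
     * Anything after the next space is treated as comment.
     */
    static String operandField(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        return parts.length >= 3 ? parts[2] : "";
    }
}
```

With VAR1 set to "DOG Make everything a comment", calling substitute on "LABEL4 DD TYPE=&VAR1,ANOTHERTYPE=&VAR2" yields the resultant text shown above, and operandField on that result returns only "TYPE=DOG". The second parameter has fallen into the comment.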
We rolled a bunch of neat features to handle the symbolic processing.
Case III: Java Cross Referencer
The alpha version of the Java Cross Referencer was put on Antlr.org in January 2004. It has grown into a production tool. The structure is similar to the original “head start”: make passes through the AST, gathering information into Scopes. There were a number of major obstacles to overcome in making the Microfocus version of the Java Cross Referencer. The first was adding Jar files to the “searchable” classes. Resolving overloaded methods and methods in a class hierarchy was a fun puzzle to solve, and it required representing methods by their Signatures. Keeping all scopes “available” needed a big hashtable and WeakReferences.
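To make the last three points concrete, here is a minimal sketch in Java. It is not the Cross Referencer's code: XrefSketch, Scope, ScopeTable, and the signature format are all hypothetical. It shows a signature string used as a lookup key so overloads stay distinct, a lookup that walks up the class hierarchy, and a table that holds scopes through WeakReferences. It matches signatures exactly and ignores the type conversions that real Java overload resolution performs.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Java Cross Referencer.
class XrefSketch {

    /** Builds a key such as "print(int,String)" so overloads do not collide. */
    static String signature(String name, String... paramTypes) {
        return name + "(" + String.join(",", paramTypes) + ")";
    }

    /** A class-level scope: declared method signatures plus the superclass scope. */
    static final class Scope {
        final String className;
        final Scope parent; // null when there is no superclass scope
        private final Set<String> methods = new HashSet<>();

        Scope(String className, Scope parent) {
            this.className = className;
            this.parent = parent;
        }

        void declare(String sig) {
            methods.add(sig);
        }

        /** Walks up the hierarchy; returns the declaring class name, or null. */
        String resolve(String sig) {
            for (Scope s = this; s != null; s = s.parent) {
                if (s.methods.contains(sig)) {
                    return s.className;
                }
            }
            return null;
        }
    }

    /** Scopes keyed by class name, held weakly so unused ones can be reclaimed. */
    static final class ScopeTable {
        private final Map<String, WeakReference<Scope>> table = new HashMap<>();

        void put(Scope scope) {
            table.put(scope.className, new WeakReference<>(scope));
        }

        /** Returns null if the scope was never stored or has been collected. */
        Scope get(String className) {
            WeakReference<Scope> ref = table.get(className);
            return ref == null ? null : ref.get();
        }
    }
}
```

A caller of ScopeTable.get has to be ready for null and rebuild the scope when needed. That is the price of letting the garbage collector decide how many scopes stay in memory.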
Hope that gives you some idea. The Java Cross Referencer description is not complete. All for now.