How Antlr Parsers are incorporated into Microfocus Revolve

James O'Connor

 

            Before Antlr, Revolve was a production COBOL analysis tool written in C that used Abraxas’s LEX/YACC.  Revolve was born in 1989 and grew with LEX/YACC until 2002.  Through those years, Revolve parsed Assembler, PL/I, JCL (Job Control Language), SQL, ECL, and other mainframe formats.  Tools such as “Impact Analysis” and “Code Flow” were based on a database of tuples created by a one-pass parser.

 

            Where possible, we patch the LEX/YACC code to correct bugs and accommodate new requests.  For ambitious projects, and for cases where the old grammars can no longer be patched, we have turned to Antlr.  Three of the more interesting uses are described below.

 

 

Case I: Revolve’s SQL grammar needed to be rewritten. 

Q: If you need to restart anyway, why not restart with Antlr? 

A: Because your base grammar (COBOL, PL/I, Assembler) is still in LEX/YACC.  This mere formality did not stop us. 

Q: How does the process work?

 

            1. The COBOL or PL/I grammar in LEX/YACC sees “EXEC SQL” embedded in the native language.

            2. Do we need to strangle the LEX lexer so that it leaves the next token alone?  Not necessary.  We use the LEX lexer as a preprocessor and as the basis for the input stream.

            3. The LEX lexer acts as the InputBuffer for the AntlrLexer, which feeds the Antlr SQL Parser.

            4. The “END EXEC” or semicolon that ends the statement needs special consideration.

            It is a rather amusing combination: the old Revolve LEX/YACC world meets Antlr’s C/C++ library.
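The layering in steps 1 to 4 can be sketched roughly as follows. This is only an illustration, written in Java for brevity, whereas the production code described here is C/C++. The class name HostTokenReader, the idea of modelling the LEX lexer as an iterator of token strings, and the terminator spellings it checks are all assumptions made for the sketch, not Revolve code.

```java
import java.io.Reader;
import java.util.Iterator;

// Hypothetical adapter: presents the host (LEX) lexer's tokens as a character
// source for a second lexer, and reports end of input at the SQL terminator.
class HostTokenReader extends Reader {
    private final Iterator<String> hostTokens; // stand-in for the LEX lexer
    private String current = "";
    private int pos = 0;
    private boolean done = false;

    HostTokenReader(Iterator<String> hostTokens) {
        this.hostTokens = hostTokens;
    }

    // Assumed terminator spellings; the real rules need "special consideration".
    private static boolean isTerminator(String token) {
        return token.equalsIgnoreCase("END-EXEC") || token.equals(";");
    }

    // Make sure a character is available, pulling the next host token if needed.
    private boolean fill() {
        while (!done && pos >= current.length()) {
            if (!hostTokens.hasNext()) { done = true; break; }
            String next = hostTokens.next();
            if (isTerminator(next)) { done = true; break; }
            current = next + " "; // re-separate tokens with a single space
            pos = 0;
        }
        return pos < current.length();
    }

    @Override
    public int read(char[] buf, int off, int len) {
        if (len == 0) return 0;
        int n = 0;
        while (n < len && fill()) {
            buf[off + n++] = current.charAt(pos++);
        }
        return (n == 0) ? -1 : n;
    }

    @Override
    public void close() { }

    // Convenience for demonstrations: everything up to the terminator.
    String drain() {
        StringBuilder sb = new StringBuilder();
        while (fill()) {
            sb.append(current.charAt(pos++));
        }
        return sb.toString();
    }
}
```

The terminator token is consumed by the adapter, so the host lexer resumes at the first token of the native language that follows the embedded statement.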

 

Case II: Job Control Language needed native parsing in a new Java tool.

 

            JCL has constructs known as symbolics: variables assigned with SET and referenced as &NAME.  The following example shows them in use.

 

LABEL1          SET     VAR1=DOG

LABEL2          SET     VAR2=CAT

LABEL3          DD      TYPE=&VAR1                      ANY TEXT AFTER A SPACE IS CONSIDERED COMMENT

LABEL4          DD      TYPE=&VAR1,ANOTHERTYPE=&VAR2    ANY TEXT AFTER A SPACE IS CONSIDERED COMMENT

 

            The intent is obvious: each symbolic is replaced by its value.  JCL also has a strange convention of setting symbolics so as to eliminate parameters.  Consider what happens when VAR1 contains a space.

 

LABEL1          SET     VAR1='DOG  Make everything a comment'

LABEL4          DD       TYPE=&VAR1,ANOTHERTYPE=&VAR2              

 

The resulting text becomes:

 

LABEL4          DD       TYPE=DOG  Make everything a comment,ANOTHERTYPE=&VAR2          

 

Because any text after a space is considered comment, ANOTHERTYPE=&VAR2 is masked by the value substituted before it.  What a mess.
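The masking effect can be reproduced with a small sketch. It is not the Revolve implementation: the class SymbolicDemo and its methods are hypothetical, a symbolic is simplified to '&' followed by letters and digits, and the comment rule is reduced to "the parameter field ends at the first space".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands &NAME symbolics in an operand field, then
// applies the "text after a space is a comment" rule.
class SymbolicDemo {
    // Simplifying assumption: a symbolic is '&' plus letters and digits.
    private static final Pattern SYMBOLIC = Pattern.compile("&([A-Z][A-Z0-9]*)");

    // Replace every known symbolic with its value; leave unknown ones alone.
    static String expand(String text, Map<String, String> symbols) {
        Matcher m = SYMBOLIC.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = symbols.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The parameter field ends at the first space; the rest is comment.
    static String parameterField(String operands) {
        int space = operands.indexOf(' ');
        return (space < 0) ? operands : operands.substring(0, space);
    }

    public static void main(String[] args) {
        Map<String, String> symbols = new HashMap<>();
        symbols.put("VAR1", "DOG  Make everything a comment");

        String expanded = expand("TYPE=&VAR1,ANOTHERTYPE=&VAR2", symbols);
        System.out.println(expanded);
        // prints: TYPE=DOG  Make everything a comment,ANOTHERTYPE=&VAR2
        System.out.println(parameterField(expanded));
        // prints: TYPE=DOG
    }
}
```

Only TYPE=DOG survives as a parameter, which is why the replacement has to happen before the lexer decides where the comment begins.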

 

We rolled our own set of neat features to handle the symbolic processing:

  1. Traced Text – This class keeps track of the current state of a string of text and where it came from.
  2. LineInputStream – Reads a line from a file.
  3. SymbolicInputStream – Replaces all symbolics in a line.
  4. JCLLexer, JCLParser – Coordinate the lookahead with the symbolic replacement.
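To give a feel for the first item, here is a minimal sketch of what a traced-text value could look like. The class name TracedText, its methods, and the use of a plain label string as the "origin" are assumptions for illustration; the actual class is not shown in this note.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a string plus, for every character, a note saying
// where that character came from (here just a label such as "LABEL1").
final class TracedText {
    private final String text;
    private final List<String> origins; // origins.get(i) describes text.charAt(i)

    private TracedText(String text, List<String> origins) {
        this.text = text;
        this.origins = origins;
    }

    // All characters come from the same place.
    static TracedText of(String text, String origin) {
        List<String> origins = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            origins.add(origin);
        }
        return new TracedText(text, origins);
    }

    // Returns a new TracedText with [start, end) replaced; the inserted
    // characters keep the origins they already carried.
    TracedText replace(int start, int end, TracedText replacement) {
        List<String> merged = new ArrayList<>(origins.subList(0, start));
        merged.addAll(replacement.origins);
        merged.addAll(origins.subList(end, origins.size()));
        String newText = text.substring(0, start) + replacement.text + text.substring(end);
        return new TracedText(newText, merged);
    }

    String text() { return text; }

    String originAt(int index) { return origins.get(index); }
}

class TracedTextDemo {
    public static void main(String[] args) {
        TracedText line = TracedText.of("TYPE=&VAR1", "LABEL3");
        TracedText value = TracedText.of("DOG", "LABEL1");
        // Replace "&VAR1" (characters 5 to 9) with the symbolic's value.
        TracedText expanded = line.replace(5, 10, value);
        System.out.println(expanded.text());      // prints: TYPE=DOG
        System.out.println(expanded.originAt(0)); // prints: LABEL3
        System.out.println(expanded.originAt(5)); // prints: LABEL1
    }
}
```

With the origin carried along, an error found in the expanded text can still be reported against the statement that supplied the offending characters.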

 

 

Case III: Java Cross Referencer

      The Alpha version of the Java Cross Referencer was put on Antlr.org in January 2004.  It has grown into a production tool.  The structure is similar to the original “head start”: make passes through the AST, gathering information into Scopes.  There were a number of major obstacles to overcome in making the Microfocus version of the Java Cross Referencer.  The first was the addition of Jar files to the “searchable” classes.  Resolving overloaded methods and methods in a class hierarchy was a fun puzzle to solve.  Signatures?  Representing methods by their Signatures was simply necessary.  Keeping all scopes “available” needed a big hashtable and WeakReferences.
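The last two points, signatures and a weakly referenced scope table, can be sketched as below. This is a guess at the general shape, not the cross referencer's code: Scope, ScopeCache, the signature format, and the loader callback are all hypothetical names introduced for the example.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a cross-referencer scope (class, method, block).
final class Scope {
    final String name;
    // Methods keyed by signature so that overloads do not collide.
    final Map<String, String> methods = new HashMap<>();

    Scope(String name) { this.name = name; }
}

// One big table of scopes held through WeakReferences: a scope stays
// "available" by name, but the collector may reclaim it when nothing else
// uses it, in which case it is rebuilt on the next lookup.
final class ScopeCache {
    private final Map<String, WeakReference<Scope>> table = new HashMap<>();
    private final Function<String, Scope> loader; // e.g. parse source or read a Jar

    ScopeCache(Function<String, Scope> loader) { this.loader = loader; }

    // Assumed signature format: method name plus parameter types.
    static String signature(String method, List<String> parameterTypes) {
        return method + "(" + String.join(",", parameterTypes) + ")";
    }

    // Return the cached scope, or rebuild it if it was reclaimed or never loaded.
    Scope get(String qualifiedName) {
        WeakReference<Scope> ref = table.get(qualifiedName);
        Scope scope = (ref != null) ? ref.get() : null;
        if (scope == null) {
            scope = loader.apply(qualifiedName);
            table.put(qualifiedName, new WeakReference<>(scope));
        }
        return scope;
    }
}
```

The trade-off in this sketch is memory against repeated work: a scope that has been reclaimed must be rebuilt by the loader the next time it is requested.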

 

      I hope that gives you some idea.  The Java Cross Referencer description is not complete.  That is all for now.