MyReadMe.txt

Notes for C++ grammar file to generate ANTLR parser (in C++)

1. Past
2. Present
3. Future

1. Past

This C++ grammar file was originally written and published in 1994 by,

Authors: Sumana Srinivasan, NeXT Inc.;            sumana_srinivasan@next.com
         Terence Parr, Parr Research Corporation; parrt@parr-research.com
         Russell Quong, Purdue University;        quong@ecn.purdue.edu

as VERSION 1.2 for use with PCCTS (the original C version of ANTLR).

In 1997-1999 it was adapted for use in a project to analyse data flow
in C programs by Lasitha Leelasena, Sue Black (blackse@lsbu.ac.uk) and
David Wigg (wiggjd@bcs.org.uk). The generated parser was in C++ and all
of our included statement code was in C.

In 2000, in view of the fact that ANTLR had then been re-written in Java
and any further development of PCCTS had been suspended, it was decided
that we should convert our version of the C grammar for PCCTS for use
with ANTLR. As all our included application code was in C it was decided
to use the option to produce the generated parser in C++ to avoid the
need to rewrite this application code as well.

During 2001-2002 we were fortunate enough to have the services of a
visiting tutor, Jiangu Zuo, from Jianghan University, Wuhan, China, who
carried out most of this work.

However, this conversion was quite a lot more difficult than we had
hoped and took us about a year to complete. We have tried to make a
record of the problems encountered and to give some solutions, and this
can be found at http://antlr.org/fieldguide/cppantlr/index.html.

The most difficult problem concerned the lack of 'hoisting' in ANTLR,
which we were only able to overcome in the time available by copying the
generated hoisting code from the PCCTS version into our new grammar
file, hence some of the mysterious C++ statements at the beginning of a
number of productions. I think Zuo also had some problems in using
predicates.

In August 2002 I reported that this grammar file would be published
'soon' when remaining problems had been cleared up and the grammar was
fit to be published. In the event, for a variety of reasons, this was
not achieved.

So, in view of the number of requests being made for access to this
grammar, I agreed in February 2003 to it being published on the
www.antlr.org website for general use under the usual terms, in the hope
that interested users would let me know how it could be improved.
Unfortunately, though it could handle C code and some C++, it was unable
to handle namespaces and a lot of templates, so it left a lot to be
desired.

In September 2003 I supplied a much improved version which I called
V2.0. This version was picked up by some users. A few problems were
raised which have since been solved.

Since then I have been concentrating on tidying up what had become a
rather confusing system, trying to produce one that is cleaner, easier
to understand and easier to use in your application. No doubt I have not
entirely succeeded yet, but I hope it is better than it was.

I have introduced the idea of subclassing a user's application code. I
hope this will enforce a clear separation of code between the parser and
the application and will enable users to take CPP_parser updates much
more easily. If you feel the need to change the parser in any way I
would be grateful if you could let me know.

2. Present, July 2004.

I am using MSVC 6.0 under Windows ME and NT.

I created a static source library for the antlr code (2.7.3) with some
modifications as discussed below.

I have called this latest version Version 3.0, published July 2004.

Please note that it continues to be used to parse pre-compiled *.i
files, with or without embedded #line directives. (With MSVC these files
are obtained by using the /P command in compilation.)

I include a small demonstration program, quadratic.i, which you could
use to test the set up of your system.

Although I cannot say it has been thoroughly tested, it appears to parse
support.i, CPPLexer.i and CPPParser.i, all of which contain a
considerable quantity of included files containing a great deal of
complex code.

It should be noted that this version still handles scoping in a
relatively simplistic manner, but this does not appear to be a problem.
To do this properly would entail a lot of work to update the antlr
supporting code in dictionary.cpp etc. Briefly, all template parameter
names are held in level 0, and all type names in level 1. All variable
names are held in lower levels but continue to be deleted when they go
out of scope.

Each run should end with the following two statements,

    Support exitExternalScope scope now 0
    Parse ended

showing that the scope level had been returned to zero correctly.

I have included a C++ syntax definition (grammar.txt) which appears to
be up to date. If not, please let me know.

Please address any problems you have with this version to me, preferably
with a cut down version of the problem code.

Notes about running this version.

* I am currently using antlr 2.7.3

* Note that the latest version of antlr for MSVC users may be on
  Ric Klaren's website at http://wwwhome.cs.utwente.nl/~klaren/antlr/

* The following type of warning produced during compilation of
  CPPLexer.cpp and CPPParser.cpp can be ignored,

    CPPParser.cpp(163) : warning C4101: 'pe' : unreferenced local variable

* I have introduced a "statementTrace" feature in CPP_parser.g during
  testing which I have found useful. See CPP_parser.g .

  This can be set on (or off) by altering statementTrace in
  CPPParser.cpp and recompiling and linking only.

  With statementTrace set to 1 you get a list of statement types as they
  are detected from external_declaration and member_declaration in
  CPP_parser.g.

  With statementTrace set to 2 you also get a record of each variable
  declared showing its name, scope level, and type (see list in
  CPPSymbol.hpp).
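
As a rough illustration of how the two statementTrace levels differ,
here is a minimal sketch. It is not the code in CPP_parser.g or
support.cpp: the names StatementTracer, statement(), declaration() and
sampleTrace(), the wording of the output and the "variable" type label
are all assumptions made up for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch only; none of these names come from CPP_parser.g.
class StatementTracer {
public:
    StatementTracer(int level, std::ostream& out) : level_(level), out_(out) {}

    // Level 1 or 2: report the type of statement just detected.
    void statement(const std::string& kind) {
        if (level_ >= 1)
            out_ << "statement: " << kind << '\n';
    }

    // Level 2 only: also report a declared variable's name, scope level and type.
    void declaration(const std::string& name, int scopeLevel,
                     const std::string& type) {
        if (level_ >= 2)
            out_ << "declared: " << name << " scope " << scopeLevel
                 << " type " << type << '\n';
    }

private:
    int level_;          // 0 = off, 1 = statements, 2 = statements and declarations
    std::ostream& out_;  // where the trace text is written
};

// Sends the same two events through a tracer at the given level and
// returns the text that was produced.
std::string sampleTrace(int level) {
    std::ostringstream out;
    StatementTracer tracer(level, out);
    tracer.statement("declaration");
    tracer.declaration("x", 2, "variable");
    return out.str();
}
```

In this sketch sampleTrace(0) produces nothing, sampleTrace(1) produces
only the statement line, and sampleTrace(2) adds the line for the
declared variable.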

  The trace output will display but you should be able to place trace
  output in a trace file like this,

    ...\debug\CPP_parser program.i > program.trace

  I have found this feature useful for providing the ability to check
  the output from one run to another after making modifications to
  either the parser or the application code. Just keep your "standard"
  or "correct" version of the trace output in a separate "archive" file
  and use this to compare with the output of any updated version. You
  can do a file compare like this,

    ...\fc /n program_010704.trace program.trace

* I have also implemented a dynamic trace facility by including some
  code in LLkParser.hpp and LLkParser.cpp called antlrTrace() before
  generating the static antlr library, as shown below.

  The advantage of this facility is that, by always generating with
  antlr tracing initially (using -traceParser etc.), antlr tracing can
  be switched on or off completely by changing the antlrTrace()
  statement in init() in CPPParser.cpp appropriately and recompiling and
  linking without also having to regenerate from the grammar file each
  time.

  It also enables tracing to be implemented on a more selective basis by
  including antlrTrace(true/false) statements in the grammar, though of
  course in this case it would entail regenerating the parser as well,
  but it would enable you to reduce the amount of trace output to a
  specific area.

  LLkParser.hpp

    private:
        // DW 060204 For dynamic tracing
        bool antlrTracing;

    public:
        // DW 060204 For dynamic tracing
        virtual void antlrTrace(bool traceFlag);

  LLkParser.cpp

    // DW For dynamic tracing
    void LLkParser::antlrTrace(bool traceFlag)
    {
        antlrTracing = traceFlag;
    }

    void LLkParser::traceIn(const char* rname)
    {
        traceDepth++;
        // DW For dynamic tracing
        if (antlrTracing)
            trace("> ",rname);
    }

    void LLkParser::traceOut(const char* rname)
    {
        // DW For dynamic tracing
        if (antlrTracing)
            trace("< ",rname);
        traceDepth--;
    }

* I have also introduced MyCode.cpp with MyCode.hpp to demonstrate how
  your application code can be subclassed in CPPParser. You can, of
  course, delete, include and amend any of these to suit your
  application. However, I strongly recommend using this feature with
  this grammar as I think it will make it easier both for me to issue
  updated versions of the CPP_parser grammar from time to time and for
  you to accept and use them, since the code for the parser and your
  application will be kept strictly apart.

* I would be grateful if you could let me know if you need to correct
  anything in CPP_parser.g, support.cpp etc.

* End of notes.

3. Future

I would be grateful if any future user of this grammar would advise me
and/or the e-mail group (antlr-interest@yahoogroups.com) direct of any
improvement they have been able to make to this grammar, for the benefit
of other users.

In the meantime I will advise the group of any improvements and continue
to update the master files as arranged with Terence from time to time.

Thank you.

David Wigg
Research Fellow
Centre for Systems and Software Engineering
London South Bank University
London, UK.
wiggjd@bcs.org.uk
blackse@lsbu.ac.uk

1 July 2004