Supporting document for ``LL(*): The Foundation of the ANTLR Parser Generator''

October 28, 2010

When writing LL(*): The Foundation of the ANTLR Parser Generator, we collected data about open-source projects using ANTLR grammars and also collected grammar analysis data about six large grammars. This document describes our methodology. See the paper for full detail/results. We also include material here that did not fit into the 10 page submission paper.

Actions in open-source grammars

We found 110 grammars and removed duplicates and anything else that would blur the statistics. Result:

22/89 = .247 No actions
67/89 = .752 Actions

41 grammars are ANTLR v2 and 48 are ANTLR v3. 43/89 grammars are tree grammars.

Open-source grammar review to discover those with/without actions (Format is filename 0|1. 1 indicates the grammar was deemed to have actions.)

Methodology used to examine open-source grammars.

Project directory names (we cannot publish actual projects for license reasons). We did "antlr grammar" searches at google code and sourceforge. A search for these names yields the projects. We pulled 110 grammars out of these roughly 30 projects.

EduXcomp JUnC Noop abtools-GRAMMAR booggie-1.0.1

cesta-0.1.2 dsl4j funky-1.3.00 gwt-xml-editor

highlighter jai-tools kconfig-g koopa laurentpetit-ccw-b432012

makeme-0.02 modsl-read-only qvtparser_1.0.0 qvtrel2op_1.0.0 ontostep-plugin-0.9.0

php-grammar-in-antlr sandbox spapper_v1.2 sparkle-g spydoc_1_0_pre-alpha

svparser v2kparse-1.7 xpa xqgrammar xruby-0.3.3a-src

yajt-rc1-0.0.1-src

ANTLR download statistics

Here are snapshots from Google Analytics (data only includes January 9, 2008 - October 28, 2010):

Empirical grammar and parser runtime analysis

Links to sample grammars

ANTLR Example Java.g (BSD license)
RatsC converted from Rats! (GPL license)
RatsJava converted from Rats! (GPL license)
Commercial (non open-source) VB.NET, TSQL, C# grammars used with permission from Temporal Wave, LLC.

Links to sample input

Java input, RatsJavaParser.java, is ANTLR-generated parser for RatsJava.g, which we derived from GPL Rats! (Empirical results in paper were run w/o GPL license on top of this file.)
C input, pre_javaParser.c, is ANTLR-generated parser for Java.g run through C preprocessor
LingToSqlSamples.cs taken from Microsoft C# Samples
TSQL sample cobbled together from Microsoft Transact-SQL Reference; includes table creation, complicated stored procedures, updates, insertions, and queries. Copyright prevents reproduction here.
Northwind.vb taken from Microsoft VB.NET Samples

Here are the raw results of static and runtime analysis for the 6 sample grammars:

Theoretical results

The paper focuses on the practical issues of nondeterministic parsing and introduces our algorithm for constructing LL(*) parser lookahead DFA. We could not fit any theoretical results, but in the following section, we show that we can convert any PEG to an equivalent ANTLR grammar. With the addition of unrestricted semantic predicates, we can reach further into the context-sensitive languages than PEGs can, however.

theoretical-results.pdf

Discovery of visible semantic predicates

The formal semantics of predicated grammars and analysis algorithm in the submitted paper require that disambiguating predicates appear at the left edge of ambiguous productions. This is cumbersome in practice and forces programmers to duplicate predicates. Fortunately, grammar analysis can automatically discover and hoist predicates from productions further down the derivation chain into parsing decisions without predicates.

Here is a description of visible predicates and modified DFA construction algorithm.

ANTLR v3 LL(*) Lookahead DFA Construction Implementation

Class NFAToDFAConverter in the ANTLR 3.3 distribution has a Java implementation of this algorithm.

EduXcomp	JUnC	Noop	abtools-GRAMMAR	booggie-1.0.1
cesta-0.1.2	dsl4j		funky-1.3.00	gwt-xml-editor
highlighter	jai-tools	kconfig-g	koopa	laurentpetit-ccw-b432012
makeme-0.02	modsl-read-only	qvtparser_1.0.0	qvtrel2op_1.0.0	ontostep-plugin-0.9.0
php-grammar-in-antlr	sandbox	spapper_v1.2	sparkle-g	spydoc_1_0_pre-alpha
svparser	v2kparse-1.7	xpa	xqgrammar	xruby-0.3.3a-src
yajt-rc1-0.0.1-src