<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
        {font-family:"Arial Narrow";
        panose-1:2 11 6 6 2 2 2 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
        {mso-style-priority:99;
        mso-style-link:"Plain Text Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.5pt;
        font-family:Consolas;}
span.PlainTextChar
        {mso-style-name:"Plain Text Char";
        mso-style-priority:99;
        mso-style-link:"Plain Text";
        font-family:Consolas;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 92.4pt 1.0in 92.4pt;}
div.Section1
        {page:Section1;}
/* List Definitions */
@list l0
        {mso-list-id:1238512920;
        mso-list-type:hybrid;
        mso-list-template-ids:-311147880 -651894344 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
        {mso-level-start-at:0;
        mso-level-number-format:bullet;
        mso-level-text:-;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:.75in;
        text-indent:-.25in;
        font-family:Consolas;
        mso-fareast-font-family:Calibri;
        mso-bidi-font-family:"Times New Roman";}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoPlainText>On August 03, 2008 1:46 PM, Gavin Lambert wrote:<o:p></o:p></p>
<p class=MsoPlainText>> <o:p></o:p></p>
<p class=MsoPlainText>> At 06:34 4/08/2008, Foust wrote:<o:p></o:p></p>
<p class=MsoPlainText>> >Yes... it started out that way. But to
allow spaces to be part<o:p></o:p></p>
<p class=MsoPlainText>> >of a config value (read up to EOL), the
Lexer needs to honor<o:p></o:p></p>
<p class=MsoPlainText>> >state. (Place spaces in the HIDDEN channel
for all other cases<o:p></o:p></p>
<p class=MsoPlainText>> >- outside of a special config/preprocessor
rule).<o:p></o:p></p>
<p class=MsoPlainText>> <o:p></o:p></p>
<p class=MsoPlainText>> Are you hiding the EOLs as well? (Usually
they're lumped in with<o:p></o:p></p>
<p class=MsoPlainText>> whitespace.)<o:p></o:p></p>
<p class=MsoPlainText>> <o:p></o:p></p>
<p class=MsoPlainText>> If so, then you'll have to match everything in the
lexer anyway,<o:p></o:p></p>
<p class=MsoPlainText>> since the parser won't be able to see the EOL.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText>Good point. I had to change the syntax to read up to a parser-visible
terminator to get it to even <i>partially</i> work, and even then the whitespace was
still missing. Gathering up the tokens with += and calling toString(from, to)
did return a single value that included the original whitespace, but it mysteriously
stopped the lexer from returning any more input (the rest of the file seemed to
be discarded), so I abandoned that method altogether:<o:p></o:p></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> /**<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> *
restores stripped whitespace to a range of tokens <o:p></o:p></span></p>
<p class=MsoPlainText style='text-indent:.5in'><span style='font-size:9.0pt'> *
(Only the first and last entries of the input are used).<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> *
@return String representing given range of tokens with reconstituted whitespace
in between.<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> */<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> private
String <span style='color:#548DD4'>concatTokens</span> (List matchedTokens)<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> {<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> int
from = ((CommonToken) matchedTokens.get(0)).getTokenIndex();<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> int
to = ((CommonToken) matchedTokens.get( matchedTokens.size() -
1)).getTokenIndex();<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> <o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> <span
style='color:#548DD4'>return ((CommonTokenStream) input).toString(from, to);<o:p></o:p></span></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'> }<o:p></o:p></span></p>
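<p class=MsoPlainText>For illustration only, here is a minimal standalone sketch of why asking for a range of token indexes brings the whitespace back: the hidden tokens are still in the buffer, the parser just never sees them. It does not use the ANTLR runtime; Tok, textOfRange, visibleOnly and sampleBuffer are hypothetical stand-ins, not ANTLR API.</p>
<pre>
```java
// Hypothetical stand-ins for a token buffer; this is NOT the ANTLR API.
class TokenRangeSketch {
    /** Minimal token: its text plus whether it sits on a hidden channel. */
    static final class Tok {
        final String text;
        final boolean hidden;

        Tok(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    /** Concatenates every buffered token from index 'from' to 'to' (inclusive), hidden or not. */
    static String textOfRange(Tok[] buffer, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i <= to; i++) {
            sb.append(buffer[i].text);
        }
        return sb.toString();
    }

    /** Concatenates only the tokens a parser would see (hidden ones are dropped). */
    static String visibleOnly(Tok[] buffer) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : buffer) {
            if (!t.hidden) {
                sb.append(t.text);
            }
        }
        return sb.toString();
    }

    /** Assumed sample input "path : my file", with the spaces marked as hidden. */
    static Tok[] sampleBuffer() {
        return new Tok[] {
            new Tok("path", false), new Tok(" ", true), new Tok(":", false),
            new Tok(" ", true), new Tok("my", false), new Tok(" ", true),
            new Tok("file", false)
        };
    }

    public static void main(String[] args) {
        Tok[] buffer = sampleBuffer();
        System.out.println(visibleOnly(buffer));        // prints "path:myfile"
        System.out.println(textOfRange(buffer, 4, 6));  // prints "my file"
    }
}
```
</pre>
<p class=MsoPlainText>The only point of the sketch is that the first and last token indexes are enough to recover the text in between, which is why concatTokens ignores everything but the first and last list entries.</p>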
<p class=MsoPlainText><span style='font-size:9.0pt'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='font-size:9.0pt'><o:p> </o:p></span></p>
<p class=MsoPlainText>> As long as you<o:p></o:p></p>
<p class=MsoPlainText>> have something fairly distinctive to start matching
on, this<o:p></o:p></p>
<p class=MsoPlainText>> shouldn't be hard, and you shouldn't need to do any
parser->lexer<o:p></o:p></p>
<p class=MsoPlainText>> contortions. See how line and block comments
are implemented in<o:p></o:p></p>
<p class=MsoPlainText>> the examples.<o:p></o:p></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'>Yes, my comment rules are similar,
if not identical, to the ones in the demos, and they work quite well:<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:#548DD4'>LINE_COMMENT
: '//' ~('\r' | '\n')* NEWLINE+ { skip(); };<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:#548DD4'>BLOCK_COMMENT options {
greedy = false; } : '/*' .*
'*/'
{ skip(); };<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText>> <o:p></o:p></p>
<p class=MsoPlainText>> (And if you can modify the language you're parsing,
now would be a<o:p></o:p></p>
<p class=MsoPlainText>> good time to make it use a quoted string or similar
instead of<o:p></o:p></p>
<p class=MsoPlainText>> simply reading to EOL.)<o:p></o:p></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'>Yes, thank you for the
suggestion. I did have to resort to changing the terminator, but even that didn’t
solve the whitespace problem.<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'>I thought the whole point of a
Domain Specific Language was to make the task easy on the user – not on
the parser-generator. It seems that the issue is that what is intuitive to a
human may in fact be some chimera of two or more formal syntaxes. </span><span
style='color:#76923C'>Antlr does not handle this very well, forcing tokens to
be interpreted the same in every context.</span><span style='color:black'> But
since it allows interaction with the target language, there are likely several
ways to solve the problem. <o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'>I thought that the cleanest way
to read in a free-form config {…} block (one that does not require quotes or other
syntax that might in fact be intended as part of the config setting) is to
treat it as a separate language. I want to keep the syntax as simple as
possible, with no possibility of conflict with any other part of the language.
So </span><span style='font-size:14.0pt;color:red'>I solved this particular
problem</span><span style='color:black'> by:<o:p></o:p></span></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo1'><![if !supportLists]><span style='color:black'><span
style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><span style='color:black'>using parser states<o:p></o:p></span></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo1'><![if !supportLists]><span style='color:black'><span
style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><span style='color:black'>adding a predicate on the ‘config’
rule so that it is only recognized in the correct state<o:p></o:p></span></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo1'><![if !supportLists]><span style='color:black'><span
style='mso-list:Ignore'>-<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><span style='color:black'>implementing a simple
parser for just the block in question, using regex in the target language:<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoNormal style='text-indent:.5in;text-autospace:none'><span
style='font-size:10.0pt;font-family:"Courier New";color:black'>@members {<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:.5in;text-indent:.5in;text-autospace:
none'><span style='font-size:9.0pt;font-family:"Courier New"'>/** get any config
values specified in the config {} block */<o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'> HashMap<String,
String> </span><span style='font-size:10.0pt;font-family:"Courier New";
color:#7030A0'>config</span><span style='font-size:10.0pt;font-family:"Courier New";
color:black'> = new HashMap<String, String>();<o:p></o:p></span></p>
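<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'><o:p> </o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:9.0pt;
font-family:"Courier New"'>      /** true only while a config block may still appear (set in the start rule, cleared in objectDefinitions) */<o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>      boolean </span><span style='font-size:10.0pt;font-family:"Courier New";
color:#FFC000'>allowConfig</span><span style='font-size:10.0pt;font-family:"Courier New";
color:black'> = false;<o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'><o:p> </o:p></span></p>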
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'> </span><b><span
style='font-size:10.0pt;font-family:"Courier New";color:#7F0055'>private</span></b><span
style='font-size:10.0pt;font-family:"Courier New";color:black'> </span><b><span
style='font-size:10.0pt;font-family:"Courier New";color:#7F0055'>void</span></b><span
style='font-size:10.0pt;font-family:"Courier New";color:black'> </span><span
style='font-size:10.0pt;font-family:"Courier New";color:red'>parseConfig</span><span
style='font-size:10.0pt;font-family:"Courier New";color:black'> (String
configDefs)</span><span style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'> {</span><span
style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
String[] lines =
configDefs.split(</span><span style='font-size:10.0pt;font-family:"Courier New";
color:#2A00FF'>"[\\r\\n]+\\s*"</span><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>);</span><span style='font-size:10.0pt;
font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
</span><b><span
style='font-size:10.0pt;font-family:"Courier New";color:#7F0055'>for</span></b><span
style='font-size:10.0pt;font-family:"Courier New";color:black'> (String line :
lines)</span><span style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
{</span><span
style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
String[]
part = line.split(</span><span style='font-size:10.0pt;font-family:"Courier New";
color:#2A00FF'>"\\s*:\\s*"</span><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>, 2); // split on the first colon only, so a value may itself contain colons</span><span
style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
String
name = </span><span style='font-size:10.0pt;font-family:"Courier New"'>(part.length
> 0) ? part[0] : <span style='color:#2A00FF'>""</span><span
style='color:black'>;</span><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
String
value = </span><span style='font-size:10.0pt;font-family:"Courier New"'>(part.length
> 1) ? <span style='color:black'>part[1]</span> : <span style='color:#2A00FF'>""</span><span
style='color:black'>;</span><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'>
</span><span
style='font-size:10.0pt;font-family:"Courier New";color:#7030A0'>config</span><span
style='font-size:10.0pt;font-family:"Courier New";color:black'>.put(name,
value);</span><span style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'> }</span><span
style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:black'> }</span><span
style='font-size:10.0pt;font-family:"Courier New"'><o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> }<o:p></o:p></span></p>
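<p class=MsoPlainText>To try the same line-and-colon splitting outside of a grammar, here is a minimal standalone sketch. It is an approximation, not the code above: it stores the settings in java.util.Properties rather than the HashMap, trims each line, skips blank lines, and splits at the first colon only. The class name and the sample settings are made up for the example.</p>
<pre>
```java
import java.util.Properties;

// Standalone approximation of parseConfig; names and sample data are assumptions.
class ConfigBlockSketch {
    /** Parses "name : value" lines into a Properties table. */
    static Properties parseConfig(String configDefs) {
        Properties config = new Properties();
        // One setting per line; any run of CR/LF characters ends a line.
        for (String line : configDefs.split("[\\r\\n]+")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // Limit 2: split at the first colon only, so values may contain colons.
            String[] part = trimmed.split("\\s*:\\s*", 2);
            String name = part[0];
            String value = (part.length > 1) ? part[1] : "";
            config.setProperty(name, value);
        }
        return config;
    }

    public static void main(String[] args) {
        Properties config = parseConfig(
            "title : My first DSL file\n  url : http://example.com/x\n");
        System.out.println(config.getProperty("title")); // prints "My first DSL file"
        System.out.println(config.getProperty("url"));   // prints "http://example.com/x"
    }
}
```
</pre>
<p class=MsoPlainText>A line without a colon ends up as a name with an empty value, which matches the behavior of the ternary expressions in parseConfig.</p>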
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'>The grammar rules that 1) recognize
the “config” block first and 2) make sure “config” is
not a keyword, so it can still be used elsewhere in the grammar, look like this:<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'> </span>start<span
style='color:black'> @init {</span><span
style='color:#FFC000'>allowConfig</span><span style='color:black'> = true;}<o:p></o:p></span></p>
<p class=MsoPlainText style='margin-left:.5in;text-indent:.5in'><span
style='color:black'>: </span><span style='color:#548DD4'>config</span><span
style='color:black'>? </span>objectDefinitions<span style='color:black'> EOF ;<o:p></o:p></span></p>
<p class=MsoPlainText style='margin-left:.5in;text-indent:.5in'><span
style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText style='margin-left:.5in;text-indent:.5in'><span
style='font-family:"Arial Narrow","sans-serif";color:black'>// only recognize config
block before Object Definitions<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> </span><span
style='color:#548DD4'>config</span><span style='color:black'> :
{</span><span style='color:#FFC000'>allowConfig</span><span style='color:black'>
&& input.LT(1).getText().equalsIgnoreCase("config")}?=>
<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> NAME
'{' </span><span style='color:#548DD4'>configBlockText</span><span
style='color:black'> '}' // block of
config settings (possibly empty)<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> {
</span><span style='color:red'>parseConfig</span><span style='color:black'>($configBlockText.text);
}<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> ;<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'> </span><span
style='color:#548DD4'>configBlockText</span><span style='color:black'> : ~'}'*
;<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='color:black'> </span>objectDefinitions<span
style='color:black'> @init {</span><span style='color:#FFC000'>allowConfig</span><span
style='color:black'> = false;} </span><span style='font-family:"Arial Narrow","sans-serif";
color:black'>// config block (and keyword) no longer recognized<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-family:"Arial Narrow","sans-serif";
color:black'> …<o:p></o:p></span></p>
<p class=MsoPlainText><span style='font-family:"Arial Narrow","sans-serif";
color:black'><o:p> </o:p></span></p>
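<p class=MsoPlainText>The state check in the predicate can be sketched on its own in plain Java. This is only a model of the idea, assuming a single boolean flag; ConfigGateSketch, startsConfigBlock and enterObjectDefinitions are hypothetical names, not code generated by Antlr.</p>
<pre>
```java
// Plain-Java model of the gated predicate; all names here are hypothetical.
class ConfigGateSketch {
    /** Mirrors allowConfig: true only until the object definitions begin. */
    private boolean allowConfig = true;

    /** Same test as the predicate: flag is set AND the next token text is "config". */
    boolean startsConfigBlock(String nextTokenText) {
        return allowConfig && nextTokenText.equalsIgnoreCase("config");
    }

    /** Corresponds to the @init action of objectDefinitions. */
    void enterObjectDefinitions() {
        allowConfig = false;
    }

    public static void main(String[] args) {
        ConfigGateSketch gate = new ConfigGateSketch();
        System.out.println(gate.startsConfigBlock("Config")); // prints "true"
        gate.enterObjectDefinitions();
        // From here on "config" is just another NAME.
        System.out.println(gate.startsConfigBlock("config")); // prints "false"
    }
}
```
</pre>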
<p class=MsoPlainText><span style='color:black'>That was a lot simpler than
struggling with the Antlr lexer.<o:p></o:p></span></p>
<p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p>
<p class=MsoPlainText><span style='font-family:"Arial Narrow","sans-serif";
color:black'>Brent<o:p></o:p></span></p>
</div>
</body>
</html>