Dashboard > USF Computer Science 652 - Programming Languages > ... > Symbol tables > Forward references solution
  USF Computer Science 652 - Programming Languages Log In | Sign Up   View a printable version of the current page.  
  Forward references solution
Added by Terence Parr, last edited by Terence Parr on Feb 19, 2008  (view change)
Labels: 
(None)

Forward references are a hassle, but are almost certainly required in some form for useful programming languages. The site spec issue in this question implicitly lets you reference as-yet undefined items. On the other hand, some languages explicitly make forward declarations like Pascal:

procedure test(var i:integer); forward;
...
procedure x;
begin
  test(42);  // ref to test
end;
...
procedure test
begin
  ...
end

or C (extern says variable is defined elsewhere in the program, possibly in the same source code file):

extern int size; //decl
void f() {
   size = 3;
}
int size=4; //def

When you hear "forward reference" think "damn, two-or-more passes required." In other words, you have to record the fact that you have seen a reference and then later, after all declarations have been seen, go back and check to make sure everything is ok. So, once through the input to record definitions and then another pass to check references. Or, you can do a pass to collect a list of forward references and then another pass over that list to match up the forward references with the item definitions.

In the Pascal example above, you would put procedure test into the global scope's symbol table and then, upon reaching the end of the file or at the end of def of test, update references to test to point to the real definition.

Typically you will track a list of forward references so you don't have to walk the entire input stream or tree again. This amounts to a list of AST nodes usually.

Ok, turning to our config file problem. You reference main and meditation before they are defined. You have to make a list of such items instead of immediately checking them.

family jguru {
  sites = { main, meditation };
  ...
}

So instead of

sites
    :    'sites' '=' '{'
             ID {sites.get($ID.text)!=null}? (',' ID {sites.get($ID.text)!=null}?)*
         '}'
    ;

You would do this:

{
List<Token> siteRefs = new ArrayList<Token>();
}
...
sites
    :    'sites' '=' '{'
             ID {siteRefs.add($ID);} (',' ID {siteRefs.add($ID);})*
         '}'
    ;

The site spec definitions proceed as before, defining hash table sites. Then the code that invokes the parser would walk the reference list doing the check:

for (Token refToken : siteRefs) {
    String name = refToken.getText(); 
    if ( sites.get(name)==null ) {
        int line = refToken.getLine();
        System.err.println("Site "+name+" on line "+line+" does not exist");
    }
}

Notice that I'm tracking the token objects not just the name of the reference so that I can provide the line number during error messages.

In the case of the assembler, there is a trick called backchaining that is particularly efficient memory-wise, but a list of references for each forward label is an efficient-enough and obvious solution.

If you have both forward and backward references, you do the normal resolution of defined items and track forward references.

Cool trick: Sometimes you can define a dummy symbol table entity that means "undefined forward reference". Later when you see the definition, you only have to update the dummy entity--all references would then see the real item by the end of the file. Ahhh... the glory of indirection

The most general and straightforward solution is simply to walk the entire input twice, once to get all of the definitions and again to test all the references for semantic validity. In this case you would typically have a parser build an AST and not do any symbol table manipulation (but you could have the parser do the definition phase). The definition and reference phases would be tree walkers (derived from tree grammars in ANTLR's world) which differ only in their actions because the tree structure is the same. The first time through doing definitions, you are pushing and popping scopes as you enter and exit lexical scopes. The second time through, however, you are not building the symbol table. You do not have access to the scopes you create a the last time unless you save them in the tree somehow. My favorite technique is to simply have a scope pointer inside tree nodes associated with scopes such as class definitions, methods, and the BLOCK node for {...} blocks (local scopes). As your second pass traverses the tree, keep setting a "current scope" pointer to the scope from the most recent scope-defining AST node. Be careful to treat the current scope pointer as a stack so that when you leave a scope, the current scope gets reset to what it was previously.

Site powered by a free Open Source Project / Non-profit License (more) of Confluence - the Enterprise wiki.
Learn more or evaluate Confluence for your organisation.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.1 Build:#806 May 06, 2007) - Bug/feature request - Contact Administrators