Dashboard > USF Computer Science 652 - Programming Languages > ... > Symbol tables > Data aggregates solution
  USF Computer Science 652 - Programming Languages Log In | Sign Up   View a printable version of the current page.  
  Data aggregates solution
Added by Terence Parr, last edited by Terence Parr on Feb 19, 2008  (view change)
Labels: 
(None)

We've seen the simple nested scope structure for global, parameter, and local simple entities (variables and functions) in C. Adding C struct definitions to the mix essentially adds another scope. The new scope differs from previous scopes in three ways:

  1. it has a name. You can access symbol f within scope named s (s is a struct).
  2. it is persistent; you can't throw it out once you leave the struct definition since references to s.f will need to access f within s's struct scope.
  3. it is not nested (well, let's assume no nested struct definitions for now). The scope is properly associated with the type definition not nested within the global outer scope nor within a function executable code scope.

Reference the following code:

struct aggr {
  int x;
  float y;
};

void foo() {
  struct aggr test;
  test.x = 1;
}

Type specifiers. Previously we examined creating simple variables of type int and so on. The VariableSymbol stored in the table would point to some kind of type specifier so you know it's an int not a float , for example. For each predefined type in the system, you would have objects like IntType or IntTypeSymbol , ... For consistency all of these predefined symbols are just singleton instances of Symbol subclasses.

For "int x;" in some scope, that portion of your symbol table would look like this:

Functions are done similarly except the type pointer would really be the return type.

User-defined types. So much for runtime entities: variables and functions. To handle data aggregates, we must consider user-defined types, which are like compile-time entities. If the program says "struct aggr", you have to be able to find it in the current or enclosing scopes. So, when you see a definition for aggr, create a StructSymbol just like you would for a variable definition. The type pointer of this symbol would be null as it is a type. Create an instance of StructSymbol for each new struct definition.

For completeness, notice that

typedef int myint;

is handled easily by making a TypedefSymbol with a name of myint whose type field points to IntTypeSymbol. This is like an alias. References such as "myint x;" are easily seen as having type int by looking up myint and asking for the type specifier.

Here is what your symbol table object hierarchy could look like with our current understanding:

How do scopes interact with symbol table? In order to properly handle scopes, we need to abstract the idea that many different symbols have scopes associate with them such as functions and structs:

public interface Scope {
    /** What is the enclosing scope? */
    public Scope getEnclosingScope();

    /** Return a Map<String,Symbol> of all contained members */
    public Map getMembers();

    /** Look up name in this scope or in enclosing scope if not here */
    public Symbol resolve(String name);
    
    /** Remove a name from this scope */
    public Symbol remove(String name);
    
    /** Define a symbol in the current scope */
    public Symbol define(String name, Symbol sym);
}

Using this interface, we need to create an abstract superclass of all symbol objects that have scopes associated with them called ScopedSymbol.

public abstract class SymbolWithScope extends Symbol implements Scope {
  ...
}

Even local scopes will be a subclass. Here is a more proper Symbol hierarchy:

The local scope will have a parent pointer to the enclosing scope--its members are the local variables defined in that scope. A function symbol's scope will be the arguments; a local scope will exist for each function to contain the variable definitions and its parent pointer will point to the function scope.

The global scope can actually just use LocalScope as it's a list of symbols.

Adding to the scope. Once you have the new concept of StructSymbol , upon entry to a "struct aggr", create a new StructSymbol with name aggr and make it the current scope. Add this StructSymbol to the global scope.

While walking aggr's fields, add symbols as you would to a normal scope. The only difference is that the "current scope" points at a scope for which you have a persistent pointer. As with a function end "}", you pop the scope when you see the end of the struct definition.

The following diagram illustrates the scopes for the associatd code: the normal global scope, the arg scope for function foo, the local scope for method foo, and also the scope associated with structure aggr. Because the struct scope does not ever get thrown out, you can always lookup a field of that struct in any method.

Source code symbol/type system
struct aggr {
  int x;
  int y;
};

void foo(int i) {
  struct aggr test;
  test.x = 1;
}

The dashed lines represent type pointers. Because foo has no return type (it's void) it does not point at a type symbol.

Note that the struct has no parent pointing to the global scope as references not found in the struct do not get resolved in the global scope.

Looking up field names. Variable and function references are looked up using the normal stack of scopes as before. When you see test.x, however, you have more work to do. First, you look up test in the current scope. If found, you ensure that it is a StructSymbol and then look up x in its scope.

Nested aggregates. What do you do for nested structs like:

struct aggr {
  int x;
  float y;
  struct nested1 {
    int z;
  } n1;
  struct nested2 {
    int blort;
  } n2;
};

struct aggr test;

and references like "test.n1.z"? The "." operator is left-associative so you look up test.n1 before asking for z. The tree grammar will have:

dot returns [Symbol type]
    : ^( DOT expr ID
         {
         Symbol s = ((ScopedSymbol)$expr.type).resolve($ID.text);
         $type = s.type;
         }
       )
    ;

or something like that. expr will usually have a ScopedSymbol result type in this case and then the ID can be looked up inside that type. In general when doing translations that need type information, your expression rules will return a type spec:

expr returns [Symbol type]
    : ...
    ;

Please note that aggr has two nested structs, which represents a tree of 3 scopes:

Because there can be no code inside structs, there is no parent pointer from the nested struct scope to enclosing struct scope. You'll never try to resolve, say, y within nested1.

Site powered by a free Open Source Project / Non-profit License (more) of Confluence - the Enterprise wiki.
Learn more or evaluate Confluence for your organisation.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.1 Build:#806 May 06, 2007) - Bug/feature request - Contact Administrators