Building a student project compiler means struggling through the construction of a code generator among other things. For commercial compilers, the code generator is actually one of the smaller components as it is usually broken into an optimizer(the hard part and important part) and an instruction selector. Just as compiler engineers use parser generators ( compiler-compilers) to build parsers, _code generator generators_make a huge difference in productivity.
The goal of this lab is to introduce you to the notion of a BURS( Bottom-Up Rewrite System); talking about JBurg
in particular. I provide you with a simple front end that builds trees and you need to fill in a JBurg description to generate code.
Given input:
int i;
int x;
i=1;
x=0;
x=x+i;
You will generate the MIPS-inspired RISC code (comments indicate the source AST):
; ( int i )
.comm i,4
; ( int x )
.comm x,4
; ( = ( & i ) 1 )
la r1,1
la r2,i
sw r1,(r2)
; ( = ( & x ) 0 )
la r3,x
sw r0,(r3)
; ( = ( & x ) ( + ( * ( & x ) ) ( * ( & i ) ) ) )
la r4,i
lw r5,(r4)
la r6,x
lw r7,(r6)
add r8,r7,r5
la r9,x
sw r8,(r9)
Let me demonstrate just how important the optimizer is. An optimizer would do data flow analysis and realize that the code could be simplified to:
The machine code would then be the following greatly reduced version:
; ( int x )
.comm x,4
; ( = ( & x ) 1 )
la r1,1
la r2,x
sw r1,(r2)
Background
A code generator generator allows you to map intermediate form trees to machine instructions via tree grammars similar to ANTLR's tree grammars. To port your compiler to a different machine, all you have to do is build another grammar with code snippets that emit machine code. One of the finest examples of how to write a decent compiler is LCC, A Retargetable Compiler for ANSI C
. The book provides descriptions of a code generator generator and specifications for 3 architectures: MIPS, SPARC, x86.
ANTLR as a code generator generator
To get the idea, consider the following simple ANTLR tree grammar that generates code for PLUS subtrees:
expr returns [int r=0]
{int r1=0, r2=0;}
: #(PLUS r1=atom r2=atom)
{r=getRegister(); emit("add r"+r+",r"+r1+",r"+r2);}
;
atom returns [int r=0]
: i:INT {r=getRegister(); emit("la r"+r+","+i.getText());}
;
Each rule returns the register in which the generated code will place the result. Calling rule expr on tree (PLUS 3 0) would result in:
la r1,3
la r2,0
add r3,r1,r2
Now, assume r0 is always 0. You should really generate
(We'll ignore the fact that a real compiler would "fold" these constants to a single value of 3).
How do you change things to generate the more efficient rule? Try adding another alternative to rule atom :
expr returns [int r=0]
{int r1=0, r2=0;}
: #(PLUS r1=atom r2=atom)
{r=getRegister(); emit("add r"+r+",r"+r1+",r"+r2);}
;
atom returns [int r=0]
: i:INT {r=getRegister(); emit("la r"+r+","+i.getText());}
| INT {r=0;} ;
The only problem is that you have created an ambiguous grammar: ANTLR could match an INT node to either alternative. Plus, ANTLR doesn't know that you intend the second alternative to match only for value 0. Add a semantic predicate to disambiguate the rule:
atom returns [int r=0]
: i:INT {r=getRegister(); emit("la r"+r+","+i.getText());}
| {LT(1).getText().equals("0")}?
INT {r=0;} ;
Now ANTLR is happy. Another way to handle this is to treat 0 as a special case of the INT token. Have your parser create a node of ZERO as well as INT so that you can write atom this way:
atom returns [int r=0]
: i:INT {r=getRegister(); emit("la r"+r+","+i.getText());}
| ZERO {r=0;} ;
ANTLR is a (barely) acceptable instruction selector for RISC type machines as RISC machines are designed specifically to provide one way of doing any particular thing. Further most non-floating-point instructions have the same cost.
JBurg
JBurg also accepts a grammar, but it deliberately allows for ambiguous grammars. You will provide a rule for each instruction in your CPU, which almost always leads to more than one way to translate a given tree to machine code. This is particularly true for CISC instruction sets. (Recall that an ambiguous grammar is one that can produce more than one derivation tree for a single input stream). You specify a constant integer cost associated with matching each rule; that is, a cost for emitting each instruction. JBurg generates a tree parser from your rules and costs that knows all possible ways that your grammar can cover(that is, match) input trees and the cost of choosing each particular path. When presented with an actual tree to match, the generated tree parser does two passes:# walks the your tree bottom-up, choosing the minimal cost derivation.# execute the actions associated with the rules in the derivation tree (top-down).
JBurg does not actually create a derivation tree just like ANTLR generated parsers do not create a derivation tree while parsing text.
JBurg descriptions looks like ANTLR tree grammars except that the parentheses are in a different spot (prefix vs postfix notation):
reg : #(ADD r1:reg r2:reg) {...emit instruction(s)...}
and burg and friends use
reg : ADD(reg r1, reg r2) : 1 {...emit instruction(s)...}
where : 1 implies cost 1. Costs usually are just relative numbers but can mean "how many instructions were required?"
For our lab, we'll assume that the return value of most rules is an Integer indicating the register in which the instructions place the resulting value.
Another kind of rule is a _chain rule_that just lets you make rules equivalent to another, sometimes emitting code. For example, you might want to say that a character is like an integer, but it costs you an instruction:
int_const = char_const(void) : 1 {emit("cvci");} char_const = CHAR_LITERAL(void) : 1 {...}
The first rule of a JBurg description is the start rule, or highest level "goal". In our case it will be stmt .
Note that JBurg relies on constant costs so that it can do dynamic programming at compile-compile time to figure out the set of all possible derivation trees and their costs. TWIG, another code generator generator, allows computations rather than constants for costs. TWIG is necessarilymore expensive at code generation time because it must do the dynamic programming on the fly.
Code generator
For this lab, I have provided you with a parser/AST constructor that generating ASTs for use in your code generator. The code generation tasks is very small so that you can learn about JBurg and build something with it in about 2 hours.
Here are the parser rules from the[parser
grammar|templates/jburg/simple.g]:
prog: (stmt)+
;
stmt: assignment
| decl
;
decl: "int"^ ID SEMI!
;
assignment!
: id:ID eq:EQUALS e:expr SEMI!
{#assignment = #(#eq, #(#[ADDR,"&"],#id), #e);}
;
expr: atom ((PLUS^|MINUS^) atom)* ;
atom: INT
| ZERO
|! id:ID {#atom = #(#[IND,"*"],#(#[ADDR,"&"], #id));}
;
You must deduce what the trees look like.
Making the JBurg description
First, here are the instructions you may generate:
| instruction |
meaning |
| .comm <var>,4 |
Define a 4-byte word for var. |
| la r<n>, <const> |
Load const into register n. |
| la r<n>, <var> |
Load <var> into register n. |
| lw r<n>,(r<m>) |
reg[n] = memory[reg[m]] |
| sw r<n>,(r<m>) |
memory[reg[m]] = reg[n] |
| add r<x>,r<n>,r<m> |
reg[x] = reg[n]+reg[m] |
| sub r<x>,r<n>,r<m> |
reg[x] = reg[n]-reg[m] |
Now, you must complete the[JBurg
description|templates/jburg/simple.jburg].
You will need the[main program|templates/jburg/Main.java]program that also defines two methods of interest: emit(String) and getRegister() :
...
Integer r = Main.getRegister();
Main.emit("mov r0"+","+r); ...
There will be only one ambiguity in your tree grammar derived from the fact that r0 is always 0, hence, there is no need to do a
Instructions can use r0 dirctly. You must get your code generator to handle the special case properly. Your description will try to get constants into registers like this:
reg = constant : 1 {...}
...
constant = INT(void) : 0 { return null;}
As you can see, it will cost you 1+0 or 1 instruction to load a constant into a register. Add another reg rule that gets 0 (node ZERO ) into a register with no cost.
Building and running
To have JBurg generate your code-generator for inclusion in your program, add this in your path /home/submit/cs652/jburg.jar and then do the following to generate Gen.java :
$ java jburg.parser.JBurgMain simple.jburg Gen.java
Then compile and run
$ javac *.java
$ java Main
int i;
i = 1;
; ( int i )
.comm i,4
; ( = ( & i ) 1 )
la r1,1
la r2,i
sw r1,(r2)
Please run test.simple into your program. You need to get the output given at the head of this document.