Dashboard > USF Computer Science 652 - Programming Languages > ... > Projects > C subset compiler
  USF Computer Science 652 - Programming Languages Log In | Sign Up   View a printable version of the current page.  
  C subset compiler
Added by Terence Parr, last edited by Terence Parr on Apr 21, 2008  (view change)
Labels: 
(None)

Introduction

In this final project, you will create a real compiler for a subset of the C programming language by translating the C code into the LLVM SSA intermediate representation (IR). You will need to deal with arrays but very few of the operators/statements and only int types. The features are:

  • int type
  • functions with void or int return type and multiple arguments
  • if statements
  • while statements
  • return statements
  • printf("s", n) method; allow this simplified version with format string and single integer argument. This is the only place a string is allowed (i.e., you can't use it in a general expression nor as an argument to a user-defined function)
  • operators: +, -, ==, !=
  • arrays
  • globals and locals

You do not have to mess with pointers.

int i;
void f() { i = 3; }
int i[10];
int f() { int b[5]; b[0] = i[0]; return i[3]+2; }
int f(int n) {
	if ( n==0 ) return 1;
	else return n+f(n-1);
}
void f() {
	int i;
	i = 0;
	while ( i!=10 ) {
		printf("%d\0A", i);
		i = i+1;
	}
}

Here is an example translation. Input:

int g(int x) {
	x=x+1;
	printf("%d\0A", x);
	return x;
}

yields

@s = internal constant [4 x i8] c"%d\0A\00"
declare i32 @printf(i8 *, ...)

define i32 @g(i32 %x_arg) {
	%x = alloca i32
	store i32 %x_arg, i32* %x
	%ps = getelementptr [4 x i8]* @s, i64 0, i64 0
	%t5 = load i32* %x
	%t6 = add i32 1,0
	%t4 = add i32 %t5, %t6
	store i32 %t4, i32* %x
	%t7 = load i32* %x
	call i32 (i8 *, ...)* @printf(i8* %ps, i32 %t7)
	%t8 = load i32* %x
	ret i32 %t8
	ret i32 0
}

Tasks

First, get LLVM running on a computer you have access to. Most likely you will have to build it from the source code, which is a good exercise for you. Please make sure you are using version 2.2 (the latest).

I am providing the grammar for the C subset so that we can all work on the exact same input. Your goal is to emit LLVM IR in the form of standard output with in the main program/class called CC. The full toolchain should look like this

java CC < t > t.ll
llvm-as -f t.ll # translate the IR to bitcodes t.bc
# or to remove memory load/stores, use: llvm-as < t.ll | opt -mem2reg > t.bc
# to see optimized SSA, do this: llvm-dis t.bc
llc -f t.bc # compile bitcodes two assembly code
gcc -o go main.c t.s # compile, assembly and link to t executable
./go # execute

Notice that you will need a main program that invokes whatever you compile:

#include <stdio.h>

extern void f(); // or whatever the signature is from your generated code
int main(int argc, char *argv[]) {
    f(); // call your generated code
}

Ignore invalid input---assume all input is correct.

Your translator class should be called CC (uppercase CC) and have the main method that reads C subset files from standard input.

Submission

You will create a jar file called cc.jar (lowercase cc) containing grammars, source, and *.class files and place in your lib directory:

https://www/svn/userid/cs652/proj7/trunk/lib

You can use the svn account for development of the software to if you would like, but I will only be looking at your cc.jar file in the lib directory.

Please bring a printout of any tree grammar you build and support code as well as the output code generated from all examples shown above.

Grading

Make sure your jar has the CC class in the default package with a main method so that I can test your code. I will run a number of examples that you have not seen through your project.

Site powered by a free Open Source Project / Non-profit License (more) of Confluence - the Enterprise wiki.
Learn more or evaluate Confluence for your organisation.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.1 Build:#806 May 06, 2007) - Bug/feature request - Contact Administrators