Overview of language implementation and supporting tools

Skip to end of metadata
Go to start of metadata

General strategies

All language implementation strategies are technically translators. For the next 3.5 months, your life is going to be all about:

All translators have a common preliminary step: language recognition. To emit different output for different input, a translator must distinguish between input sentences. Translators emit output y for input statement x; i.e., y = f(x). There may be many many intermediate steps between x and y:

Even the recognition process is broken down into multiple phases: preprocessing, lexing, parsing:

The token stream is sent off to a parser that build an intermediate representation.

Interpretation

Simple interpreters, that you might call syntax directed interpreters, are typically one pass interpreters that read the input and emit some output. A calculator is a good example of this. More complicated translators generally follow the following strategy:

  1. Translate source to an intermediate representation. Usually this intermediate representation is high-level enough to recover the original source. Common representations: tokens (BASIC), trees (LISP), bytecodes (Java), high-level instructions (Forth, PostScript)
  2. Walk intermediate representation once or more times executing actions

Java VM is stack machine with a byte code representation.

yields

f:
	iload_1
	iconst_1
	iadd
	ireturn

demo PostScript case study

Translation

demo Java profiling

Compilation

Compilation is a specific instance of a well-known class of translators that translate programming languages down to machine code. The term "compile" is used pretty loosely sometimes to include translation from source code to an intermediate representation; the javac compiler translates Java to byte codes.

demo LLVM

LLVM live demo

C code:

LLVM static single assignment (SSA) intermediate representation: (mul_add.ll)

define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
  %tmp = mul i32 %x, %y
  %tmp2 = add i32 %tmp, %z
  ret i32 %tmp2
}

main.c:

Compile/execute:

llvm-as -f mul_add.ll 
llc -f mul_add.bc
as -o mul_add.o mul_add.s
gcc main.c mul_add.o
./a.out

emits r = 70

demo gcc and -S

$ gcc -S t.c   # generates t.s assembly code not t.o
$ cat t.s
	.file	"t.c"
	.text
.globl f
	.type	f, @function
f:
	pushl	%ebp ; save frame pointer register
	movl	%esp, %ebp ; stack pointer goes into frame pointer
	movl	8(%ebp), %eax ; get x into return register eax
	addl	$1, %eax
	popl	%ebp ; restore frame pointer
	ret
	.size	f, .-f
	.ident	"GCC: (GNU) 4.1.2 20070502 (Red Hat 4.1.2-12)"
	.section	.note.GNU-stack,"",@progbits

Then assembler translates the assembly code to machine code: t.o file.

$ as -o t.o t.s
$

Then a linker combines with a main program and standard libraries to make executable image, though in this case we have no main:

$ ld t.o
ld: could not find entry point "start" (perhaps missing crt1.o) for inferred arc
$

Java execution possibilities

Java initially was purely interpreted (after translation from Java code to byte codes). Now the story is much more complicated:

Tools

Tools we will use: ANTLRWorks, ANTLR, StringTemplate, JBurg, LLVM, DOT/Graphviz.

Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.