Command interpreter

Skip to end of metadata
Go to start of metadata

Introduction

In this project, you'll build an interpreter suitable for executing statically or dynamically typed object-oriented languages. Let's call it VML for Virtual machine language. You've seen a bytecode interpreter and now you'll implement a command-based interpreter. A command is just an object that represents an action (instruction in this case) to execute. The basic idea is to generate a list of command objects instead of an array of bytecodes. Each command object contains all of the arguments necessary to execute that instruction. Bytecodes, on the other hand, have their arguments stored inline immediately following the bytecode associated with the instruction.

To see the difference, consider the following low-level instructions:

const 3       // compute 3+4, leave on the stack
const 4
add

Most of the time we look at that as the assembly language version of bytecodes, but I do not have to assemble that down to bytecodes to execute it. I could instead build a series of objects and add them to a list:

To execute the code block, all we have to do is loop across the instruction list calling, say, exec() on each instruction. Rather than switch on the opcode, we use polymorphism to call the exec() method associated with each instruction. It's often much easier to understand for object-oriented programmers than a low-level byte code implementation (which is much closer to a hardware CPU simulation).

To make things really simple, let's assume that the input to the interpreter is correct not only syntactically but semantically. That means you can avoid implementing error-checking. Just let the interpreter throw an exception if something goes wrong.

My complete implementation is only about 900 lines of code, including the grammar that parses VML. Ok, let's take a look at VML in more detail.

VML

The goal of this interpreter is to teach you about how interpreters work. VML is a very high-level interpreter and totally ignores efficiency.

VML stack-based just like Java's and C#'s bytecode interpreters. In other words, we'll perform all operations on the stack rather than in registers. The code above pushes two integer constants on the stack and then executes an add. The add pulls two operands off the stack and leaves their summation on the stack.

VML has 4 predefined types: integers, floats, strings, and lists. We'll use the predefined Java types to represent them.

The primary elements of the VM language are classes and methods. As we read them in, we'll create ClassSymbol and VariableSymbol objects and keep track of them just as we do in a high-level language when building a symbol table. We'll associate code blocks with those symbols as well. We won't be creating a scope tree because that is something that the compiler sitting on top of VML would do. All we need to be able to do is find the code for a method when an instruction tries to call it.

The analog of a compile time scope is the runtime execution context. An execution context is essentially a dictionary that maps names to values. A context stores both method and class definitions as well as variable name/value pairs. Method and class definitions are evident from the VML source (def and class instructions) but variables appear when we execute a store operation. For example, the following code stores 1 in global variable x and prints it out:

setx.vml

Execution contexts

Contexts nest arbitrarily even without class or function definitions. In order to access elements within different contexts, all instructions that reference symbols, such as load and store, support the context offset argument. Offset 0 means the current context, -1 means the enclosing context, -2 means the enclosing context's enclosing context, etc... When the offset is 0, you can omit it. Because the context stack can be arbitrarily big, we need a way to access the outermost context. This provides a global context capability. We use keyword global for this. The following code snippet prints the value of global x three times:

nested-ctx.vml

The output looks like:

$ java VM nested-ctx.vml
1
1
1
$

We don't explicitly define variables. We define them dynamically the way Python does. Program setx.vml is exactly equivalent to the following Python code:

Code blocks, lambdas, and closures

Before looking at functions, there are a few more things we need to know about contexts and code blocks. Code blocks execute immediately unless they are arguments to a VML instruction (we call this deferred mode). Immediate-mode code blocks execute as soon as we see them, possibly storing values in the context dictionary. Regardless, the result of executing a block is an execution context on the operand stack. From there, we can print it just like any other element:

$ java VM
{ const 34 store x }  // push a context with x set to 34
print		      // print out the context
^D                    // hit eof to close stdin
{x=34}
$

In this case, the context has a single element in the dictionary.

Code blocks are first-class objects in VML. You can push them onto the stack as an operand to a call instruction. To prevent a code block from being executed immediately, use the lambda instruction to push it on the stack:

$ java VM
lambda { const "foo" print } // push closure on the stack
{} // arg context
call                         // invoke the closure
^D
foo
$

That also means we can pass code blocks around and execute them whenever we want. The following code snippet passes a lambda code block from function f to function g. Lambda code blocks always retain their original execution context. Therefore we can call them closures. In this case, f defines a local variable x=3 that the closure references. f passes it as an argument, clo, to g which executes it. Even though there is an x in the global scope and in the local scope of g, the closure passed from f executes in the context of f. It sees f's local.

closure.vml

Executing that code gives the following output.

3
99

Functions and function calls

Functions in VML begin with the def instruction. For example, here is a function definition that prints hello:

To call a function, we pass arguments in an execution context (a dictionary):

We never define parameters--it's up to the invoking code and the code within the function itself to decide what values to pass and access. Here's an example that passes a single argument x to f, which prints it:

To return a value, use the retv instruction:

Functions are also first-class objects. You can pass functions around like variables:

Class definitions

Define classes in VML with the class instruction:

To call a method on object o, we first push o then push an argument context and execute a mcall instruction, as you can see. The mcall instruction automatically adds parameter self to the argument context. The value of self is o.

To make an instance of a class, we use new. To invoke the constructor, we have to manually call a method. I have been using _init_ as a convention:

init.vml

That prints <Empl:{name=Ter}> as output.

Specify inheritance relationships using the class instruction. VML copies all of the methods from the superclass into the derived class and then defines methods physically defined within the derived class, thereby overriding methods for the same name. There is no method overloading (same name, different argument lists) to worry about as VML doesn't support it.

inherit.vml

Running that through VML yields:

Polymorphism (late binding) "just works" because mcall always looks up methods starting in self's class definition.

Instructions

Here of the basic expression and statement instructions (note that s denotes the top of the stack and s1 denotes one below the top of the stack--both are popped off during instruction execution):

Instruction Description
noop                            does nothing.
halt force VM to halt by jumping although it back to the VM.start() method.
pop throw away the top of the stack; mainly used to throw away unused function return values.
dup duplicate the top of the stack
const push a constant value onto the stack (integers, floats, strings)
eq push s1.equals(s)
is push s1==s
add push s1+s, promoting char and int operands to float if either argument is of type float
sub push s1-s
mul push s1*s
div push s1/s
lt push Boolean s1<s
le push Boolean s1<=s
gt push Boolean s1>s
ge push Boolean s1>=s
or push Boolean s1 || s
and push Boolean s1 || s
true push Boolean constant true onto stack
false push Boolean constant false onto the stack
not push Boolean !s
load ctx,id push value of a local or var or global identified by id in context ctx below the top of the stack.  ctx=0 indicates the current context. ctx=global indicates the outermost context. If omitted, assume ctx=0.
store ctx,id map id to s in context ctx below the top of the stack. Same ctx rules apply as as for load instruction.
list n pop n elements from the top of the stack, put them in a list, and push the list.  Top of the stack is the last element in the list.
index push s1[s]
listadd s1.add(s)
listdel s1.remove(s)
size push s.size()
print print the top of the stack (call toString() on s and println).
ifelse block1 block2 If top of stack is true, execute block1 else execute block2.
whiletrue cond block While execution of cond block evaluates to true, execute block. cond block leaves true or false on the stack after execution.

Here are the instructions related to executing code blocks and functions:

Instruction Description
def id() block                            define a method
{...} execute the code block, leaving the context on the stack
call ctx,id call function identified by id in context ctx below the top of the stack. ctx=0 indicates the current context. ctx=global indicates the outermost context. If omitted, assume ctx=0. The top of the stack must be an argument context. If call instruction has no arguments, then assume s1 is a closure or MethodSymbol.
ret return from a code block
retv return a value from a code block
lambda block Push block onto the stack as a code block operand; don't evaluate the block.

Here are the instructions related to class definitions and methods.

Instruction Description
class id block                 Define a class. Execute the code block to execute the def instructions.
class id : id2 block Define a class derive from id2.
new id push a new instance of class id.
fload id push s.id
fstore store s at s1.id
mcall id call method s1.id with s as the argument context.
supercall id the same as mcall except we look up methods in the surrounding class' superclass.

VML is very similar to postscript with its execution contexts and so on. The big difference is that postscript instructions don't have operands; everything is on the stack whereas VML instructions have arguments like iconst 99.

Examples

The examples you must get working are in examples.tar.

Here is the correct output, which I generate with a little bash FOR loop:

$ for f in *.vml grad/*.vml; do echo "--" $f "--" ; java VM $f; done
-- anon.vml --
{x=34}
-- cl.vml --
1
halted
-- closure.vml --
3
99
-- func.vml --
f()
-- if.vml --
99
2 !eq 3
halted
-- inherit.vml --
<Empl:{name=Ter}>
anObject
<Coder:{name=Ter}>
Ter
-- init.vml --
<Empl:{name=Ter}>
-- list.vml --
[foo, bar, 34]
bar
[foo, bar, 34, blort]
[bar, 34, blort]
3
-- nested-ctx.vml --
1
1
1
-- nested-func.vml --
3
halted
-- s.vml --
foo
12.4
halted
-- sc.vml --
99
<A:{}>
2
halted
-- setx.vml --
1
-- t.vml --
1
1
halted
-- var.vml --
1
-- while.vml --
0
1
2
3
4
5
6
7
8
9
halted
-- grad/closure.vml --
3
99
-- grad/lambda.vml --
in lambda
32
in f()
$

Grad students have to get the closure and lambda stuff working. They will need the output from these two files as well:

Implementation

Support code

Because you know how to build parsers now, I'm providing VML.g for you. You should familiarize yourself with the grammar though because it tells you what instructions you need to build. Also note how simple it is. VML code can look very complicated, but VML is very regular and is really just a list of instructions and nested blocks.

VML.g

Here is an outline for your primary VM class:

VM.java

What you must build

Here are the classes I created for this project:

VM. This class is the main class of your interpreter and contains the main() method as you can see above. Your interpreter needs a context stack, an operand stack, and a pointer to the main code block. The main code block is executed when you call start() on the VM. The VM init method creates the various stacks and pushes a global context. It also defines the Object class in that global context. For fun, it defines a toString method within Object by executing the def instruction inside const string VM.Object_methods. The fetch execute cycle of the interpreter lives inside method exec(CodeBlock, contextOfOrigin). We only need the contextOfOrigin parameter if we are handling lambdas and closures. The idea is to walk through the instruction list one by one, calling Instr.exec() on each instruction. We terminate the execution of a code block upon seeing an instruction that answers true for I.exitsBlock. Upon retv instruction, the block returns a value as well as stops executing the block.

Symbol. A simplified version of our usual object. It's only property is name.

MethodSymbol. A method symbol is a symbol that has an associated code block.

ClassSymbol. A class symbol is a symbol that has a code block, a list of method symbols, and a superclass pointer. It also has a define and get method to define method symbols and look them up. The get method does not look up the superclass chain. The class instruction copies all of the method symbols from a superclass into the derived class' method list--there is no need to waste time looking up the chain. We look up global methods and variables directly using the global context identifier. Presumably the compiler that generates code will do all of the symbol resolution and identify which methods are global etc. The interpreter is capable of accessing any scope. So, it is up to the compiler and semantics of the language what scope/context to use.

CodeBlock. A code block is just a list of instructions; for convenience I have an add method.

ExecContext. Execution context is the key element of the interpreter. It's a set of name/value pairs that records all variables (globals, locals, and parameters), class definitions, and method definitions. The Instance subclass even represents an instance of a VML class.  I have define and get methods for convenience. Make sure that the values are ordered in case people care about the order of parameters; use a LinkedHashMap.

Instance. An Instance is an execution context (set of name/value pairs) but also has a pointer to a class definition. That means, that the names in the context are the fields of an object.

Closure. A closure is a code block that can return a value (it's a lambda basically) and knows the context in which it was created. I use delegation rather than inheritance to store the code block associated with the closure. My fields are:

Instr. An Instr represents an instruction.  It tracks the line number at which the instruction instance was found (for error-checking) and my implementation tracks the surrounding code block (see instr_list rule in VML.g) in case that ever comes in handy. The individual instructions themselves store the instruction arguments as fields. For example, here is my boolean or operator:

I put all of my instruction class definitions inside Instr.java just to keep them together. The file gets a little bit big (422 lines for me), but that's okay. It's worth it for the encapsulation.

HaltError. If we ever encounter a halt instruction in some deeply nested function call, we have to bail out of the entire interpretation process. Since we use the Java stack to do VML method calls (via VM.exec()), we need to use an exception to blast out all the way back to the start() method. I use an Error not an Exception to avoid having to specify the throws clause on all of the instruction exec methods. This class has no fields. It just subclasses Error.

Submission

You will create a jar file called interp.jar containing grammars, source, and *.class files and place in your lib directory:

https://www/svn/userid/cs652/proj5/trunk/lib

You can use the svn account for development of the software to if you would like, but I will only be looking at your sym.jar file in the lib directory.

Please bring a printout of VML.g and Instr.java as well as the output generated from all examples above (grad students will have two more examples to get working--the lambda stuff in the grad directory of the examples.tar file).

Extra credit for undergrads: if you do the lambda stuff like grad students you get extra credit of 5%.

Grading

Make sure your jar has the VM class in the default package with a main method so that I can test your code.  Make it read from either standard input or from a file (which would be the sole argument on the command line).

Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.
  1. Feb 11, 2009

    Anonymous

    Can you attach a solution for this (that is, unless you are still using this project for classwork). This would be a very good java-integration intro for us newbies.

  2. Feb 12, 2009

    I can't provide solution until after students finish project around April 2009. Soon!
    Explanation in detail will be in upcoming book.

  3. Jun 14, 2009

    Anonymous

    I'd really like to see more of this.. I have an interpreter running within a microcontroller along with a software simulator. I am reviewing the code (porting to JAVA), the VML looks like a good place to start looking. It would be very handy to have the remaining class files since I am at the review stage and don't have time to develop everything right now. Can the files be obtained in confidence if they are still part of the university, are they available in the book you mention - maybe on a CD or similar?

    Thanks..

  4. Jun 15, 2009

    hi. next beta coming out of the language design patterns book at pragprog.com will have the first interpreter chapter with source.