The second section of this course is about executing programs without translating the source code down to machine code. In other words we are going to talk about interpreters. An interpreter at its core reads a statement and then executes it, reads another statement and executes it, and so on. Languages like Java, C#, Python, and Ruby have interpreters or virtual machines that execute bytecode instructions. These instructions are a little bit higher level than raw machine instructions, but they're portable and easier to understand. The most significant difference between a virtual machine and a normal CPU is that virtual machines are stack machines whereas CPUs are register machines. A stack machine looks for operands on the stack and puts results back on the stack rather than to and from registers.
Compilers for interpreted languages translate source code down to bytecodes, which is then executed by the virtual machine. The typical stages of translation in the compiler are:
- parse the source language and build an AST (define classes, functions, etc... as you parse)
- for statically typed languages, walk the AST to perform semantic analysis
- walk the tree to generate bytecodes
Another way to execute the source program is to interpret the source code in its AST form rather than by code form. In this case, the compiler and interpreter sort of merge and you end up with this:
- parse the source language and build an AST (define classes, functions, etc... as you parse)
- walk the tree to execute instructions; perform semantic analysis (check type compatibility if statically typed, possibly some symbol definition work for languages w/o local variable declarations).
In many cases, there is a very close relationship between the subtree roots in the AST and bytecode instructions. The leaves of the subtrees are operands of the instructions. Consider the following simple assignment statement:
To execute that code with something higher level than a raw CPU instruction requires that an interpreter execute something like the following method call:
Bytecode interpreters and AST-based interpreters both trigger that method call after recognizing the statement. The only substantive difference lies in the instruction pattern recognition. First, let's take a look at bytecodes.
Bytecode interpreters
A compiler for an interpreted language would translate that source code to something like the following (Python) bytecodes:
If we are translating from a statically typed language like Java, we would encode type information in the instructions:
In either case, there is a fetch-execute cycle in the interpreter that switches according to the byte code:
switch ( memory[pc] ) {
case LOAD_CONST : ...
case STORE_FAST : ...
...
}
Tree-based interpreters
A pure interpreter, on the other hand, would interpret a AST subtree for the assignment that probably looks like the following.

When the tree walker finds the = node, it would call the assign method to actually execute it. That method would both perform semantic analysis and execute the assignment. For example, in a statically typed language, it would verify that x is defined and is of an appropriate type to receive an integer.
The guts of a tree-based interpreter, assuming we are using a tree grammar, would look like the following.
Instead of using a tree grammar, for many interpreters it's easier to use a set of classes, one for each instruction. Each class has an exec() method. For example, an Assign class might look like:
abstract class InstructionNode {
public void exec() {
}
class Assign extends InstructionNode {
public void exec() {
InstructionNode left = getChild(0);
String id = left.getText();
InstructionNode right = getChild(1);
Object value = right.exec();
... do assignment of value to id ...
}
}
To the uninitiated, this looks a lot more clear than the grammar version. The problem is that we should use formalisms whenever we can. That is why we use grammars to parse languages rather than writing parsers by hand. The same argument goes for walking trees. The code in Assign.exec is clearly a handbuilt version of a tree parser. ANTLR will generate code not unlike the handbuilt version. On the other hand, a handbuilt version would undoubtedly be more efficient.
Threaded interpreters
It is worth mentioning a final interpreter implementation strategy in passing called threaded. A threaded interpreter is almost identical to a byte code interpreter. The difference lies in that a threaded interpreter avoids the fetch-execute cycle by directly calling the implementation method:
So, instead of a switch in an interpreter run-time support code, a threaded interpreter directly executes the code.
Advantages
The speed of these interpreter implementation strategies is mostly dependent on how well they are implemented, but I would say that the threaded interpreter has the best chance of being the fastest. Unfortunately, a threaded interpreter requires the most code space. A bytecode interpreter has a series of 8-bit instructions whereas the threaded interpreter has a series of native method calls, which are much larger than eight bits.
A tree-based interpreter has the potential to be the slowest because it must do more work every time it loads code. Because it does not have bytecodes, it must reparse the source code and build a tree each time. For long running code this does not matter.
Another main disadvantage of threaded interpreters is that you have to recompile the method calls down to native machine code on each new target. You cannot just send a file and have execute right away on a machine (with and interpreter).
Comparing the ease of implementation of the tree-based and bytecode-based interpreters, bytecode interpreters are usually easier.
On-the-fly compilation
Interpreters for some languages like Java and C# use just in time or on-the-fly compilation to increase execution speed. Typically once some code has been executed a certain number of times, execution halts and a compiler kicks in to translate the byte codes to raw machine code. From then on, execution of that code fragment occurs at native speed. This strategy simply moves the final code generation phase of a static compiler to the interpretation phase. You can think of bytecodes as simply an executable form of internal compiler data structure.
"The most significant difference between a virtual machine and a normal CPU is that virtual machines are stack machines whereas CPUs are register machines."
The above is certainly not true in all cases. For example Lua has a register based virtual machine since version 5.0.