Dashboard > USF Computer Science 652 - Programming Languages > CS652 Home > Byte code assembler and interpreter
  USF Computer Science 652 - Programming Languages Log In | Sign Up   View a printable version of the current page.  
  Byte code assembler and interpreter
Added by Terence Parr, last edited by Terence Parr on Mar 29, 2008  (view change)
Labels: 
(None)

This project gives you experience building an assembler and interpreter for a simple stack machine. Your goal is to get the three sample programs to execute properly through your interpeter.

; data declarations go first
	.decl x
;
; then instructions
;
main:
	const 20
	store x
	load x
	print
	halt
int x;
main() {
  x = 20;
  print x;
}
$ java Interp < test.asm

Disassembly:
        CONST 20
        STORE 0
        LOAD 0
        PRINT
        HALT

Output:
20

Data memory (offset 0):
00000000:  00 00 00 14

Code memory (offset 4):
00000004:  0E 00 00 00 14 11 00 00
0000000C:  00 00 0F 00 00 00 00 14
00000014:  15

high level code

method main() {
  int i;
  x = 0;
  while ( i<=10 ) {
    print i;
    i = i + 1;
}

would be translated as follows:

; test looping
main:
; int i
        lalloc 1
; x = 1
        const 1
        fpstore -1
; while ( i<=10 ) {
beginloop:
        fpload -1
        const 10
        gt
        brt endloop
;   print i
        fpload -1
        print
;   i = i + 1
        fpload -1
        const 1
        add
        fpstore -1
; }
        br beginloop
endloop:
        halt

output

Disassembly:
        LALLOC 1
        CONST 1
        FPSTORE -1
        FPLOAD -1
        CONST 10
        GT
        BRT 58
        FPLOAD -1
        PRINT
        FPLOAD -1
        CONST 1
        ADD
        FPSTORE -1
        BR 15
        HALT

Output:
1
2
3
4
5
6
7
8
9
10

Data memory (offset 0):


Code memory (offset 0):
00000000:  13 00 00 00 01 0E 00 00
00000008:  00 01 12 FF FF FF FF 10
00000010:  FF FF FF FF 0E 00 00 00
00000018:  0A 06 0D 00 00 00 3A 10
00000020:  FF FF FF FF 14 10 FF FF
00000028:  FF FF 0E 00 00 00 01 01
00000030:  12 FF FF FF FF 0C 00 00
00000038:  00 0F 15

assembly code

.decl a
	.decl x

main:
; x = foo(20)
	const 20
	call foo
	store x
; print x
	load x
	print
	halt

foo:
; int foo(int i) {
;   return i+1
	fpload 2
	const 1
	add
	retv 1

output

$ java Interp < call.asm 

Disassembly:
        CONST 20
        CALL 22
        STORE 4
        LOAD 4
        PRINT
        HALT
        FPLOAD 2
        CONST 1
        ADD
        RETV 1

Output:
21

Data memory (offset 0):
00000000:  00 00 00 00 00 00 00 15

Code memory (offset 8):
00000000:  0E 00 00 00 14 09 00 00
00000008:  00 16 11 00 00 00 04 0F
00000010:  00 00 00 04 14 15 10 00
00000018:  00 00 02 0E 00 00 00 01
00000020:  01 0B 00 00 00 01

As you can see, the Interp class main program invokes the assembler first to break the assembly code down into bytecodes and then launches the interpreter. I how provided the shell or the complete class definition for everything you need.

Assembler

Here is the outline of the grammar:

/** A simple-extendable assembler.  Allows integer data declarations
 *  code labels and label operands (for which it computes an address).
 *  Allows forward-references of code labels, but data declarations must
 *  all appear before the code.  You pass in a set of instruction names
 *  via the construction, which defines the assembly code.  Subclass
 *  this parser and override the functionality methods such as gen()
 *  to make this parser actually do something.
 */
grammar Assembler;

@members {
    // Define the functionality required by the parser for code generation
    protected void gen(Token instrToken) {;}
    protected void gen(Token instrToken, int operand) {;}
    protected void checkForUnresolvedReferences() {;}
    protected void declareData(Token idToken) {;}
    protected int getOperandAddress(Token idToken) { return 0;}
    protected void defineLabel(Token idToken) {;}
}

program : ... ;
...

ID  :   ('.')? ('a'..'z'|'A'..'Z')+ ;  // ID and ".decl"

INT :   ('-')? ('0'..'9')+ ;

WS  :   (' '|'\t')+ {skip();}
    ;

NEWLINE
    :   {getCharPositionInLine()>=1}?=> '\r'? '\n'
    ;

NEWLINE_BY_ITSELF
    :   {getCharPositionInLine()==0}?=> '\r'? '\n' {skip();}
    ;

COMMENT
    :   ';' .* '\n' {skip();}
    ;

You need to fill in the rules and then add some simple actions that make calls to those methods described above as shown in the following rule:

data:   '.decl' ID NEWLINE {declareData($ID);}
    ;

You will also need a very simple symbol table (compared to your last project). Here are the various classes you need:

public class CodeSymbol extends Symbol {
    public CodeSymbol(String name, int address, boolean forward) {
        super(name);
        forwardRef = forward;
        if ( forward ) {
            // if forward reference, then address is address to update
            addForwardReference(address);
        }
        else {
            this.address = address;
        }
    }
}
public class DataSymbol extends Symbol {
    public DataSymbol(String name, int address) {
        super(name);
        this.address = address;
    }
}
import java.util.*;

public class Symbol {
    String name;

    /** Address in data / instruction memory */
    int address;

    /** Is this ref'd before def'd. */
    boolean forwardRef = false;

    boolean defined = true;

    /** List of operands in memory we need to update after seeing def */
    Vector forwardReferences = null;

    public Symbol(String name) {
        this.name = name;
    }

    public int getAddress() {
        return address;
    }

    public void setAddress(int address) {
        this.address = address;
    }

    public boolean isForwardReference() {
        return forwardRef;
    }

    public void setIsForwardReference(boolean forwardRef) {
        this.forwardRef = forwardRef;
    }

    public void addForwardReference(int address) {
        if ( forwardReferences==null ) {
            forwardReferences = new Vector();
        }
        forwardReferences.addElement(new Integer(address));
    }

    public Vector getForwardReferences() {
        return forwardReferences;
    }

    public boolean isDefined() {
        return defined;
    }

    public void setIsDefined(boolean defined) {
        this.defined = defined;
    }

    public void resolveForwardReferences(byte[] code) {
        setIsForwardReference(false);
        // need to patch up all references to this symbol
        Vector opndsToPatch = getForwardReferences();
        for (int i=0; i<opndsToPatch.size(); i++) {
            Integer addrI = (Integer)opndsToPatch.elementAt(i);
            int addr = addrI.intValue();
            /*
            System.out.println("updating operand at addr "+
                    addr+" to be "+getAddress());
            */
            Interp.writeInt(code, addr, getAddress());
        }
    }

    public String toString() {
        String refs = "";
        if ( forwardReferences!=null ) {
            refs = "[refs="+forwardReferences.toString()+"]";
        }
        return name+"@"+address+refs;
    }
}

The Assembler subclass of the parser, fills in the abstract methods defined in a grammar:

import org.antlr.runtime.*;
import java.util.Hashtable;
import java.util.Enumeration;

/** Subclass the AssemblerParser to actually implement the necessary
 *  symbol table management and code generation routines.
 */
public class Assembler extends AssemblerParser {
    public static final int MAX_CODE_SIZE = 1024; // bogus, but ok for teaching
    protected Hashtable instructionOpcodeMapping = new Hashtable();
    protected Hashtable symbols = new Hashtable();

    /** Data address pointer */
    protected int dp = 0;

    /** Instruction address pointer */
    protected int ip = 0;

    protected byte[] code = new byte[MAX_CODE_SIZE];

    /** Create an assembler attached to a lexer and define
     *  the instruction set.
     */
    public Assembler(TokenStream lexer, String[] instructions) {
        super(lexer);
        for (int i=1; i<instructions.length; i++) {
            instructionOpcodeMapping
                    .put(instructions[i].toLowerCase(),new Integer(i));
        }
    }

    /** copy out the instruction memory */
    public byte[] getMachineCode() {
    }

    /** How much memory was allocated using .decl? */
    public int getDataMemorySize() {
    }

    /** How much memory was allocated in gen() routine? */
    public int getCodeMemorySize() {
    }

    /** Return the address associated with label "main:" if defined */
    public int getMainInstructionAddress() {
    }

    /** Generate code for an instruction with no operands */
    protected void gen(Token instrToken) {
    // lookup instruction name in the instruction to Opcode mapping
    // if not found, print an error
    // else get the Opcode value, mask it with 0xFF to strip the upper bits
    // in case of sign extension or something and store in the code
    // memory
    }

    /** Generate code for an instruction with one operand */
    protected void gen(Token instrToken, int operand) {
    // generate the opcode
    // write the integer opcode using Interp.writeInt
    // after the opcode.
    }

    /** After parser is complete, look for unresolved labels */
    protected void checkForUnresolvedReferences() {
    // walk the list of symbols looking for
    // !sym.isDefined(); give an error if you find any
    }

    protected void declareData(Token idToken) {
    // create a new DataSymbol
    // store in the symbol table
    }

    /** Compute the data or code address of an operand */
    protected int getOperandAddress(Token idToken) {
    // lookup the operand in symbols
    // if not their, assume it is a forward reference;
    // create a new CodeSymbol indicating that it is undefined;
    // store in the symbol table.
    // if defined and forward reference, track forward reference
    // via addForwardReference.
    // if defined but not Ford reference, simply use the address
    }

    protected void defineLabel(Token idToken) {
    // add a new CodeSymbol if not defined
    // if defined and forward reference, update the label with address
    // and resolve forward references.
    // if defined and not forward reference, print an error about
    // redefined symbol.
    }
}

Interpreter

import java.io.InputStream;
import org.antlr.runtime.*;

/** A simple interpreter for integer arithmetic */
public class Interp {
    public static final int DEFAULT_STACK_SIZE = 100;

    // boolean results stored on stack too
    // these constants described what int values represent them
    public static final int BOOLEAN_FALSE = -1;
    public static final int BOOLEAN_TRUE = 1;

    // INSTRUCTION BYTE CODES
    public static final int INSTR_ADD = 1;
    public static final int INSTR_SUB = 2;
    public static final int INSTR_MULT = 3;
    public static final int INSTR_DIV = 4;
    public static final int INSTR_LT = 5;
    public static final int INSTR_GT = 6;
    public static final int INSTR_EQ = 7;
    public static final int INSTR_NOT = 8;
    public static final int INSTR_CALL = 9;
    public static final int INSTR_RET = 10;    // return without value
    public static final int INSTR_RETV = 11;   // return with value
    public static final int INSTR_BR = 12;     // branch
    public static final int INSTR_BRT = 13;    // branch if true
    public static final int INSTR_CONST = 14;  // push constant integer
    public static final int INSTR_LOAD = 15;   // load from global memory
    public static final int INSTR_FPLOAD = 16; // load relative to FP
    public static final int INSTR_STORE = 17;  // store in global memory
    public static final int INSTR_FPSTORE = 18;// store relative to FP
    public static final int INSTR_LALLOC = 19; // local allocate
    public static final int INSTR_PRINT =20;   // print stack top
    public static final int INSTR_HALT = 21;   // print stack top

    /** List of instructions so assembler knows what to allow.
     *  Also used in disassembly.
     */
    public static String[] instructions = new String[] {
        "<INVALID>",
        "ADD", // index is the opcode
        "SUB",
        "MULT",
        "DIV",
        "LT",
        "GT",
        "EQ",
        "NOT",
        "CALL",
        "RET",
        "RETV",
        "BR",
        "BRT",
        "CONST",
        "LOAD",
        "FPLOAD",
        "STORE",
        "FPSTORE",
        "LALLOC",
        "PRINT",
        "HALT",
    };

    /** Used for disassembly; says if opcode has an operand */
    public static int[] operands = new int[] {
        0, // <INVALID>
        0, // ADD
        0, // SUB
        0, // MULT
        0, // DIV
        0, // LT
        0, // GT
        0, // EQ
        0, // NOT
        1, // CALL
        1, // RET
        1, // RETV
        1, // BR
        1, // BRT
        1, // CONST
        ?, // LOAD    FILL IN!!!!!!!!!!!
        ?, // FPLOAD
        ?, // STORE
        ?, // FPSTORE
        ?, // LALLOC
        0, // PRINT
        0  // HALT
    };

    /** byte-addressable memory stores code and data.  Opcodes
     *  are 8 bits and operands are 32-bits; must recombine
     *  4 bytes into 32-bit word.  Operands are either integers or
     *  addresses (or saved registers).
     *
     *     ----------------
     *  0: |  data memory |
     *     |              |
     *     |              |
     *     ----------------
     *  cp:|  code memory |    begins right after data
     *     |              |
     *     ----------------
     *
     *  The cp allows us to relocate code in the memory (w/o having to
     *  change the code--all code memory fetches are done relative to
     *  cp register).
     */
    byte[] memory;

    /** word-addressable stack (grows downward); stores regs, ints,
     *  addresses, booleans (boolean operator results).
     *
     *  Here is a stack activation record for a function/method:
     *
     *     |     ...      |
     *     ----------------
     *     |     arg 1    |
     *     |     arg 2    |
     *     |     ...      |
     *     |     arg n    |
     *     |   ret addr   |
     *  fp:|    old fp    |
     *     |   local 1    |
     *     |   local 2    |
     *     |     ...      |
     *     |   local m    |
     *     ----------------
     *     |  work space  |
     *     |     ...      |
     *
     *  So arg i is at stack[fp+2+(n-i)] since we push args in
     *  specified order rather than reverse.  Local i is at
     *  stack[fp-i].  The compiler is required to compute an
     *  integer constant such as 2 that accesses the first argument
     *  or -1 that accesses the first local.
     */
    int[] stack;

    /** Stack pointer */
    int sp;

    /** Frame pointer for function call activation records */
    int fp;

    /** Instruction pointer register */
    int ip;

    /** For relocatable code, set "code pointer" */
    int cp;

    int dataMemorySize = 0;

    boolean trace = false;

    public static void main(String[] args) throws Exception {
        // PROCESS ARGS
        boolean trace = false;
        if ( args.length>0 ) {
            if ( args[0].equals("-trace") ) {
                trace = true;
            }
        }

        // ASSEMBLER
        AssemblerLexer assemblerLexer = new AssemblerLexer(new ANTLRInputStream(System.in));
        CommonTokenStream tokens = new CommonTokenStream(assemblerLexer);
        Assembler assembler = new Assembler(tokens, instructions);
        assembler.program();
        byte[] program = assembler.getMachineCode();
        int dataSize = assembler.getDataMemorySize();
        int startip = assembler.getMainInstructionAddress();

        // DISASSEMBLE
        disassemble(program);

        // EXEC
        System.out.println("\nOutput:");
        Interp interpreter = new Interp();
        interpreter.trace = trace;
        interpreter.exec(program, startip, dataSize);

        // DUMP
        interpreter.coredump();
    }

    /** Execute the bytecodes in "code" starting at ip; alloc
     *  enough static memory for dataMemorySize bytes.  You cannot
     *  define data memory before execution--must set with code.
     */
    public void exec(byte[] code, int ip, int dataMemorySize)
            throws Exception
    {
        // call loader to fill in code and allocate global data space
        // set up the stack environment; sp is at max stack value
        // SIMULATE "call main"; manually set up stack as if we called main()
        // invoke cpu to start execution at main
    }

    /** Load program into memory, set up data memory, set ip etc... */
    protected void loader(byte[] code, int ip, int dataMemorySize) {
        this.dataMemorySize = dataMemorySize;
        // make enough room for code + data
        memory = new byte[code.length+dataMemorySize];
        // load code just above data memory
        cp = dataMemorySize; // code pointer base right after data memory
        for (int i=0; i<code.length; i++) {
            memory[cp+i] = code[i];
        }
        this.ip = ip; // ip is relative to cp when executing
    }

    /** Simulate the fetch-execute cycle */
    protected void cpu() throws Exception {
        byte opcode = memory[cp+ip];
        int v = 0;
        int addr = 0;
    int left,right;
        while (opcode!=INSTR_HALT) {
            if ( trace ) {
                System.out.print(hex(cp+ip)+": ");
                disassembleInstruction(memory, cp+ip);
            }
            ip++; //jump to next instruction or first byte of operand
            switch (opcode) {
                // fill cases for instructions
                ...
                default :
                    throw new Exception("invalid opcode: "+opcode+" at ip="+(ip-1));
            }
            opcode = memory[cp+ip];
        }
    }

    public void coredump() {
        System.out.println("\nData memory (offset 0):");
        dump(memory, 0, dataMemorySize);
        System.out.println("\nCode memory (offset "+cp+"):");
        dump(memory, cp, memory.length-dataMemorySize);
    }

    public static void disassemble(byte[] memory) {
        System.out.println("\nDisassembly:");
        for (int i=0; i<memory.length; ) {
            i = disassembleInstruction(memory, i);
        }
    }

    public static int disassembleInstruction(byte[] memory, int ip) {
        int opcode = memory[ip];
        ip++;
        String instrName = instructions[opcode];
        System.out.print("\t"+instrName);
        if ( operands[opcode]==1 ) {
            int opnd = getInt(memory, ip);
            ip += 4;
            System.out.print(" "+opnd);
        }
        System.out.println();
        return ip;
    }

    /** Pull off 4 bytes starting at ip and return 32-bit signed int value.
     *  Return with ip pointing *after* last byte of operand.  The byte-order
     *  is high byte down to low byte, left to right.
     */
    private int getIntOperand() {
    }

    // UTILITY ROUTINES

    public static int getInt(byte[] memory, int index) {
        int b1 = memory[index++]&0xFF; // mask off sign-extended bits
        int b2 = memory[index++]&0xFF;
        int b3 = memory[index++]&0xFF;
        int b4 = memory[index++]&0xFF;
        int word = b1<<(8*3) | b2<<(8*2) | b3<<(8*1) | b4;
        return word;
    }

    /** Write value at index into a byte array highest to lowest byte,
     *  left to right.
     */
    public static void writeInt(byte[] bytes, int index, int value)
    {
        bytes[index+0] = (byte)((value>>(8*3))&0xFF); // get highest byte
        bytes[index+1] = (byte)((value>>(8*2))&0xFF);
        bytes[index+2] = (byte)((value>>(8*1))&0xFF);
        bytes[index+3] = (byte)(value&0xFF);
    }

    public static void dump(byte[] memory, int offset, int n) {
        for (int i=0; memory!=null && i<n; i++) {
            if ( i%8==0 && i!=0 ) {
                System.out.println();
            }
            if ( i%8==0 ) {
                System.out.print(hex(offset+i)+": ");
            }
            System.out.print(" "+hex(memory[i+offset]));
        }
        System.out.println();
    }

    public static String hex(byte b) {
        String bs = Integer.toString(b&0xFF,16).toUpperCase();
        if ( (b&0xFF)<=(byte)0x0F ) {
            bs = '0'+bs;
        }
        return bs;
    }

    public static String hex(int b) {
        String bs = Integer.toString(b,16).toUpperCase();
        int n = bs.length();
        for (int i=1; i<=(8-n); i++) {
            bs = '0'+bs;
        }
        return bs;
    }
}

Submission

You will create a jar file called asm.jar containing grammars, source, and *.class files and place in your lib directory:

https://www/svn/userid/cs652/proj4/trunk/lib

You can use the svn account for development of the software to if you would like, but I will only be looking at your asm.jar file in the lib directory.

Grading

Make sure your jar has the Interp and other classes in the default package with a main method so that I can test your code. Your program must be able to run the examples given.

You will be graded on the actions within the grammar and Assembler.java subclass as well as the instruction implementations in the interpreter. I will test your program on valid and invalid input.

Site powered by a free Open Source Project / Non-profit License (more) of Confluence - the Enterprise wiki.
Learn more or evaluate Confluence for your organisation.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.1 Build:#806 May 06, 2007) - Bug/feature request - Contact Administrators