Register-Based Virtual Machine for TinyPie

by Ruslan Spivak on February 8, 2011

After finishing up initial work for the TinyPie interpreter and AST visualizer I’m well on my way to TinyPie compiler.

For simplicity reasons and to better understand different parts involved I decided to go with Register-Based Bytecode Interpreter / Virtual Machine , like the one used by Lua and Dalvik, instead of directly generating binary code for target architecture. The main source of inspiration, ideas, and examples were the book Language Implementation Patterns Ch.10(Building Bytecode Interpreters) and the paper The Implementation of Lua 5.0

Here is a quick overview of developed components:

  1. Bytecode Assembler
  2. Register-Based Virtual Machine

Bytecode Assembler converts assembly program into binary bytecodes. The bytecode is further interpreted by the TinyPie Register-Based Bytecode Interpreter / Virtual Machine.

TinyPie Assembly language grammar:

Assembler yields the following components:

  1. Code memory: This is a bytearray containing bytecode instructions and their operands derived from the assembly source code.
  2. Global data memory size: The number of slots allocated in global memory for use with GSTORE and GLOAD assembly commands.
  3. Program entry point: An address of main function .def main: ...
  4. Constant pool: A list of objects (integers, strings, function symbols) that are not part of the code memory. Bytecode instructions refer to those objects via integer index.

Here is a factorial function in TinyPie language:

Here is an equivalent TinyPie assembly code:

Here are the resulting elements produced by the Bytecode Assembler after translating the above assembly code:

Virtual Machine architecture(Virtual Machine simulates a hardware processor with general-purpose registers.):

  1. IP: Instruction pointer register that points into the code memory at the next instruction to execute.
  2. CPU: Instruction dispatcher that simulates fetch-decode-execute cycle with a switch (if elif) statement in a loop – reads bytecode at IP, decodes its operands and executes corresponding operation.
  3. Global data memory: a list of global objects. Contents is accessed via integer index.
  4. Code memory: Holds bytecode instructions and their operands.
  5. Call stack: Holds StackFrame objects with function return address, parameters, and local variables.
  6. Stack frame: A StackFrame object that holds all required information to invoke a function:
    • function symbol
    • function return address
    • registers hold return value, arguments, locals, and temporary values
  7. FP: Frame pointer – a special-purpose register that points to the top of the function call stack
  8. Constant pool: Integers, strings, and function symbols all go into constant pool. Instructions refer to constant pool values via an integer index.

High-level overview:

Bytecode instructions for TinyPie VM:

TinyPie VM comes with a tpvm command line utility:

Sample program:

Running VM with the sample program as an input:

The missing piece is conversion from TinyPie AST to TinyPie Assembly code that I need to implement for the TinyPie compiler.

If you enjoyed this post why not subscribe via email or my RSS feed and get the latest updates immediately. You can also follow me on GitHub or Twitter.

{ 4 comments… read them below or add one }

Pixy Misa February 10, 2011 at 1:16 AM

This looks really interesting and quite elegant. It would make a great starting point for anyone interesting in building their own language.

You’ve sold me on that book too – I’m off to buy an eBook copy now!


Ruslan Spivak February 10, 2011 at 9:50 AM

I’m glad you liked it.
The book is really good, it gives an excellent starting point independent of implementation language you use even though all examples are written in Java and ANTLR.


m35 April 14, 2011 at 11:12 PM

After years of thinking about it, I’m finally getting around to my own language writing experience as well. It’s helpful to see another’s implementation, and coincidentally, mine is looking very similar: high level language -> parser/compiler -> vm assembly -> byte-code -> native assembly/machine code. I was using the LLVM instruction set as a starting point for what VM ASM instructions to include. The instruction set listed here for your VM seems rather sparse to me (missing lte, shr, and, or, etc.). How does that affect your VM?


Ruslan Spivak April 15, 2011 at 1:00 AM

First of all congrats on deciding to implement your own language.
As for instruction set – you’re right, it’s very basic and sparse. I just wanted to implement a full stack as quickly as possible, completeness and efficiency was’t the goal.
Once I finish playing with a basic compiler to native code I want to get back to VM, extend instruction set and then try to implement VM in C.


Speak your mind

Previous post:

Next post: