For simplicity, and to better understand the different parts involved, I decided to go with a Register-Based Bytecode Interpreter / Virtual Machine, like the ones used by Lua and Dalvik, instead of directly generating binary code for a target architecture. The main sources of inspiration, ideas, and examples were the book Language Implementation Patterns, Ch. 10 (Building Bytecode Interpreters), and the paper The Implementation of Lua 5.0.
Here is a quick overview of the components developed so far:
- Bytecode Assembler
- Register-Based Virtual Machine
The Bytecode Assembler converts an assembly program into binary bytecode. The bytecode is then interpreted by the TinyPie Register-Based Bytecode Interpreter / Virtual Machine.
TinyPie Assembly language grammar:
The assembler yields the following components:
- Code memory: This is a bytearray containing bytecode instructions and their operands derived from the assembly source code.
- Global data memory size: The number of slots allocated in global memory for use with GSTORE and GLOAD assembly commands.
- Program entry point: The address of the main function, .def main: ...
- Constant pool: A list of objects (integers, strings, function symbols) that are not part of the code memory. Bytecode instructions refer to those objects via an integer index.
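To make the four outputs above more concrete, here is a minimal Python sketch of an assembler that produces them. It is not the TinyPie assembler: the opcode numbers, the `.globals` directive, the `loadk` and `halt` mnemonics, the `args=`/`locals=` attributes, and the encoding (one opcode byte followed by 4-byte big-endian operands) are all assumptions made for illustration. Only `GSTORE`, `GLOAD`, and `.def main:` come from the text.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical table: mnemonic -> (opcode, operand count).
# TinyPie's real opcode numbering and instruction set may differ.
OPCODES = {"loadk": (1, 2), "gstore": (2, 2), "gload": (3, 2), "halt": (4, 0)}


@dataclass
class AssemblerOutput:
    code: bytearray = field(default_factory=bytearray)  # code memory
    global_size: int = 0                                 # slots in global data memory
    entry_point: int = 0                                 # address of main
    constant_pool: list = field(default_factory=list)   # objects kept outside the code


def constant_index(pool, value):
    """Return the pool index of value, appending it on first use."""
    if value not in pool:
        pool.append(value)
    return pool.index(value)


def assemble(source):
    out = AssemblerOutput()
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".globals"):          # assumed directive
            out.global_size = int(line.split()[1])
        elif line.startswith(".def"):
            name = line.split()[1].rstrip(":")
            if name == "main":
                out.entry_point = len(out.code)  # main starts at the current address
        else:
            mnemonic, *operands = line.replace(",", " ").split()
            opcode, arity = OPCODES[mnemonic]
            if len(operands) != arity:
                raise ValueError(f"bad operand count: {line}")
            out.code.append(opcode)
            for operand in operands:
                if operand.startswith("r"):      # register operand, e.g. r1
                    value = int(operand[1:])
                elif mnemonic == "loadk":        # literal goes to the constant pool
                    value = constant_index(out.constant_pool, int(operand))
                else:                            # plain integer, e.g. a global slot
                    value = int(operand)
                out.code += struct.pack(">i", value)
    return out


SOURCE = """
.globals 1
.def main: args=0, locals=1
    loadk r1, 42
    gstore r1, 0
    halt
"""

result = assemble(SOURCE)
print(len(result.code), result.global_size, result.entry_point, result.constant_pool)
```

Note that the literal 42 never appears in the code memory; the `loadk` instruction only carries its constant pool index, which is the indirection described in the last bullet.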
Here is a factorial function in the TinyPie language:
Here is the equivalent TinyPie assembly code:
Here are the resulting elements produced by the Bytecode Assembler after translating the above assembly code:
The Register-Based Virtual Machine consists of the following components:
- IP: Instruction pointer register that points into the code memory at the next instruction to execute.
- CPU: Instruction dispatcher that simulates the fetch-decode-execute cycle with a switch (if/elif) statement in a loop: it reads the bytecode at IP, decodes its operands, and executes the corresponding operation.
- Global data memory: A list of global objects. Its contents are accessed via an integer index.
- Code memory: Holds bytecode instructions and their operands.
- Call stack: Holds StackFrame objects with function return address, parameters, and local variables.
- Stack frame: A StackFrame object that holds all the information required to invoke a function:
  - function symbol
  - function return address
  - registers that hold the return value, arguments, locals, and temporary values
- FP: Frame pointer, a special-purpose register that points to the top of the function call stack.
- Constant pool: Integers, strings, and function symbols all go into the constant pool. Instructions refer to constant pool values via an integer index.
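The way these components fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TinyPie VM itself: the opcode numbers, the `VM` class, the `emit` helper, the fixed register count, and the 4-byte big-endian operand encoding are all hypothetical, and function calls are left out, so the call stack only ever holds the frame for main.

```python
import struct

# Hypothetical opcodes; TinyPie's real instruction set is not reproduced here.
LOADK, ADD, GSTORE, GLOAD, HALT = 1, 2, 3, 4, 5


class StackFrame:
    def __init__(self, symbol, return_address, nregs):
        self.symbol = symbol                   # function symbol
        self.return_address = return_address   # where to resume after the call
        self.registers = [None] * nregs        # return value, args, locals, temps


class VM:
    def __init__(self, code, constant_pool, global_size, entry_point=0):
        self.code = code                        # code memory
        self.constant_pool = constant_pool
        self.globals = [None] * global_size     # global data memory
        self.ip = entry_point                   # next instruction to execute
        self.calls = [StackFrame("main", return_address=-1, nregs=4)]
        self.fp = 0                             # index of the top frame (assumed meaning)

    def operand(self):
        """Decode one 4-byte operand at IP and advance IP past it."""
        (value,) = struct.unpack_from(">i", self.code, self.ip)
        self.ip += 4
        return value

    def run(self):
        while self.ip < len(self.code):
            opcode = self.code[self.ip]         # fetch
            self.ip += 1
            regs = self.calls[self.fp].registers
            if opcode == LOADK:                 # decode and execute
                dst, index = self.operand(), self.operand()
                regs[dst] = self.constant_pool[index]
            elif opcode == ADD:
                a, b, dst = self.operand(), self.operand(), self.operand()
                regs[dst] = regs[a] + regs[b]
            elif opcode == GSTORE:
                src, slot = self.operand(), self.operand()
                self.globals[slot] = regs[src]
            elif opcode == GLOAD:
                dst, slot = self.operand(), self.operand()
                regs[dst] = self.globals[slot]
            elif opcode == HALT:
                return
            else:
                raise ValueError(f"unknown opcode {opcode} at {self.ip - 1}")


def emit(opcode, *operands):
    """Encode one instruction: an opcode byte plus 4-byte big-endian operands."""
    return bytes([opcode]) + b"".join(struct.pack(">i", o) for o in operands)


# r1 = k[0]; r2 = k[1]; r3 = r1 + r2; globals[0] = r3
code = bytearray(
    emit(LOADK, 1, 0) + emit(LOADK, 2, 1) + emit(ADD, 1, 2, 3)
    + emit(GSTORE, 3, 0) + emit(HALT)
)
vm = VM(code, constant_pool=[2, 40], global_size=1)
vm.run()
print(vm.globals)
```

A call instruction in this sketch would push a new `StackFrame` with the current IP as its return address and move `fp` to it; a return would pop the frame and restore IP from it.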
Bytecode instructions for the TinyPie VM:
The TinyPie VM comes with a tpvm command-line utility:
Running the VM with the sample program as input:
The missing piece is the conversion from the TinyPie AST to TinyPie assembly code, which I still need to implement for the TinyPie compiler.