I have now implemented and tested the compiler front end for the Jack compiler. The Jack language is object-based, without support for inheritance. Its grammar is mostly LL(1), which means that in most cases, looking one token ahead is enough to know which alternative to choose when expanding a non-terminal.
My final implementation is a classic top-down recursive-descent parser, as proposed in chapter 10.
That is, however, after refactoring a previous version in which I tried to apply the principle that an element should itself assess whether it is of a given type. All my compileXxx() functions would return a boolean indicating whether or not the current element is of type Xxx, with the side effect of generating the compiled code (XML for this chapter; real VM code in the next). The compileXxx() functions are then predicates, which I found kind of neat: it felt like programming Prolog in C++. I had a version of the compilation engine built on that principle that passed all the tests (which, for the purpose of this chapter, are comparisons of XML output).
However, I later realized that the underlying principle is just wrong from an LL(1) perspective. LL(1) says that when there is an alternative in a grammar rule, e.g.:
returnStatement: 'return' expression? ';'
which means that there may or may not be an expression in a return statement, the returnStatement level knows, by looking one token ahead, whether or not there is an expression. That is the case here: there is an expression if and only if the next token is not ';'.
With my predicate principle, compileExpression() would itself have to decide whether the current element is an expression or not. This happens to be much harder than checking whether the next token is a semicolon (an expression may occur in contexts other than "return", so compileExpression() cannot simply check for a semicolon).
In other words, even if my code worked, I would not have been able to sleep at night had I not done the refactoring. It was actually quite easy, albeit time-consuming and boring.
With project 8 completed, I now have a Virtual Machine translator that takes any VM program as input and outputs a corresponding Hack assembly file (see project 4) that can be run on the Hack CPU simulator. Since I had a few bugs, I ended up stepping through some code at the assembly level for the recursive Fibonacci example, which was an interesting exercise in concentration and patience.
The virtual machine in question is a single-stack machine that provides support for function calling, including recursion.
After having implemented it, one feels quite at home reading section 2.1.1, "Single vs. multiple stacks", of the book Stack Computers: The New Wave by Philip Koopman (it is from 1989, so "new" is relative, but it is available online and it is one of the very few publications about stack machine hardware).
Quoting the section:
An advantage of having a single stack is that it is easier for an operating system to manage only one block of variable sized memory per process. Machines built for structured programming languages often employ a single stack that combines subroutine parameters and the subroutine return address, often using some sort of frame pointer mechanism.
This "sort of frame pointer mechanism" is precisely what I implemented in project 8. In our case, the stack machine is not built in hardware; it is implemented in the form of a translator to the machine language of a simple 16-bit register-based CPU. It could, however, be built directly in hardware, as the many examples in Stack Computers: The New Wave show. I suppose a very interesting project following this course would be to implement the VM specification of chapters 7 and 8 in the HDL language, in the same way as the Hack CPU was built in project 5. I am not sure how much the ALU would have to be modified to do that.
I will keep this project idea in the back of my mind for now and move on to chapter 9, where we study Jack, a simple object-oriented high-level language that we will write a compiler for in later chapters. The compiler will use the VM translator implemented in chapters 7 and 8 as a back end.
I have now implemented a translator for a part of the virtual machine that is used in Nand2Tetris.
One point of the virtual machine language in the course is to serve as an intermediate representation between the high-level language and assembly in the compiler to be designed in later chapters. The virtual machine translator translates VM instructions to assembly. Its implementation is split between project 7, which I have now completed, and project 8.
The virtual machine is stack based which I enjoy by personal taste (as mentioned in a previous post I have inherited that taste from my use of RPN on HP calculators since the 80s).
The design of the virtual machine specification feels, like all concepts I have gone through so far in this course, elegant and as simple as it can be. The VM language consists of four categories of commands:
1. the basic arithmetic and logic operations (the same as the ALU and CPU previously designed),
2. push and pop to transfer data between RAM and the stack,
3. program flow commands,
4. function call commands.
Project 7 implements 1 and 2.
Since the VM language's basic syntax is always the same, the parsing is in fact simpler than the assembly parsing of project 6. The interesting part is the assembly code output, where there is potential for optimizing the number of assembly commands generated for a given VM command. I have myself worked very little on optimization because I would rather carry on, but I might come back to it later.