Oct 10

Nand2Tetris: assembler implemented and verified (project 6)

Nand2Tetris‘ assembler/comparator thinks that the 20000 line-binary file produced by my assembler for the pong game is correct to the bit, which means that my assembler, although I know it is not even close to being robust, is now good enough for my purpose.
As usual, the book contains a very detailed analysis of the problem to solve, and a clean design proposal. What is left is quite a straightforward implementation. Still, it is not entirely trivial, and one gets the satisfaction to have gone one step further towards the goal of a computer built from Nand gates that will be able to run graphics programs written in a high level language.
From a software and hardware development process perspective, the course is also very pedagogic, providing the means to test the results of every project. Encouraged by that mindset, I implemented a test class for the assembler parser, that helped me to verify that I had not broken anything when I added more functionality. In fact, I did write the test cases and run them before even starting to write the corresponding parser code, so one could say that I applied the principles of test driven development.
Given the little scope of the project, I implemented support for this little unit testing in my main() function:

In order for PARSERTESTER_HPP to be defined, I only have to add:

This way, I can keep the rest of my file and Makefile structure untouched. When the #include is there, my application will be a unit test application instead of being the full assembler. My the test code is written to throw an exception any time a test does not pass. The exception won’t be caught and will lead to a crash of the application. If the test application writes “Test successful”, it means that it run to completion without hitting a throw. Primitive, but simple.
Most of the time I spent in this project was researching a good solution for the parser in C++ (see my 3 previous articles).
The times I showed in Performance of C++11 regular expressions were for a one pass-implementation of the assembler that had not support for labels.
Interestingly, the times for the complete version, which has two passes, i.e. parses the whole source file twice, are not much longer.
One pass:

Two passes:

It would therefore seem that most of the time is spent in input/output from and to hard disk. A bugged version of the assembler that did not write the output file and that I happened to time seemed to show that most of the “sys” time in a working version is spent writing the file to disk. Maybe that could be optimized in some way (I haven’t done the math).
I will now move on to chapter 7, entitled VM I: Stack Arithmetic. :-)

Oct 10

Performance of C++11 regular expressions

Since I want to talk about C++11 program performance, I guess I should start saying that I am running g++ (GCC) 4.9.1 on Linux (Manjaro).
In What’s in a regular expression?, I presented some preliminary testing of the regular expression functionality in C++ that I could use in the scope of the assembler parser for Nand2Tetris.
In Linux application profiling can be spectacular, I explained how I divided the running time of my assembler by 100, by constructing regular expressions only once instead of once per row to parse.
After the regular expression improvement, I could not help rewriting the parser (only 130 lines of code) code with some std::string functions instead of regular expressions, to see what running time I would get.
The std::string functions I used were the following:

  • std::remove_if()
  • string::erase()
  • std::isdigit()
  • std::isspace()
  • string::substr()
  • string::find()

I guess you get the picture (I won’t show the code because it is contrary to Nand2Tetris’ policy).
The numbers are as follows to assemble a 20000 line-assembly file,
with regular expressions:

and with std::string:

So it is better by 30% with std::string.
My Parser.cpp is about the same size in both cases, but I guess that is because the pattern rules are simple. If they got more complicated, the regular expression version would be cheaper to maintain and not grow so much in size.

Oct 09

Linux application profiling can be spectacular

I now have an assembler that works for Hack assembly files without symbols. This was the first step proposed in project 6 of Nand2Tetris.
However, I really got scared by the assembler’s execution time:

This is to assemble a 20 thousand-line assembly file. To perform the same task, the assembler provided by Nand2Tetris, written in Java, takes a fraction of a second.
This felt like an interesting challenge. My guess was that the bottlenecks could lie in my use of regular expressions (see previous post), or in the file I/O.
I took this as an opportunity to test some profiling tools.
I installed OProfile and used it:

This shows the binaries where my application spent more than 10% of its time. More than 80% is spent in the C and C++ libraries.
Trying to take it one level down:

In theory, this should shows the names of the functions where it spends more than 2% of the time, but for some reason it does not work for the C++ library.
Still, it feels like the regular expressions are a very good candidate.
My first idea of optimization was to construct the regular expressions in the parser’s constructor, instead of doing it in the parsing methods (once per line of assembly code). That change took me 5 minutes, and here is the result:

That’s right, the times are divided by 100! :-)
OProfile now tells us:

Still a lot related to regular expressions, so there is certainly much more to be done. But I am happy for now (I am no longer ashamed). 😉
Edit: for the record, I have also made the regular expressions static const, which in general seem better, but that did not increase performance further.
Edit 2: I have also tried std::regex::optimize, but that does not either affect performance in my case.

Oct 09

What’s in a regular expression?

While working on my Nand2Tetris assembler parser (project 6), I devised with myself about whether or not I would use C++11’s regular expressions to parse assembly lines. I answered yes to that question, because:

  • Regardless of the performance discussion, I like the idea of a language that can match any text pattern. When I am in an environment when such a tool is available, I do not want to reinvent that wheel.
  • While manipulating some text in an editor, I often feel that a certain repetitive manual task would take a few seconds instead of many minutes if I just had a good command of regular expressions. Therefore, this is an opportunity to get better at it.

Spoiler warning: the rest of this post will discuss some details of the assembler parser implementation for Nand2Tetris. While I will mostly focus on the regular expression part, and do not post a complete solution, you should not read it if you want to take a completely fresh start at the project.
The power horse of the Hack computer instruction set (see chapter 6) is the C-instruction.
To quote the book, C-instruction:

Either or. That means not both. I.e. an instruction with neither dest nor jump (that would be a NOP) is invalid.
I am thinking about having one regular expression for the A-instruction, one for the C-instruction, and one for the symbol pseudo-command (that names labels).
My ambition for the parser is just to decode a correctly written line of assembly code:

  • An empty line or a line of comments shall just be discarded, i.e. my regular expressions shall not match them.
  • An A-instruction line shall be identified as such and the symbol string shall be extracted.
  • A C-instruction line shall be identified as such and the dest, comp and jump strings shall be extracted.
  • I do not ambition for my parser to reject an invalid symbol, dest, comp, or jump string. My plan is to let the invoker take care of that. That should make for relatively simple regular expressions and for now, it seems like an acceptable limitation (to be confirmed).

C++11 use a modified ECMAScript regular expression grammar. My reference for this is cppreference.com, but I am also using other sources to help me understand that information, which is quite dry.
A first attempt at a regular expression for the C-instruction could be (I separate the string in several strings in a C/C++ manner, in order to make the structure clearer – also note the C/C++ escaping of “\”, which is not part of the actual regular expression):

Let’s try it:

results in

The first match is the whole expression that matched. The other matches are the extracted strings. This did not work: the comp match also contains the jump part, and the jump match is empty. A reasonable way to prevent the comp part from eating the jump part is probably to just exclude “;” from the class of valid characters for comp. One more try:

results in

That worked. I replaced "\\S+" (1 or more non-space characters) by "[^\\s;]+" (1 or more characters that are neither spaces nor semicolon).
But since our rule for comp is so unspecific, and we require neither dest nor jump, I worry that a line with only one comment and without any spaces would match that rule too:

results in

That’s what I thought. Not good. An obvious way to fix this is to exclude “/” in the same way as we excluded “;”. Given the low level of ambition of our pattern matching for now, it is also a reasonable solution:

gives

It worked. Better get back to my parser now.

Oct 07

Generic Makefile for a single folder C++ project

Here comes a short interruption from my Nand2Tetris studies. The interruption is actually strongly related to Nand2Tetris. I am at the point where I should write an assembler in a language of my choice. Given my personal experience and environment, my natural choice is C++ under Linux.
When it comes to the IDE choice, I have recently used Code::Blocks with success, except for one very irritating issue. I and apparently others (Google for it if you are interested) have experienced that when using Code::Blocks in XFCE (Manjaro’s standard desktop environment), copy/paste of code does not work well. This is a pretty basic function and while there is a workaround, it is painful and I finally got tired of it. Being nerdy as I am, I decided to gave a new go to GNU Emacs, my friend of old times. Since I wanted my experience to be as modern as possible – in the scope of Emacs – I watched through this and other videos, read quite many blog articles, and finally installed CEDET from their Bazaar repository, since the version bundled with my Emacs 24.3.1 is not complete.
I went through the beginning of the CEDET manual and was quite impressed by what I saw. But when I got into the EDE part (project management) I just found it too complicated for the kind of small project I am starting right now (a small assembler, remember?).
So I will still use Emacs + CEDET, but “managing source files” and building will be handled by a simple Makefile.
And if anybody is interested and doesn’t know, it is possible to a write Makefile that is generic for a C++ flat project (all cpp, hpp and Makefile in the same directory). And by generic I mean:

  • Put any number of cpp and hpp files in the directory TheUltimateApplication.
  • Put the generic Makefile in the same directory. Do not make a single change in the file (except of course, if you are using special libraries or flags, in which case, well, just add them).
  • Run cd path/to/TheUltimateApplication, make, and ./TheUltimateApplication. Watch the results on your screen.
  • Of course, I also mean that the dependencies are handled correctly (as far as I know, I make no guarantee).

Here is the Makefile:

Note that there is even a target that will run your program: make run. I use that to run the program from Emacs (M-x compile RET make run RET).

Also note how this Makefile automagically learns about any new file in the project. Just put in it the directory, it will be in the project (i.e. automatically built and linked next time you run make). That’s what I call effective source file management!

Oct 06

Nand2Tetris: computer implemented and verified (project 5)

Yes, that’s right. In three days (quite busy, I have to admit), I have constructed a working 16-bit computer from Nand gates, that is able to manage I/O with memory mapped keyboard and monochrome screen. Am I a genius? Not even close. I have just been walking in the footsteps of Noam Nisan and Shimon Schocken, two wonderful professors who are currently giving me for free a once-in-a-lifetime experience in the category construct-a-simple-and-yet-powerful-computer-from-scratch! Their design is so elegant that every project step mostly consists in putting together pieces of a puzzle that just fall into place.
A few highlights:

  • Gluing together the RAM with the keyboard and screen gave for some interesting muxing/demuxing, due to the different sizes of their respective memory areas.
  • When it comes to the CPU, I had first planned to construct some separate control logic chip, but the pieces fit together so well that I finally didn’t think it was necessary.
  • Lastly, composing the computer from ROM, RAM + I/O and CPU was almost as easy as building the first logic chips of chapter 1.

Certainly if it is so easy, I cannot have learned that much, right? Wrong. I guess pedagogy could be defined as the ability to provide the highest teaching value per unit of of time spent by the student. And Noam Nisan and Shimon Schocken are certainly masters of pedagogy. No wonder Shimon Shocken is currently working on a program to revolutionize math teaching for young children.
A couple of additional details about his module:

  • Test scripts are for the first time in this module coming in two flavors: the regular ones and the “external” ones. I have not found an explanation about what the external ones could be all about (I might have overlooked some information in my hungry book and web site browsing), so I diffed the CPU scripts, and found that the only difference was an explicit reference to the DRegister chip in the non-external script. My implementation did not use that chip, it just used the regular Register as a D-Register. I think that the DRegister chip really is a regular Register, with an additional GUI-side effect allowing one to see its contents – presented as the D-register’s contents – while running. When it comes to the non-external test script, I think the point is that it is not only testing the outputs of the tested chip, but also some internal state (in this case the D-register). Anyway, when using DRegister instead of just Register for the D-Register, both test scripts run successfully on my implementation.
  • When I run the CPU test script for the first time, I had two control bit bugs (as it turned out). The first one made the output file comparison fail on the 2nd result row. That pretty much indicated an obvious bug. But the second bug made the script fail at the 62nd result row, which I found really surprising. That bug was more nasty, which can happen in a design that after all is not that trivial. Comparing the result rows did however allow a bug resolution within minutes.

Next step: writing an assembler for the machine. I cannot wait. :-)

Oct 06

Nand2Tetris: project 4 completed

Module 4 in Nand2Tetris is an interruption in the construction of the computer, in order to discover and understand the instruction set architecture (i.e. the computer’s binary language) that is chosen, before completing the hardware architecture that will realize it. Not surprisingly, the operations allowed on the registers match very closely the operations offered by the ALU previously constructed (in project 2).
As usual, the projects exercises are perfectly picked:

  • Implementation of the multiplication operator in assembly (the ALU is voluntarily simplistic, and does not include multiplication). Since I have never worked with a processor that did not have multiplication (not that I recall anyway), I had never done that. This is a milestone in my life. :-)
  • Implementation of an assembly program that blackens the (simulated) screen when one presses a key on the (simulated) keyboard, and clears it when releasing all keys. The screen and the keyboard are memory mapped, so that makes for a conveniently accessible low level interface.

When manipulating the registers, I couldn’t help starting to imagine how the whole thing would look like if it was a stack machine instead. After all, I did fall in love with RPN in 1989 when I started to use my HP-28S, and I recall the long hours spent programming it to calculate the pH of aqueous solutions or the greatest common divisor of two integers with nostalgia.

But there is a module 13 in Nand2Tetris, called “More Fun to Go”, including an empty project entitled “It’s your call!”, so who knows what will happen when I get there?

Oct 05

Nand2Tetris: project 3 completed

Oh my! Now I have 32 KB RAM and a program counter. That course is just going too fast…
A few comments:

  • The hardware simulator showed some hanging when I was testing my RAM64 and RAM512 (implemented in pure HDL down to 1-bit register, which implied many chips in my own computer’s RAM). The workaround was to stop running the script, single step a few times, and restart the script from there.
  • The claim that one gets to build all chips from the ground up from Nand gates in the course has one exception: the data flip-flop. That chip is given, and I am not sure whether one actually can build it and run it in the simulator, since it is the lowest level chip that interacts with the clock.
  • The program counter implementation was particularly interesting. I started simply, then got the feeling that using a 4-way Mux16 could be clever, implemented such a solution, which worked, didn’t like its complexity, and went back to the simpler solution. The simpler solution worked too, but was well… simpler and actually used fewer gates.

Gotta go back to read about this instruction set architecture we’re gonna choose…

Oct 05

Nand2Tetris: ALU implemented and verified (project 2)

My project 2 of Nand2Tetris is now completed, and I have a working ALU. In the process of implementing it, I also created 2 more chips. I had interpreted the book‘s instructions as an encouragement to build at least one separate chip as a building block that would be used at least twice in the ALU.
As mentioned in previous posts, the course is entirely free and open, and I am taking it as self studies at home. Ironically, although my academic education (that ended more than 20 years ago) is highly ranked (at least in France, where I took it), I did not have many courses at that level of quality, or many teachers who were as inspiring as Shimon Schocken.
This course is just amazing, and I urge anyone interested in computer science to take it.
Anyway, the ALU is now working. So I should be able to implement it in Minecraft red stone, right? :-) Well, I do have other things to do in my life.
After all, I have to add some sequential logic to that computer I am building.
Enjoy!

Oct 04

Nand2Tetris: project 1 completed

After my previous article about Nand2Tetris, I jumped directly into module 1. As a matter of routine, I first read the chapter in the book, browse through the slides that can be found on the web site (the book chapters can actually also be found on the web site), and then follow the project instructions (also on the web site).

I am now very proud of having built and verified the following logic gates:

  1. Not gate
  2. And gate
  3. Or gate
  4. Xor gate
  5. Mux gate
  6. DMux gate
  7. 16-bit Not
  8. 16-bit And
  9. 16-bit Or
  10. 16-bit multiplexor
  11. Or(in0,in1,…,in7)
  12. 16-bit/4-way mux
  13. 16-bit/8-way mux
  14. 4-way demultiplexor
  15. 8-way demultiplexor

I won’t tell how, that would be against Nand2Tetris’ policy (students have to find the solutions for themselves).
I guess I am now ready to dig into Boolean arithmetic (module 2, as opposed to Boolean logic, which was the topic of module 1).