Oct 24

OpenGL SuperBible: first example under Xlib

I have started to read the OpenGL SuperBible 6th edition, which is apparently a good reference on the topic (I have never used OpenGL in my life, although I did some C++ programming on a Silicon Graphics Indigo in 1993…).
I tried to compile the example code, which failed, apparently because my version of GLFW is too new.
That did not discourage me for long, since I would rather avoid non-strictly necessary libraries like GLFW anyway.
Starting with the code from OpenGL’s own “Tutorial: OpenGL 3.0 Context Creation (GLX)”, I read the beginning of the book and tried to compile the first example, that is supposed to display a window full of red color.
The code from the book is:

That’s the kind of “hello word” I don’t really like, because it really is very far from a “hello world”. It assumes that one pulls in a whole header file supplied with the book, and a lot is going on that one does not control. It looks like this render() function is some kind of callback that gets called once in a while.
Instead, I tried to just copy/paste the contents of render() in my own startup code, mentioned above. That did not work at once, but I managed to figure it out.
To even start with OpenGL, one needs to understand the GLEW library concept (or some equivalent, but GLEW really seems to be the most common). The issue solved by GLEW is that quite many of today’s common OpenGL functions like glClearBufferfv(), are considered as extensions that may or may not be implemented by the GPU drivers, and that need to be resolved at runtime.
This is what GLEW does. It exposes the whole OpenGL API through a single #include <GL/glew.h>, and takes care of the rest via the GLEW library (which one needs to link – gcc‘s -lGLEW option will do that).
But it will not do its job if it is not first initialized (glewInit()) after an OpenGL context has been made current (glXMakeCurrent(display, win, ctx) in my case).
Additionally, in the setup mentioned above from “Tutorial: OpenGL 3.0 Context Creation (GLX)”, one has two buffers: a front one and a back one. It looks like the parameter 0 in glClearBufferfv(GL_COLOR, 0, red) refers to the back buffer, which is really what one wants. After it has been updated, one needs to swap the buffers with glXSwapBuffers(display, win).
When all that is in place, the code works, and one gets to see this:
Screenshot - 2014-10-24 - 21:47:28
You will find the code there. git clone it, run make and ./OpenGlTest under OpenGlTest.

Oct 22

Nand2Tetris: project 10 completed

I have now implemented and tested the compiler front end for the Jack compiler. The Jack language is object-based, without support for inheritance. Its grammar is mostly LL(0), which means that in most cases, looking at the next token is enough to know which alternative to choose for so-called “non-terminals”.
My final implementation is a classical top-down recursive implementation, as proposed in chapter 10.
That is however after a re-factoring of a previous version, where I tried to apply the principle that an element should itself assess whether it is of a given type. All my compileXxx() would return a boolean that indicates whether or not the current element is or not of type Xxx, with the side effect of generating the compiled code (XML for this chapter – real VM code in the next). The compileXxx() functions are then predicates, which I found kind of neat. It felt like programming Prolog in C++. I had a version of the compilation engine built on that principle that passed all the tests (which are, for the purpose of this chapter, comparisons of XML-output).
However, I later on realized that the underlying principle is just wrong from an LL(0)-perspective. LL(0) says that when there is an alternative in a grammar rule, e.g.:

which means that there may or may not be an expression in a return statement, the return statement level knows by a lookup of the next token whether or not there is an expression. This is the case here: there will be an expression if and only if the next token is not ‘;’.
With my predicate principle, the compileExpression() would in itself have to decide whether the current element is an expression or not. This in fact happens to be much harder than checking whether or not the next token is a semicolon (an expression may occur in other contexts than “return”, so it cannot check on semicolon).
In other words, even if my code worked, I would not have been able to sleep at night if I had not done a re-factoring. It was actually quite easy, albeit time-consuming and boring.

Oct 17

Nand2Tetris: project 9 completed (I guess)

The purpose of Nand2Tetris’ project 9 was to get to know the Jack language, a simple object-based programming language that we will write a compiler for in project 10 and project 11. Writing and testing a program in Jack was the way to get acquainted with the language.
I did write and test a short and silly Jack program, for the sake of it, but I am more interested in the compiler part, that I will now move on to.

Oct 17

Nand2Tetris: project 8 completed

With project 8 completed, I now have a Virtual Machine translator that takes any VM program as an input and outputs a corresponding Hack assembly file (see project 4) that can be run on the Hack CPU simulator. Since I had a few bugs, I ended up step through some code at the assembly level for the recursive Fibonacci example, which was an interesting exercise of concentration and patience.
The virtual machine in question is a single stack-stack machine, that provides support for function calling, including recursion.
After having implemented it, one feels quite at home reading section 2.1.1 Single vs. multiple stacks from the book Stack computers: the new wave by Philip Koopman (it is from 1989, so new is relative, but it is available online and it is one of the very few publications available about stack machine hardware).
Quoting the section:

An advantage of having a single stack is that it is easier for an operating system to manage only one block of variable sized memory per process. Machines built for structured programming languages often employ a single stack that combines subroutine parameters and the subroutine return address, often using some sort of frame pointer mechanism.

This “sort of frame pointer mechanism” is precisely what I have implemented in project 8. In our case, the stack machine is not built in hardware, it is implemented in the form of a translator to the machine language of a simple 16-bit register based CPU. It could however be directly built in hardware, as the many examples given in Stack computers: the new wave show. I suppose a very interesting project following this course would be to implement the VM specification of chapter 7 and chapter 8 in the HDL language in the same way as the Hack CPU was built in project 5. I am not sure how much the ALU would have to be modified to do that.
I will keep this project idea in the back of my mind for now and move on to chapter 9, where we study “Jack”, a simple object oriented high level language, that we will in later chapters write a compiler for. The compiler will use the VM translator implemented in chapter 7 and chapter 8 as a back end.

Oct 15

Nand2Tetris: project 7 completed

I have now implemented a translator for a part of the virtual machine that is used in Nand2Tetris.
A point of the virtual machine language in the course is to be used as an intermediate between high level language and assembly, in the compiler to be designed in later chapters. The virtual machine translator translates VM instructions to assembler. Its implementation is split between project 7, which I have now completed, and project 8.
The virtual machine is stack based which I enjoy by personal taste (as mentioned in a previous post I have inherited that taste from my use of RPN on HP calculators since the 80s).
The design of the virtual machine specification feels, as all concepts I have so far gone through in this course, elegant and as simple as it can be.
It features:
1. the basic arithmetic and logic operations (the same as the the ALU and CPU previously designed),
2. push and pop to transfer data between RAM and the stack,
3. program flow commands,
4. function call commands
Project 7 implements 1 and 2.
Since the VM language basic syntax is always the same, the parsing is in fact simpler than the assembly parsing of project 6. The interesting part is assembly code output, where there is potential for optimization of the number of assembler commands generated for a given VM command. I have myself worked very little on optimization because I rather want to carry on, but I might come back to it later on.

Oct 14

Manipulating directories in C++ under Linux

Currently working on Nand2Tetris project 7, I ran into the need to accept a directory name as an argument to a C++ application. I looked for a standard way to deal with directories in C++11, but that is unfortunately not part of the standard. Since I am running Linux, I do not have to suffer that much. I implemented a solution based on the Linux/POSIX API calls opendir()/readdir()/closedir() to list files in the directory, and realpath() to get the absolute path (since I needed to extract the directory name regardless of how it was pointed to).
I ended up with two relatively simple static member functions. I could probably have made them even more general, but this is good enough for me:

and:

You will find information about the necessary #includes in your favorite man page database.
If you spot a bug, please leave a reply.

Oct 10

Nand2Tetris: assembler implemented and verified (project 6)

Nand2Tetris‘ assembler/comparator thinks that the 20000 line-binary file produced by my assembler for the pong game is correct to the bit, which means that my assembler, although I know it is not even close to being robust, is now good enough for my purpose.
As usual, the book contains a very detailed analysis of the problem to solve, and a clean design proposal. What is left is quite a straightforward implementation. Still, it is not entirely trivial, and one gets the satisfaction to have gone one step further towards the goal of a computer built from Nand gates that will be able to run graphics programs written in a high level language.
From a software and hardware development process perspective, the course is also very pedagogic, providing the means to test the results of every project. Encouraged by that mindset, I implemented a test class for the assembler parser, that helped me to verify that I had not broken anything when I added more functionality. In fact, I did write the test cases and run them before even starting to write the corresponding parser code, so one could say that I applied the principles of test driven development.
Given the little scope of the project, I implemented support for this little unit testing in my main() function:

In order for PARSERTESTER_HPP to be defined, I only have to add:

This way, I can keep the rest of my file and Makefile structure untouched. When the #include is there, my application will be a unit test application instead of being the full assembler. My the test code is written to throw an exception any time a test does not pass. The exception won’t be caught and will lead to a crash of the application. If the test application writes “Test successful”, it means that it run to completion without hitting a throw. Primitive, but simple.
Most of the time I spent in this project was researching a good solution for the parser in C++ (see my 3 previous articles).
The times I showed in Performance of C++11 regular expressions were for a one pass-implementation of the assembler that had not support for labels.
Interestingly, the times for the complete version, which has two passes, i.e. parses the whole source file twice, are not much longer.
One pass:

Two passes:

It would therefore seem that most of the time is spent in input/output from and to hard disk. A bugged version of the assembler that did not write the output file and that I happened to time seemed to show that most of the “sys” time in a working version is spent writing the file to disk. Maybe that could be optimized in some way (I haven’t done the math).
I will now move on to chapter 7, entitled VM I: Stack Arithmetic. :-)

Oct 10

Performance of C++11 regular expressions

Since I want to talk about C++11 program performance, I guess I should start saying that I am running g++ (GCC) 4.9.1 on Linux (Manjaro).
In What’s in a regular expression?, I presented some preliminary testing of the regular expression functionality in C++ that I could use in the scope of the assembler parser for Nand2Tetris.
In Linux application profiling can be spectacular, I explained how I divided the running time of my assembler by 100, by constructing regular expressions only once instead of once per row to parse.
After the regular expression improvement, I could not help rewriting the parser (only 130 lines of code) code with some std::string functions instead of regular expressions, to see what running time I would get.
The std::string functions I used were the following:

  • std::remove_if()
  • string::erase()
  • std::isdigit()
  • std::isspace()
  • string::substr()
  • string::find()

I guess you get the picture (I won’t show the code because it is contrary to Nand2Tetris’ policy).
The numbers are as follows to assemble a 20000 line-assembly file,
with regular expressions:

and with std::string:

So it is better by 30% with std::string.
My Parser.cpp is about the same size in both cases, but I guess that is because the pattern rules are simple. If they got more complicated, the regular expression version would be cheaper to maintain and not grow so much in size.

Oct 09

Linux application profiling can be spectacular

I now have an assembler that works for Hack assembly files without symbols. This was the first step proposed in project 6 of Nand2Tetris.
However, I really got scared by the assembler’s execution time:

This is to assemble a 20 thousand-line assembly file. To perform the same task, the assembler provided by Nand2Tetris, written in Java, takes a fraction of a second.
This felt like an interesting challenge. My guess was that the bottlenecks could lie in my use of regular expressions (see previous post), or in the file I/O.
I took this as an opportunity to test some profiling tools.
I installed OProfile and used it:

This shows the binaries where my application spent more than 10% of its time. More than 80% is spent in the C and C++ libraries.
Trying to take it one level down:

In theory, this should shows the names of the functions where it spends more than 2% of the time, but for some reason it does not work for the C++ library.
Still, it feels like the regular expressions are a very good candidate.
My first idea of optimization was to construct the regular expressions in the parser’s constructor, instead of doing it in the parsing methods (once per line of assembly code). That change took me 5 minutes, and here is the result:

That’s right, the times are divided by 100! :-)
OProfile now tells us:

Still a lot related to regular expressions, so there is certainly much more to be done. But I am happy for now (I am no longer ashamed). 😉
Edit: for the record, I have also made the regular expressions static const, which in general seem better, but that did not increase performance further.
Edit 2: I have also tried std::regex::optimize, but that does not either affect performance in my case.

Oct 09

What’s in a regular expression?

While working on my Nand2Tetris assembler parser (project 6), I devised with myself about whether or not I would use C++11’s regular expressions to parse assembly lines. I answered yes to that question, because:

  • Regardless of the performance discussion, I like the idea of a language that can match any text pattern. When I am in an environment when such a tool is available, I do not want to reinvent that wheel.
  • While manipulating some text in an editor, I often feel that a certain repetitive manual task would take a few seconds instead of many minutes if I just had a good command of regular expressions. Therefore, this is an opportunity to get better at it.

Spoiler warning: the rest of this post will discuss some details of the assembler parser implementation for Nand2Tetris. While I will mostly focus on the regular expression part, and do not post a complete solution, you should not read it if you want to take a completely fresh start at the project.
The power horse of the Hack computer instruction set (see chapter 6) is the C-instruction.
To quote the book, C-instruction:

Either or. That means not both. I.e. an instruction with neither dest nor jump (that would be a NOP) is invalid.
I am thinking about having one regular expression for the A-instruction, one for the C-instruction, and one for the symbol pseudo-command (that names labels).
My ambition for the parser is just to decode a correctly written line of assembly code:

  • An empty line or a line of comments shall just be discarded, i.e. my regular expressions shall not match them.
  • An A-instruction line shall be identified as such and the symbol string shall be extracted.
  • A C-instruction line shall be identified as such and the dest, comp and jump strings shall be extracted.
  • I do not ambition for my parser to reject an invalid symbol, dest, comp, or jump string. My plan is to let the invoker take care of that. That should make for relatively simple regular expressions and for now, it seems like an acceptable limitation (to be confirmed).

C++11 use a modified ECMAScript regular expression grammar. My reference for this is cppreference.com, but I am also using other sources to help me understand that information, which is quite dry.
A first attempt at a regular expression for the C-instruction could be (I separate the string in several strings in a C/C++ manner, in order to make the structure clearer – also note the C/C++ escaping of “\”, which is not part of the actual regular expression):

Let’s try it:

results in

The first match is the whole expression that matched. The other matches are the extracted strings. This did not work: the comp match also contains the jump part, and the jump match is empty. A reasonable way to prevent the comp part from eating the jump part is probably to just exclude “;” from the class of valid characters for comp. One more try:

results in

That worked. I replaced "\\S+" (1 or more non-space characters) by "[^\\s;]+" (1 or more characters that are neither spaces nor semicolon).
But since our rule for comp is so unspecific, and we require neither dest nor jump, I worry that a line with only one comment and without any spaces would match that rule too:

results in

That’s what I thought. Not good. An obvious way to fix this is to exclude “/” in the same way as we excluded “;”. Given the low level of ambition of our pattern matching for now, it is also a reasonable solution:

gives

It worked. Better get back to my parser now.