Oct 09

Linux application profiling can be spectacular

I now have an assembler that works for Hack assembly files without symbols. This was the first step proposed in project 6 of Nand2Tetris.
However, I really got scared by the assembler’s execution time:

That is the time it takes to assemble a 20,000-line assembly file. The assembler provided by Nand2Tetris, written in Java, performs the same task in a fraction of a second.
This felt like an interesting challenge. My guess was that the bottlenecks could lie in my use of regular expressions (see previous post), or in the file I/O.
I took this as an opportunity to test some profiling tools.
I installed OProfile and used it:

This shows the binaries where my application spent more than 10% of its time. More than 80% is spent in the C and C++ libraries.
Trying to take it one level down:

In theory, this should show the names of the functions where the application spends more than 2% of its time, but for some reason it does not work for the C++ library.
Still, it feels like the regular expressions are a very good candidate.
My first idea of optimization was to construct the regular expressions in the parser’s constructor, instead of doing it in the parsing methods (once per line of assembly code). That change took me 5 minutes, and here is the result:

That’s right, the execution time is divided by 100! :-)
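
For illustration, here is a minimal sketch of that kind of change. It is not my actual parser: the class name, the method name and the pattern are made up for the example, and it only handles A-instructions with a numeric value (no symbols). The point is only where the std::regex object gets built: in the constructor, once, instead of inside the method that runs for every line.

```cpp
#include <regex>
#include <string>

// Hypothetical parser: the names and the pattern are illustrative only.
class Parser {
public:
    // The regular expression is compiled once, when the parser is created.
    Parser() : a_instruction_(R"(@([0-9]+))") {}

    // Called once per line of assembly code. It reuses the compiled
    // regular expression instead of building a new one on every call.
    // Returns true and sets `value` if the line is an A-instruction
    // such as "@21"; returns false otherwise.
    bool parseAInstruction(const std::string& line, int& value) const {
        std::smatch match;
        if (!std::regex_match(line, match, a_instruction_)) {
            return false;
        }
        value = std::stoi(match[1].str());
        return true;
    }

private:
    std::regex a_instruction_;
};
```

The slow variant would declare the std::regex as a local variable inside parseAInstruction, which means compiling the pattern again for each of the 20,000 lines.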
OProfile now tells us:

Still a lot related to regular expressions, so there is certainly much more to be done. But I am happy for now (I am no longer ashamed). 😉
Edit: for the record, I have also made the regular expressions static const, which in general seems better, but that did not improve performance further.
Edit 2: I have also tried std::regex::optimize, but that does not affect performance in my case either.
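
For reference, a minimal sketch of what these two edits can look like when combined. The function name and the pattern are hypothetical, and a function-local static is only one possible placement (a static class member is another). The std::regex::optimize flag asks the library to favor matching speed over construction speed; how much it helps depends on the standard library implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the line looks like an A-instruction
// with a numeric value, e.g. "@21".
bool isAInstruction(const std::string& line) {
    // Function-local static: constructed once, on the first call,
    // and reused by every later call.
    static const std::regex pattern(
        R"(@[0-9]+)",
        std::regex::ECMAScript | std::regex::optimize);
    return std::regex_match(line, pattern);
}
```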