QBE QBE compiler backend
QBE 1.3 - Release notes
qbe-1.3
QBE 1.3 took a while to cook, but it is the most significant release since 1.0 with around 7k new lines of code and 1.5k deleted ones. In addition to the usual bug fixes, QBE gained a new and original IL matching algorithm, new optimizations from Roland Paterson-Jones, Scott Graham added support for the Windows ABI, and I implemented a plan suggested by Michael Forney to have QBE produce position-independent code (as in shared objects). QBE is teamwork, and I am happy to thank all the contributors to this release. In the rest of the notes we will take a closer look at some of the meat of the release.
Faster
Every once in a while, the QBE fly stings an outstanding programmer. This time, Roland was the victim! Roland suggested we look at the coremark benchmark to give us a concrete but simple playground to optimize. Initial measurements with qbe-1.2 revealed that we were far behind our “70% of gcc -O2 ” goal, closer to 40%. We decided to address this for the 1.3 release.
An early inspection of profiling data revealed that the performance gap boiled down in large part to how two functions are treated: ee_isdigit and crcu8 . Interestingly, these functions are not really idiomatic C; for instance, ee_isdigit is typically inlined textually, uses && instead of & and skips the superfluous ternary operator; as for CRC, it is best implemented using a pre-computed table. This observation was a bit disappointing because, aside from QBE's lack of inlining, it did not point us to a general source of overhead that would apply to other compilation loads. On the other hand, it is expected that CPU-bound code spends a majority of time in compact code sections.
Nonetheless, we implemented numerous optimizations (GVN/GCM, loop optimization, if-elimination, CFG simplification, …) that we could try on both coremark and more realistic usages like the Hare test suite. In the end, we decided to keep only a subset of vetted passes and now score more than 63% of the performance of commercial compilers on vanilla coremark. Notably, we excluded inlining from the optimizations set to postpone solving its incompatibility with the streaming per-function compilation model of QBE. Modifying the coremark benchmark to inline the ee_isdigit function and use a simpler branch-free implementation of crcu8 makes QBE reach its 70% goal. The new optimizations should also benefit Hare users: I measured a 33% improvement on the Hare test suite against qbe-1.2 (1.7s vs 2.6s).
Smarter
Since its early days, QBE used a bottom-up tree-numbering algorithm inspired by Ken Thompson's Plan9 C compiler (see 5.5. Addressability). The algorithm is fairly generic but has some subtleties in dealing elegantly with associativity and commutativity of arithmetic operators. It has been a long-standing goal of mine to implement a metaprogramming solution to this problem. QBE 1.3 sees the realization of this goal.
... continue reading