An optimizing compiler doesn't help much with long instruction dependencies
Published on: 2025-06-11 17:17:10
We at Johnny’s Software Lab LLC are experts in performance. If performance is in any way a concern in your software project, feel free to contact us.
There was a rumor I read somewhere related to training AI models, something along the lines of: “whether we compile our code in debug mode or release mode doesn’t matter, because our models are huge and all of our code is memory bound”.
I wanted to check whether this is true for the cases that interest me, so I wrote a few small kernels to investigate. Here is the first:
for (size_t i { 0ULL }; i < pointers.size(); i++) {
    sum += vector[pointers[i]];
}
This is a very memory-intensive kernel. The data in vector is read from random locations, so depending on the size of vector we can experiment with the data being served from the L1, L2, or L3 cache, or from main memory.
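The setup code isn’t shown in the article, so here is a minimal, self-contained sketch of how the benchmark might look. The sizes, the int element type, and the use of random indices in pointers are my assumptions, not taken from the article:

#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // Assumed sizes; the article varies the size of vector to move the
    // working set between the L1/L2/L3 caches and main memory.
    constexpr size_t kVectorSize  = 1 << 20;  // ~4 MiB of ints
    constexpr size_t kNumAccesses = 1 << 24;

    std::vector<int> vector(kVectorSize, 1);

    // pointers holds random indices into vector, which is what makes
    // the access pattern memory intensive.
    std::mt19937_64 rng{42};
    std::uniform_int_distribution<size_t> dist{0, kVectorSize - 1};
    std::vector<size_t> pointers(kNumAccesses);
    for (auto& p : pointers) { p = dist(rng); }

    // The kernel from the article.
    int sum = 0;
    for (size_t i { 0ULL }; i < pointers.size(); i++) {
        sum += vector[pointers[i]];
    }

    // Print the result so the compiler cannot remove the loop at -O3.
    std::printf("%d\n", sum);
    return 0;
}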
I compiled this loop with GCC at optimization levels -O0 (no optimizations) and -O3 (full optimizations). Then I calculated the instruction count ratio: instruction_count(O0) / instruction_count(O3).
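The article doesn’t list the exact commands, but a measurement along these lines, using GCC and Linux perf (bench.cpp is a hypothetical file name), would produce the two instruction counts:

# Build the same source at both optimization levels.
g++ -O0 -o bench_O0 bench.cpp
g++ -O3 -o bench_O3 bench.cpp

# Count retired instructions for each binary; the ratio of the two
# counts is instruction_count(O0) / instruction_count(O3).
perf stat -e instructions ./bench_O0
perf stat -e instructions ./bench_O3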