Performance optimization, and how to do it wrong

Published on: 2025-06-25 18:14:26

2025-03-02

I recently tried to optimize convolutions using SIMD instructions, but what I thought would be a simple task ended up taking me days, with one issue popping up after another. Some of them make sense in hindsight, but others were utterly baffling. While the specific examples are for direct convolution, these considerations apply to pretty much any code with a hot loop.

Note: This blog post is mostly written from memory, since I didn't keep around every version of the code being discussed. The values in the benchmarks are a rough recreation of the real values.

Background

I work on burn and recently wanted to optimize direct convolution in the burn-ndarray CPU backend. A convolution slides a two-dimensional kernel across an input feature map and, at each position, sums the products of kernel weights and input values across all input channels. This is repeated for each output channel. The input can be surrounded by zero padding, and the kernel can move across it with a stride ...
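The description above can be made concrete with a minimal, unoptimized direct convolution in Rust. This is a sketch under stated assumptions, not burn's implementation. The function names `conv2d_direct` and `out_size` are hypothetical. It assumes a single image stored as flat row-major `f32` slices in channel-first layout (`[c_in][h][w]` for the input, `[c_out][c_in][kh][kw]` for the weights), the same padding and stride on both axes, and no batching, dilation, or groups. The nested loops over input channels and kernel positions are the kind of hot loop the post is about.

```rust
/// Output size along one spatial axis: floor((input + 2*padding - kernel) / stride) + 1.
/// Assumes the kernel fits inside the padded input (otherwise this underflows).
fn out_size(input: usize, kernel: usize, padding: usize, stride: usize) -> usize {
    (input + 2 * padding - kernel) / stride + 1
}

/// Naive direct 2D convolution for one image (hypothetical helper, not burn's API).
/// `input`:  [c_in][h][w], flattened row-major
/// `weight`: [c_out][c_in][kh][kw], flattened row-major
/// Returns the output as [c_out][out_h][out_w] flattened row-major, plus out_h and out_w.
fn conv2d_direct(
    input: &[f32], c_in: usize, h: usize, w: usize,
    weight: &[f32], c_out: usize, kh: usize, kw: usize,
    padding: usize, stride: usize,
) -> (Vec<f32>, usize, usize) {
    let out_h = out_size(h, kh, padding, stride);
    let out_w = out_size(w, kw, padding, stride);
    let mut out = vec![0.0f32; c_out * out_h * out_w];

    // One pass per output channel, as described in the text.
    for oc in 0..c_out {
        for oy in 0..out_h {
            for ox in 0..out_w {
                let mut acc = 0.0f32;
                // Sum over all input channels and every kernel position.
                for ic in 0..c_in {
                    for ky in 0..kh {
                        for kx in 0..kw {
                            // Stride moves the window; subtracting padding maps
                            // padded coordinates back to real input coordinates.
                            let iy = (oy * stride + ky) as isize - padding as isize;
                            let ix = (ox * stride + kx) as isize - padding as isize;
                            // Positions outside the input are zero padding: skip them.
                            if iy < 0 || ix < 0 || iy >= h as isize || ix >= w as isize {
                                continue;
                            }
                            let (iy, ix) = (iy as usize, ix as usize);
                            let x = input[(ic * h + iy) * w + ix];
                            let k = weight[((oc * c_in + ic) * kh + ky) * kw + kx];
                            acc += x * k;
                        }
                    }
                }
                out[(oc * out_h + oy) * out_w + ox] = acc;
            }
        }
    }
    (out, out_h, out_w)
}
```

Each output element costs `c_in * kh * kw` multiply-adds, and the bounds check for padding sits inside the innermost loop, which is one reason a naive version like this leaves a lot of performance on the table.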