AVX2 is slower than SSE2-4.x under Windows ARM emulation

If you compile your app for AVX2 and it runs on Windows ARM under Prism emulation, is it faster or slower than compiling for SSE2-4.x?

I assumed it would be roughly the same — maybe slightly slower due to emulation overhead, but AVX2's wider operations would compensate. The headline gives it away: I was wrong.

💡 TLDR: AVX2 code runs at 2/3 the speed of equivalent SSE2-SSE4.x optimised code under emulation on Windows 11 ARM.
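To make that ratio concrete, here is a small arithmetic sketch of what "2/3 the speed" means for run time. It only uses the headline figure from the TLDR, not any individual benchmark result, and the helper name `runtime_factor` is made up for illustration.

```python
from fractions import Fraction


def runtime_factor(relative_speed: Fraction) -> Fraction:
    """How many times longer a fixed workload takes at the given relative speed."""
    if relative_speed <= 0:
        raise ValueError("relative speed must be positive")
    # Time is inversely proportional to speed for the same amount of work.
    return 1 / relative_speed


# Headline figure from the TLDR: AVX2 runs at 2/3 the speed of the SSE build.
avx2_speed = Fraction(2, 3)
factor = runtime_factor(avx2_speed)

print(f"runtime factor: {float(factor):.2f}x")  # 1.50x
print(f"extra time: {float(factor - 1):.0%}")   # 50%
```

In other words, if the SSE2-4.x build finishes a workload in 10 seconds under emulation, a build running at 2/3 the speed would need about 15 seconds for the same work.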

'Should I compile for AVX2 if my app might run on Windows ARM?' has a clear answer: No. At least if performance matters.

This post explains how I found out, what I measured and how I measured it, the benchmark results, and why AVX2 comes out slower.

Curiosity

A few weeks ago, in a Hacker News thread on the emulated performance of WoW (the game) on Windows ARM, I wondered:

I’ve been testing some math benchmarks on ARM emulating x64, and saw very little performance improvement with the AVX2+FMA builds, compared to the SSE4.x level. (X64 v2 to v3.) ... I’ve found very little info online about this.

Well, I nerdsniped myself, because those math benchmarks are now complete, so we have the perfect framework for testing the overhead of emulating AVX2+FMA on ARM Windows. I have no technical reason to do so: if you use our compiler and want to run your app on Windows ARM, we encourage you to just compile it for Windows ARM. It's simply that I want to know.
