C++26 Shipped a SIMD Library Nobody Asked For

Why This Matters

The inclusion of std::simd in C++26 reflects an ongoing effort to simplify SIMD programming across diverse architectures by reducing complexity and improving portability. However, current implementations fall short in performance and expressiveness: the compiler's auto-vectorizer often outperforms the abstraction, which raises questions about its immediate utility for developers. This development underscores how hard it is to standardize high-performance, portable SIMD, and how much depends on balancing ease of use with efficiency.

Key Takeaways

C++26 ships with std::simd (P1928), a library-based portable SIMD abstraction. The pitch is seductive: write SIMD code once, compile it for AVX2, AVX-512, NEON, SVE. No more #ifdef __AVX512F__ spaghetti. No more intrinsics. Just std::simd<float> and let the compiler figure out the rest.
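To make the pitch concrete, here is a minimal sketch of "write once" SIMD code. It assumes the TS-era API that libstdc++ ships under <experimental/simd> (namespace std::experimental) rather than the final C++26 spelling, since that is the implementation most readers can compile today. The function name saxpy_simd is hypothetical.

```cpp
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Computes y[i] += a * x[i] using the Parallelism TS 2 simd types.
// Assumption: libstdc++ (GCC 11 or newer) in C++17 mode or later.
void saxpy_simd(float a, const float* x, float* y, std::size_t n) {
    using V = stdx::native_simd<float>;  // lane count is fixed by the compile target
    const V va(a);                       // broadcast the scalar to every lane

    std::size_t i = 0;
    for (; i + V::size() <= n; i += V::size()) {
        V vx(&x[i], stdx::element_aligned);  // load V::size() floats
        V vy(&y[i], stdx::element_aligned);
        vy += va * vx;
        vy.copy_to(&y[i], stdx::element_aligned);  // store the result back
    }
    for (; i < n; ++i) {
        y[i] += a * x[i];  // scalar tail for the leftover elements
    }
}
```

The same source compiles for any target the library supports. The vector width comes from the compile flags, not from the code.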

A satirical repository by NoNaeAbC recently made the rounds, presenting “6 reasons to use std::simd”, each one a verified demonstration of a real deficiency. I reproduced the benchmarks and dug deeper. std::simd compiles 10x slower, runs slower than scalar loops, defaults to the wrong vector width, and can’t express the operations that actually matter in real SIMD code. The compiler’s auto-vectorizer, the thing std::simd was supposed to replace, beats it on every metric that counts.

From Physics Lab to ISO Standard: How We Got Here

The story of std::simd starts with one person: Matthias Kretz, a researcher at GSI Helmholtzzentrum für Schwerionenforschung (the German heavy-ion research center in Darmstadt). Around 2009-2010, Kretz built the Vc library — “portable, zero-overhead C++ types for explicitly data-parallel programming” — to vectorize high-energy physics simulations. Vc was a serious project: 5,000+ commits, used at CERN, and one of the earliest attempts at a clean C++ SIMD abstraction. The idea was right: express parallelism through the type system rather than through intrinsics or new control structures.

Kretz then took Vc’s design to the C++ committee. The proposal went through a remarkably long standardization journey. P0214 (“Data-Parallel Vector Types & Operations”) appeared around 2016 and went through at least nine revisions. It was published as part of the Parallelism TS 2 (ISO/IEC TS 19570:2018), a Technical Specification, which is the committee’s way of saying “we think this is interesting but we’re not ready to commit.” GCC 11 shipped an experimental implementation under <experimental/simd> in 2021, and Kretz maintained a standalone version at VcDevel/std-simd.

Then came P1928, the proposal to promote std::simd from experimental TS into the C++26 standard proper. This is where things get interesting. The proposal had been in some form of committee discussion for nearly a decade by the time it was voted into C++26. During that decade, the competitive landscape shifted dramatically under its feet. Auto-vectorizers in GCC, Clang, and MSVC improved enormously. ISPC proved that language-level SIMD could generate better code than library-level abstractions. ARM shipped SVE, a scalable-width SIMD ISA that fundamentally challenges fixed-width abstractions. And compiler support for -march=native matured to the point where scalar loops routinely auto-vectorize to the widest available registers.
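For reference, the kind of scalar loop that paragraph has in mind looks roughly like the sketch below. The function name multiply_add is hypothetical. Whether a given compiler actually vectorizes it depends on the optimization level and target flags (for example -O3 -march=native on GCC or Clang), so treat it as an illustration, not a guarantee.

```cpp
#include <cstddef>

// Element-wise out[i] = a[i] * b[i] + c[i], written as an ordinary loop.
// __restrict is a widely supported compiler extension (GCC, Clang, MSVC) that
// promises the arrays do not overlap, which removes one common obstacle to
// auto-vectorization.
void multiply_add(const float* __restrict a,
                  const float* __restrict b,
                  const float* __restrict c,
                  float* __restrict out,
                  std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

There are no vector types or intrinsics in the source. The choice of register width is left entirely to the compiler and the flags it was given.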

Kretz’s original vision — write SIMD code once, compile it everywhere — was and remains a worthy goal. The Vc library in 2012 was genuinely ahead of its time. The problem is that std::simd in 2026 is the 2012 solution arriving after the world moved on. The committee spent a decade polishing a library-based approach while compilers solved the easy cases automatically and ISPC solved the hard cases with language-level support. By the time std::simd graduates from experimental to standard, it’s competing against tools that do its job better — and those tools have a decade head start.

The Libraries That Ate std::simd’s Lunch

While std::simd was working its way through the committee, the open-source ecosystem didn’t wait. Several libraries now occupy the exact space std::simd was designed for — and they do it better, because they can iterate on actual user feedback instead of committee consensus.

Google Highway is the most serious competitor. It bills itself as “performance-portable, length-agnostic SIMD with runtime dispatch.” That last part matters: Highway can detect the CPU at runtime and dispatch to the best available SIMD implementation (SSE4, AVX2, AVX-512, or NEON/SVE) without recompilation. std::simd has no runtime dispatch story at all. Highway is also length-agnostic, meaning it works naturally with ARM SVE’s scalable vectors, which std::simd’s fixed-width model can’t express. The adoption list speaks for itself: Chromium, Firefox, JPEG XL (libjxl), libaom (AV1 codec), Jpegli, libvips. When Google needed portable SIMD for production image and video codecs, they built Highway, not std::simd.
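To illustrate what "runtime dispatch" means in general, here is a hand-rolled sketch. It is not Highway's API, which uses its own macros and build machinery. It assumes GCC or Clang on x86 for the __builtin_cpu_supports builtin and the target attribute, and falls back to a baseline everywhere else. All function and macro names are hypothetical.

```cpp
#include <cstddef>

// Assumption: the CPU-detection builtins below exist only on GCC/Clang for x86.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAS_X86_DISPATCH 1
#else
#define SKETCH_HAS_X86_DISPATCH 0
#endif

// Baseline version, compiled for whatever the default target is.
void scale_baseline(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

#if SKETCH_HAS_X86_DISPATCH
// Same loop, but the compiler is allowed to use AVX2 inside this one function.
__attribute__((target("avx2")))
void scale_avx2(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Inspect the running CPU and choose an implementation.
ScaleFn pick_scale() {
#if SKETCH_HAS_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
#endif
    return scale_baseline;
}

// Public entry point: the choice is made once, on the first call.
void scale(float* data, std::size_t n, float factor) {
    static const ScaleFn impl = pick_scale();
    impl(data, n, factor);
}
```

One binary built this way can run on machines with or without AVX2 and still use the wider instructions where they exist. A std::simd type, by contrast, has its width fixed when the translation unit is compiled.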
