Over 2'000 SIMD kernels for mixed-precision BLAS-like numerics packaged for 7 programming languages — from Float6 to Float118, on RISC-V, Intel AMX, AVX2 & AVX-512 on x86, Arm SME & SVE, and Relaxed WASM SIMD in 5 MB or less.
These are a few lines of celebratory “proud-dad” ramblings and highlights from my largest open-source release to date. I’m killing my SimSIMD project and re-launching it under a new name: NumKong, StringZilla’s big brother. It packs over 2'000 SIMD kernels for mixed-precision numerics, spread across 200'000 lines of code & docstrings, in 7 languages. That makes it one of the largest collections online, pretty much the same size as OpenBLAS, the default NumPy BLAS (Basic Linear Algebra Subprograms) backend (detailed comparison below).
What’s inside?
RISC-V Vector Extensions, Intel AMX & Arm SME Tiles
From Vectors to Matrices and Higher-rank Tensors
From BFloat16 and Float16 to Float6 — E3M2 & E2M3 on any CPU
Native Int4 & UInt4 Dot Products via Nibble Algebra
Neumaier & Dot2 for higher-than-BLAS precision
Ozaki Scheme for Float64 GEMMs via Float32 Tile Hardware
Haversine & Vincenty for Geospatial — 5'300x faster than GeoPy