TL;DR: Most C++ and Rust thread-pool libraries leave significant performance on the table - often running 10× slower than OpenMP on classic fork-join workloads and micro-benchmarks. So I’ve drafted a minimal ~300-line library called Fork Union that lands within 20% of OpenMP. It does not use advanced NUMA tricks; it uses only the C++ and Rust standard libraries and has no other dependencies.
OpenMP has been the industry workhorse for coarse-grain parallelism in C and C++ for decades. I lean on it heavily in projects like USearch, yet I avoid it in larger systems because:
- **Fine-grain parallelism** with independent subsystems doesn’t map cleanly to OpenMP’s global runtime.
- **Portability** of the C++ STL and the Rust standard library is better than OpenMP’s.
- **Meta-programming** with OpenMP is a pain - mixing `#pragma omp` with templates quickly becomes unmaintainable.
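To illustrate the meta-programming friction, here is a minimal sketch (my own example, not from the library): the pragma is pure text, so scheduling policy, chunk size, and thread count can't be expressed as template parameters or passed through the type system.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example of mixing OpenMP with templates: the pragma below is
// textual, so it can't be specialized per `scalar_t` or parameterized by a
// template argument - scheduling decisions live outside the C++ type system.
template <typename scalar_t, typename transform_t>
void transform_in_parallel(std::vector<scalar_t> &data, transform_t op) {
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(data.size()); ++i)
        data[i] = op(data[i]);
}
```

Without `-fopenmp` the pragma is silently ignored and the loop runs serially, which is another portability wrinkle: the parallelism is invisible to the compiler's type checker.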
So I went looking for ready-made thread pools in C++ and Rust — only to realize most of them implement asynchronous task queues, a much heavier abstraction than OpenMP’s fork-join model. Those extra layers introduce what I call the four horsemen of low performance:
- **Locks & mutexes** with syscalls in the hot path.
- **Heap allocations** in queues, tasks, futures, and promises.
- **Compare-and-swap (CAS)** stalls in the pessimistic path.
- **False sharing** - unaligned counters thrashing cache lines.
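The false-sharing horseman is the easiest to demonstrate. A minimal sketch (my own illustration, with hypothetical names `padded_counter_t` and `count_in_parallel`): padding each per-thread counter to a full cache line means increments from different cores never contend on the same line.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration: one counter per worker, padded to a typical
// 64-byte cache line, so concurrent `fetch_add`s from different cores
// don't thrash the same line (the "false sharing" pitfall).
struct alignas(64) padded_counter_t {
    std::atomic<std::size_t> value{0};
};

std::size_t count_in_parallel(std::size_t threads, std::size_t per_thread) {
    std::vector<padded_counter_t> counters(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t != threads; ++t)
        workers.emplace_back([&counters, t, per_thread] {
            for (std::size_t i = 0; i != per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto &w : workers) w.join();
    std::size_t total = 0;
    for (auto &c : counters) total += c.value.load();
    return total;
}
```

With a single shared unpadded counter the same loop would serialize on cache-line ownership; the padded version keeps each core writing to its own line and merges only once at the end.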
With today’s dual-socket AWS machines pushing 192 physical cores, I needed something leaner than Taskflow, Rayon, or Tokio. Enter Fork Union.
Hardware: AWS Graviton 4 metal (single NUMA node, 96× Arm v9 cores, 1 thread/core). Workload: “ParallelReductionsBenchmark” - summing single-precision floats in parallel. In this case, just one cache line (`float[16]`) per core - small enough to stress the synchronization cost of the thread pool rather than the arithmetic throughput of the CPU. In other words, we are benchmarking a trivial parallel sum reduction.
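The kernel shape described above might be sketched like this (my reconstruction of a classic OpenMP reduction, not the exact benchmark source):

```cpp
#include <cstddef>

// Sketch of the benchmarked shape: each core sums a tiny slice of floats -
// about one cache line's worth - so the fork-join overhead of the thread
// pool dominates the runtime, not the additions themselves.
float sum_in_parallel(float const *data, std::size_t count) {
    float sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(count); ++i)
        sum += data[i];
    return sum;
}
```

OpenMP compiles this into a fork, per-thread partial sums, and a join that combines them - exactly the pattern a fork-join thread pool has to match.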