Hashed sorting is typically faster than hash tables
Problem statement: count the unique values in a large array of mostly-unique `uint64`s. Two standard approaches are:

1. Insert into a hash table and return the number of entries.
2. Sort the array, then count positions that differ from their predecessor.
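The two approaches can be sketched as follows. This is a minimal illustration, not the benchmark code; the function names are made up for this sketch:

```rust
use std::collections::HashSet;

/// Approach 1: insert into a hash table, return the number of entries.
fn count_unique_hash(data: &[u64]) -> usize {
    data.iter().copied().collect::<HashSet<u64>>().len()
}

/// Approach 2: sort, then count positions that differ from their predecessor.
fn count_unique_sort(data: &[u64]) -> usize {
    let mut v = data.to_vec();
    v.sort_unstable();
    // The first element is always new; each later position counts
    // only if it differs from the one before it.
    usize::from(!v.is_empty()) + v.windows(2).filter(|w| w[0] != w[1]).count()
}

fn main() {
    let data = [3u64, 1, 4, 1, 5, 9, 2, 6, 5, 3];
    assert_eq!(count_unique_hash(&data), count_unique_sort(&data));
    println!("{}", count_unique_sort(&data)); // 7
}
```

Both return the same count; the question the post explores is which one runs faster in practice.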
Hash tables win the interview (O(n) vs O(n log n)), but sorting is typically faster in a well-tuned implementation. This problem and its variants are the inner loop of some of the world’s biggest CPU workloads.
Benchmark highlights
Here is performance on an M2 Pro, comparing our best-tuned hash table and our best-tuned sorting implementation. We also include Rust’s default implementations (`sort_unstable()`, and `HashSet` with `foldhash::fast`) as a reference point for high-quality general-purpose implementations:
| Data size | Baseline hash table | Baseline sorting | Tuned hash table | Tuned sorting |
|-----------|---------------------|------------------|------------------|---------------|
| 8 KiB     | 3.8 µs              | 5.1 µs           | 1.6 µs           | 6.5 µs        |
| 256 KiB   | 219 µs              | 264 µs           | 193 µs           | 92 µs         |
| 8 MiB     | 8.1 ms              | 12.0 ms          | 7.1 ms           | 3.9 ms        |
| 256 MiB   | 875 ms              | 464 ms           | 269 ms           | 185 ms        |
| 2 GiB     | 7.6 s               | 4.3 s            | 2.6 s            | 1.9 s         |
Tuned sorting beats our best hash table by ~1.5× on non-tiny sizes, and is up to 4× faster than the excellent “Swiss Table” hash tables that ship with Rust’s standard library.
Benchmark code is available.