SIMD Within a Register: How I Doubled Hash Table Lookup Performance

While working on a Cuckoo Filter implementation in C#, I created an array-like structure for the underlying hash table. I chose an 8-bit fingerprint: it aligns nicely on a byte boundary and still keeps the false-positive rate around 3 %.

The layout looked straightforward—just a byte array where the start of each bucket is calculated as bucketIdx * bucketSize . The size of each bucket is 4 slots, which is a solid choice for Cuckoo Filter.

Bucket 0 3A 00 B7 F2 Bucket 1 4C 91 00 DE Bucket n AA 00 5F C8 Table: ...

But those four bytes in a bucket reminded me of something. They feel like … an integer!

I wasn’t chasing ultra-low latency—after all, this is C#—but I couldn’t resist experimenting. Could I speed up lookups in my Cuckoo Filter by replacing the 4-byte bucket with a plain old 32-bit integer?

Time to find out. 💪

#A Flexible and Simple Implementation

Here’s the naïve storage I began with. To keep the rest of the code clean, all table logic lives in its own class, whose heart is a pre-allocated byte array:

private readonly byte [] _table ;

Each bucket has 4 slots, so there was no need for a second dimension; mapping a key to its bucket is obvious:

... continue reading