Tech News
← Back to articles

XOR_singleheader: Header-only binary fuse and XOR filter library

read original related products more articles

Header-only Xor and Binary Fuse Filter library

Bloom filters are used to quickly check whether an element is part of a set. Xor filters and binary fuse filters are faster and more concise alternative to Bloom filters. Furthermore, unlike Bloom filters, xor and binary fuse filters are naturally compressible using standard techniques (gzip, zstd, etc.). They are also smaller than cuckoo filters. They are used in production systems.

This is a simple C header-only library. It implements both binary fuse and xor filters.

To use the state-of-the-art binary fuse filters, simply add (for example) the binaryfusefilter.h file to your project. It is made available under the business-friendly Apache license.

For a simple application built on this library, see https://github.com/FastFilter/FilterPassword

We are assuming that your set is made of 64-bit integers. If you have a set of strings or other data structures, you need to hash them first to a 64-bit integer. It is not important to have a good hash function, but collisions should be unlikely (~1/2^64). A few collisions are acceptable, but we expect that your initial set should have no duplicated entry.

The basic version works with 8-bit word and has a false-positive probability of 1/256 (or 0.4%).

uint64_t * big_set = ... binary_fuse8_t filter ; bool is_ok = binary_fuse8_allocate ( size , & filter ); if (! is_ok ) { // do something (you have run out of memory) } is_ok = binary_fuse8_populate ( big_set , size , & filter ); if (! is_ok ) { // do something (you have run out of memory) } binary_fuse8_contain ( big_set [ 0 ], & filter ); // will be true binary_fuse8_contain ( somerandomvalue , & filter ); // will be false with high probability binary_fuse8_free ( & filter );

We also have a 16-bit version which uses about twice the memory, but has a far lower false-positive probability (256 times smaller): about 0.0015%. The type is binary_fuse16_t and you may use it with functions such as binary_fuse16_allocate , binary_fuse16_populate , binary_fuse8_contain and binary_fuse8_free .

For serialization, there is a choice between an unpacked and a packed format.

... continue reading