Skip to content
Tech News
← Back to articles

Bijou64: A variable-length integer encoding

read original more articles
Why This Matters

Bijou64 is a new variable-length integer encoding designed for the Subduction CRDT sync protocol, offering both security benefits and improved performance over traditional encodings like LEB128. Its design ensures a single, canonical representation for each number, reducing decoding complexity and increasing speed, which can benefit various binary protocols requiring efficient integer encoding.

Key Takeaways

It’s nice when you work on security and accidentally get some performance for free. This is the story of a small encoding called bijou64 — a variable-length integer (varint) encoding that we developed for the Subduction CRDT sync protocol. It was intended to fix a subtle signature-verification bug by making each number only representable a single way. It turned out to also run a few times faster than the more common varint LEB128 .

That “bijou” is French for “small jewel” is a happy coincidence 💎

We didn’t set out to write a fast varint, but it turns out that our design constraints made for an encoding that has to do less work.

The Problem

Many binary protocols need a compact way to encode integers that are usually small but occasionally large. Variable-length integer encodings (“varints”) solve this, but most designs treat canonicality as an afterthought — something enforced by a runtime check in the decoder rather than by the structure of the encoding itself.

Since it’s the most common varint, we’re going to pick on LEB128 a bit here. I want to emphasize how much LEB128 is a great choice for many projects, and the reasons that it was not a good choice for us also applies to the other formats that we looked at. It just happened to not be a perfect fit for our use case.

This is where the “128” in “LEB128” comes from, since 2⁷ = 128

LEB128 encodes a number as a sequence of 7-bit segments, with the high bit of each byte signaling “more bytes follow” (there can be many such continuation segments, though we only show 2 segments below). This lets you avoid always writing 8 bytes (64 bits) even when you’re representing a small number (which would be mostly zeroes). This is like writing 5 instead of 000000005 just to get the correct number of characters. Putting aside that working in 7-bit is odd, this is a practical solution!

LEB128 layout

But there’s a problem: the number 0 can be encoded as the single byte 0x00 , but it can also be encoded as 0x80 0x00 . Or 0x80 0x80 0x00 . Or any longer sequence of 0x80 s ending in a zero byte. 0x80 is 1 0000000 , so you can have as many of those as you want and still get 0 ! Most LEB128 decoders will happily accept any of them. This isn’t unique to zero; nearly every number in LEB128 can be represented many ways.

... continue reading