It’s nice when you work on security and accidentally get some performance for free. This is the story of a small encoding called bijou64 — a variable-length integer (varint) encoding that we developed for the Subduction CRDT sync protocol. It was intended to fix a subtle signature-verification bug by making each number only representable a single way. It turned out to also run a few times faster than the more common varint LEB128 .
That “bijou” is French for “small jewel” is a happy coincidence 💎
We didn’t set out to write a fast varint, but it turns out that our design constraints made for an encoding that has to do less work.
The Problem
Many binary protocols need a compact way to encode integers that are usually small but occasionally large. Variable-length integer encodings (“varints”) solve this, but most designs treat canonicality as an afterthought — something enforced by a runtime check in the decoder rather than by the structure of the encoding itself.
Since it’s the most common varint, we’re going to pick on LEB128 a bit here. I want to emphasize how much LEB128 is a great choice for many projects, and the reasons that it was not a good choice for us also applies to the other formats that we looked at. It just happened to not be a perfect fit for our use case.
This is where the “128” in “LEB128” comes from, since 2⁷ = 128
LEB128 encodes a number as a sequence of 7-bit segments, with the high bit of each byte signaling “more bytes follow” (there can be many such continuation segments, though we only show 2 segments below). This lets you avoid always writing 8 bytes (64 bits) even when you’re representing a small number (which would be mostly zeroes). This is like writing 5 instead of 000000005 just to get the correct number of characters. Putting aside that working in 7-bit is odd, this is a practical solution!
LEB128 layout
But there’s a problem: the number 0 can be encoded as the single byte 0x00 , but it can also be encoded as 0x80 0x00 . Or 0x80 0x80 0x00 . Or any longer sequence of 0x80 s ending in a zero byte. 0x80 is 1 0000000 , so you can have as many of those as you want and still get 0 ! Most LEB128 decoders will happily accept any of them. This isn’t unique to zero; nearly every number in LEB128 can be represented many ways.
... continue reading