
A Tale of Four Fuzzers


Charles Darnay observed that the gate was held by a mixed guard of soldiers and patriots, the latter far outnumbering the former; and that while ingress into the city for peasants’ carts bringing in supplies, and for similar traffic and traffickers, was easy enough, egress, even for the homeliest people, was very difficult.

— Complaining about egress fees goes back to at least the French Revolution.

Some time ago we overhauled TigerBeetle’s routing algorithm to better handle varying network topologies in a cluster. That turned out to be an interesting case study of practical generative testing (or fuzzing) for non-trivial, real-world code. We ended up adding not one, not even two, but four very different new fuzzers to the system! Let’s talk about why just one fuzzer is not enough.

This is a good moment to brew some tea; the journey will take us a while!

Although this post isn’t primarily about the new algorithm itself, we’ll start by covering the basics of replication. TigerBeetle provides transaction Atomicity, Consistency, Isolation and Durability (ACID). Out of the four letters, the D, Durability, is the most consequential. For, without Durability, there wouldn’t be any data at all to provide guarantees for!

You can get a decent chunk of durability by writing the data to a (single) hard drive. This works for many non-critical applications, but might still fail if you repeat the procedure often enough. Disks are faulty with non-zero probability, and it is fairly common to lose an entire machine (floods, fires, and tripping over the power supply happen). If you really want your data to be durable, it is better to store several copies of it on different machines, that is, to replicate it.

All data in TigerBeetle is ultimately derived from an append-only hash-chained log of prepare messages, so the task of replication reduces to distributing the prepares (a MiB each) across the six replicas of the cluster.
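To make that concrete, here's a minimal Python sketch of a hash-chained, append-only log. This is illustrative only; the Prepare and Log types and their fields are made up for this post, not TigerBeetle's actual ones. Each prepare records the checksum of its predecessor, so a replica can verify the whole chain when replaying it.

```python
# Illustrative sketch of an append-only, hash-chained log of prepares.
# Prepare and Log are hypothetical types, not TigerBeetle's actual ones.
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Prepare:
    op: int        # position in the log
    parent: bytes  # checksum of the previous prepare
    body: bytes    # payload, up to a MiB in the real system

    def checksum(self) -> bytes:
        return hashlib.sha256(
            self.op.to_bytes(8, "little") + self.parent + self.body
        ).digest()


class Log:
    def __init__(self) -> None:
        self.prepares: list[Prepare] = []

    def append(self, body: bytes) -> Prepare:
        parent = self.prepares[-1].checksum() if self.prepares else bytes(32)
        prepare = Prepare(op=len(self.prepares), parent=parent, body=body)
        self.prepares.append(prepare)
        return prepare

    def verify(self) -> bool:
        """Replay the chain: every prepare must link to its predecessor."""
        parent = bytes(32)
        for prepare in self.prepares:
            if prepare.parent != parent:
                return False
            parent = prepare.checksum()
        return True
```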

The primary sends .prepare messages down to the backups; they reply with .prepare_ok back up once the prepare is locally durable. When the primary receives a quorum of .prepare_ok messages, it knows that the prepare is globally durable.
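Here's a small sketch of the bookkeeping this implies on the primary. It assumes a simple majority quorum of 4 out of 6 purely for illustration (TigerBeetle's actual quorum sizes may differ), and PrepareTracker with its method names is a hypothetical helper, not real code from the system.

```python
# Hypothetical primary-side bookkeeping for one in-flight prepare.
# QUORUM_REPLICATION = 4 is an illustrative majority of 6, not TigerBeetle's value.
REPLICA_COUNT = 6
QUORUM_REPLICATION = 4


class PrepareTracker:
    def __init__(self, op: int) -> None:
        self.op = op
        self.prepare_ok_from: set[int] = set()  # replicas that sent .prepare_ok

    def on_prepare_ok(self, replica: int) -> bool:
        """Record an ack; return True once the prepare is globally durable."""
        assert 0 <= replica < REPLICA_COUNT
        self.prepare_ok_from.add(replica)
        return len(self.prepare_ok_from) >= QUORUM_REPLICATION
```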

The most straightforward way to implement that is for the primary to broadcast the prepare:

Star Topology
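As a toy rendering of that star topology, here is a sketch that assumes replica 0 is the primary and uses a hypothetical send_message in place of the real networking code:

```python
# Star topology sketch: the primary (replica 0 here) sends the prepare
# directly to every backup. send_message stands in for the real network layer.
REPLICA_COUNT = 6


def send_message(backup: int, prepare: bytes) -> None:
    print(f"primary -> replica {backup}: .prepare, {len(prepare)} bytes")


def broadcast_prepare(prepare: bytes) -> None:
    for backup in range(1, REPLICA_COUNT):
        send_message(backup, prepare)


broadcast_prepare(bytes(1 << 20))  # a 1 MiB prepare
```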
