Index, Count, Offset, Size

Wherein we make progress towards solving one of the most vexing problems of Computer Science — naming things.

I am at a point in my career where the bulk of my bugs are stupid — I simply fail to type in the code I have in my mind correctly. In languages with shadowing (like Rust), I will fail to use a shadowed variable from the outer scope. In languages without shadowing (like Zig), I will use the wrong version of a variable.

Pests like these are annoying, so I am always on the lookout for tricks to minimize the probability of bugs. One of the best possible tricks is of course strong static typing. Types are good at preventing me from doing stupid things by accident. But types have limitations. The text of a well-typed program is a two-in-one artifact — a specification of behavior of a machine (the algorithm), and a proof that the behavior is not unacceptable. Zero cost abstractions are code without behavior, just proofs!

The art of skillful typing lies in minimizing verbosity of the proof, while maximizing the amount of unwanted behaviors ruled out, weighted by the probability and the cost of misbehavior. But this ratio is not always favorable — the code can be so proof-heavy that it becomes impossible to understand what it actually does!

There’s one particular cranny where types don’t seem to usefully penetrate yet: indexing and associated off-by-one errors.

If you don’t need indexing arithmetic, you can use newtype pattern to prevent accessing oranges with apple-derived indexes. You can even go further and bind indexes to specific arrays, using, e.g., Gankra trick, but I haven’t seen that to be useful in practice.

If, however, you compute indexes, you need to be extra careful to stay in bounds of an array, and need to be mindful that the maximum valid index is one less than the length of the array. While we don’t solve this problem perfectly at TigerBeetle, I think we have a naming convention that helps:

Thanks @marler8997 for the illustration idea!

We consistently use count whenever we talk about the number of items, and index to point to a particular item. The positive invariant is index < count . Consistency is the trick — there are certain valid and invalid ways to combine indexes and counts in an expression, and, if there’s always an _index or a _count suffix in the name, wrong combinations immediately jump out at you, dear reader, even if you don’t understand the specifics of the code.

In low-level code you often need to switch between a well-typed representation []T and raw bytes []u8 . To not confuse the two index spaces, the “count of bytes” is always called a size . By definition,

... continue reading