Branimir Lambov from IBM on Cassandra


This article is part of a series interviewing developers (not founders, not executives) working on software infrastructure to understand their work, how they got here, the projects they’re proud of, the incidents they’ve learned from, and what they’re curious about.

Branimir (LinkedIn; GitHub) is a Cassandra committer and member of the Project Management Committee (PMC), currently working at IBM (via its acquisition of DataStax). With a PhD (University of Aarhus, 2005) on exact computation of real numbers, years spent working on digital signal processing and natural language processing, and the last 11 years as a Cassandra committer at DataStax and now IBM, Branimir has an interesting background. One big recent project of his, released in Cassandra 5 (2024), allows users to swap out the Skiplist in the Log-Structured Merge Tree (LSM Tree) for a Trie, improving memory usage and storage efficiency for anyone who opts in.

Starting from the basics... how did you end up working on Cassandra? Were you into databases before DataStax?

Not at all. In fact, I remember considering them boring in university... Then a recruiter contacted me at the right time and I spoke to people who made me think of concrete problems; this got me interested, landed me in good company and a good company to work for, and I've kept doing it for over a decade now.

Supporting the Trie in the LSM Tree was a big recent project. Can you tell us about some other big projects you've worked on in Cassandra?

There have been a few. The first one was the deterministic token allocation, where we wanted to find a way to reduce the number of virtual nodes by selecting them algorithmically rather than randomly. I spent a rather long period of time crafting a solution and had it committed. It took a couple of years until it was used in earnest, and turned out to not be that suitable for the way people actually used Cassandra. But it did include a smaller piece, written for tables that don't use replication, that worked great once it was applied for each individual rack, which ended up being both a simpler and better solution.

More recently, I worked with a team of people on modernizing Cassandra's compaction strategies, which resulted in Cassandra 5's Unified Compaction Strategy. We started by looking at the results from academia and compared them to our legacy compaction strategies to figure out how to build a more flexible solution that can cover them. After implementing this, we used it in DataStax's private branch to successfully handle densities an order of magnitude higher than what was typical before. We did not stop there, as the version that was ultimately merged in Cassandra 5 includes further improvements that reduce the need for manual configuration and make it even harder to topple.

As for the Trie project itself, how does a project like that happen end-to-end?

This has been a long-running project for me, and is still ongoing as we speak. It started almost a decade ago, when I was a relatively fresh contributor to the project, and rather mundanely: we knew there was something to be gained if we took advantage of byte order, i.e. if we used keys that can be compared lexicographically like strings. We did not know what. Some of the more senior contributors at the time asked me to try it, as I had experience working with strings before in the context of natural languages.
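(Editor's aside: to make "keys that can be compared lexicographically like strings" concrete, here is a minimal sketch of the general idea. It is not Cassandra's actual encoding. The function name `encode_int32` and the sample values are made up for illustration, and the sign-bit flip is one common way to make a signed integer byte-comparable.)

```python
import struct

def encode_int32(value: int) -> bytes:
    """Hypothetical helper: encode a signed 32-bit integer so that
    plain unsigned byte-by-byte comparison matches numeric order."""
    # Flipping the sign bit shifts the signed range onto the unsigned
    # range without changing relative order; big-endian puts the most
    # significant byte first, as lexicographic comparison requires.
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)

values = [5, -1, 0, -300, 70000]

# Sorting by the encoded bytes gives the same order as sorting the numbers.
sorted_by_bytes = sorted(values, key=encode_int32)

# For contrast: raw big-endian two's complement is NOT byte-comparable,
# because negative numbers start with high bytes and sort after positives.
naive = sorted(values, key=lambda v: struct.pack(">i", v))

print(sorted_by_bytes)  # [-300, -1, 0, 5, 70000]
print(naive)            # [0, 5, 70000, -300, -1]
```

Once every key type has an encoding like this, a single byte-wise comparison can order any key, and keys that share leading bytes can share storage, which is the property a trie builds on.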
