
Kovan: From Production MVCC Systems to Wait-Free Memory Reclamation

Why This Matters

This article highlights the challenges of scaling lock-free and wait-free memory management techniques in high-performance, production-grade systems. It underscores the importance of choosing appropriate concurrency control mechanisms to prevent issues like unbounded memory growth and stalled threads, which can impact both system stability and performance for consumers and developers alike.


2/18/2026 / 14 minutes to read / Tags: concurrency control, memory management, MVCC, wait-free

Six years ago I started building Lever, a transactional in-memory database toolkit. It needed to handle millions of operations per second with MVCC semantics, STM, and wait-free primitives, so I had to get the concurrency model right from day one.

Lever has been running in production, processing over 25 million operations in under 2 seconds. On top of it I built Callysto (a stream processing and service framework), which a few companies have been running in production. The systems worked. To be fair, the problems I'm about to describe didn't show up back then, because the scale was still low at the time.

But operate at massive scale for long enough and you stop running into bugs and start running into the assumptions baked into your tools.

The 3am Problem

Here’s what nobody tells you about lock-free data structures: they’re amazing until they’re not.

Most Rust developers reach for crossbeam-epoch for memory reclamation. Ok, that's also a lie: not many Rust developers use lock-free data structures in production at all. But if you do, you'll eventually run into the same problem. If this is your first post about lock-free data structures, you might be wondering what the big deal is; I won't answer that in general, but I will show you what the big deal is here. Coming back to crossbeam: it's genuinely good engineering. Fast, well-tested, and the obvious default. But it's lock-free, not wait-free. That distinction is easy to dismiss until you're looking at a heap that has grown to 32GB overnight and trying to explain to someone why a single stalled thread can block memory reclamation across the entire process.
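To make the failure mode concrete, here is a toy model of epoch-based reclamation, the scheme crossbeam-epoch implements. All names here (`Epochs`, `pin`, `try_advance`) are invented for illustration; this is the mechanism in miniature, not crossbeam's actual internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy epoch-based reclamation: readers "pin" themselves to the current
// global epoch, and garbage retired in epoch E may only be freed once
// every pinned reader has moved past E. One stalled reader therefore
// blocks reclamation for the whole process.
const NOT_PINNED: usize = usize::MAX;

struct Epochs {
    global: AtomicUsize,
    // One slot per thread: the epoch it is pinned in, or NOT_PINNED.
    locals: Vec<AtomicUsize>,
}

impl Epochs {
    fn new(threads: usize) -> Self {
        Epochs {
            global: AtomicUsize::new(0),
            locals: (0..threads).map(|_| AtomicUsize::new(NOT_PINNED)).collect(),
        }
    }

    // A reader enters a critical section: record the current global epoch.
    fn pin(&self, tid: usize) {
        let e = self.global.load(Ordering::SeqCst);
        self.locals[tid].store(e, Ordering::SeqCst);
    }

    fn unpin(&self, tid: usize) {
        self.locals[tid].store(NOT_PINNED, Ordering::SeqCst);
    }

    // Try to advance the global epoch. This is the lock-free (but not
    // wait-free) part: it fails if ANY thread is still pinned in an
    // older epoch, so garbage retired back then cannot yet be freed.
    fn try_advance(&self) -> bool {
        let e = self.global.load(Ordering::SeqCst);
        let all_caught_up = self.locals.iter().all(|l| {
            let le = l.load(Ordering::SeqCst);
            le == NOT_PINNED || le == e
        });
        all_caught_up
            && self
                .global
                .compare_exchange(e, e + 1, Ordering::SeqCst, Ordering::SeqCst)
                .is_ok()
    }
}

fn main() {
    let epochs = Epochs::new(2);

    // Thread 0 pins and never unpins -- the "stalled thread".
    epochs.pin(0);
    // Thread 1 does a normal pin/unpin cycle.
    epochs.pin(1);
    epochs.unpin(1);

    // The epoch advances once (thread 0 is pinned in the *current*
    // epoch, which is fine)...
    assert!(epochs.try_advance());
    // ...but now thread 0 is pinned in an old epoch, and no amount of
    // retrying will advance again: reclamation is stuck process-wide
    // and retired garbage piles up on the heap.
    assert!(!epochs.try_advance());
    assert!(!epochs.try_advance());

    epochs.unpin(0); // the stalled thread finally wakes
    assert!(epochs.try_advance()); // and reclamation proceeds again
}
```

Note that the stuck state costs nothing to the stalled thread itself; it is everyone else's retired garbage that accumulates, which is why the symptom is a quietly growing heap rather than a crash.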

If you know what happened in your production system, at this point you're probably blaming yourself and smearing lifetimes all over your code to cut down allocations. If you've done that, you're about to learn something: it is not your fault, it is the fault of the dependency you're using. It's not as if you can do much about it from the outside. And by the way, don't use lifetimes as a lifeboat.

Enter Shikari (wait, wasn't that a band name?)

Let’s dive in and hunt this down. Look closely at the diagram below: it shows how lock-free memory reclamation works in crossbeam.

... continue reading