Haydex: From Zero to 178.6B rows a second in 30 days

I/O architecture determines scale: One large read instead of thousands of small reads changed everything

Profiler-driven optimization: 90% of allocations and 70% of CPU were hiding in unexpected places

Distributed redesign unlocks speed: Map-reduce Lambda architecture delivered 6x indexing speedup

Compound optimizations multiply: Each optimization amplified others to reach 673 billion rows/second

Production beats theory: V0's elegant design failed; V1 succeeded by respecting network physics

Nearly every great engineering story starts not with a grand plan, but with a nagging, infuriating problem.

Ours was simple: our needle-in-the-haystack queries were too slow. For a database company, that's an existential threat. Our customers, especially giants like Hyperscale Customer, were pushing data at a scale that made our brute-force scanning approach look like trying to find a specific grain of sand on a planet-sized beach with a teaspoon. We had to do something drastic.

This is the story of that something. It's the story of a project that had been tried before and shelved, a project that rose from the dead.

In a single, caffeine-fueled month between June 9 and July 8, 2025, we took Haydex, our dream of a hyper-fast filtering system, and forged it into a production-hardened reality.

It was a journey into the abyss of distributed systems, a battle against memory bottlenecks, API limits, and our own assumptions.
