After the success of using agents to improve query performance through autoresearch, I wanted to try something more ambitious.
I rewrote PostHog's SQL parser using multiple long-running Claude Code sessions in parallel. The result was 16K lines of "hand"-rolled parser code, 5K lines of tooling, a few thousand more lines of tests, and a ~70x speedup.
The new parser is equivalent to the previous one for all realistic queries. It only differs on a tiny subset of queries written by an evil trickster deity (there’s a test for SELECT SELECT FROM FROM WHERE WHERE AND AND, which is completely valid SQL).
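Why can a query like that parse at all? Below is a minimal sketch of one way a hand-written recursive-descent parser can accept it. It assumes a toy grammar (a SELECT list, one table, an optional WHERE with AND/OR) rather than PostHog's real grammar, and the `Parser` class is hypothetical. The core trick: at an operand position any word is read as an identifier, even `SELECT` or `AND`. A word is only treated as a keyword or operator once a complete operand has been parsed.

```python
import re

# Toy grammar (hypothetical, not PostHog's):
#   SELECT expr {, expr} FROM ident [WHERE expr]
#   expr := ident {(AND | OR) ident}
BINARY_OPS = {"AND", "OR"}


def tokenize(sql: str) -> list[str]:
    # Words plus single non-space characters such as ","; enough for this sketch.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", sql)


class Parser:
    def __init__(self, sql: str):
        self.tokens = tokenize(sql)
        self.pos = 0

    def peek(self):
        # Upper-cased lookahead, or None at end of input.
        return self.tokens[self.pos].upper() if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, keyword: str) -> None:
        if self.peek() != keyword:
            raise SyntaxError(f"expected {keyword}, got {self.peek()}")
        self.advance()

    def identifier(self):
        # Operand position: any word is a name, even a reserved-looking one.
        if self.peek() is None:
            raise SyntaxError("unexpected end of input")
        return ("ident", self.advance())

    def expr(self):
        left = self.identifier()
        # After a complete operand, AND/OR are operators rather than names.
        while self.peek() in BINARY_OPS:
            op = self.advance().upper()
            left = (op, left, self.identifier())
        return left

    def select(self):
        self.expect("SELECT")
        columns = [self.expr()]
        while self.peek() == ",":
            self.advance()
            columns.append(self.expr())
        self.expect("FROM")
        table = self.identifier()
        where = None
        if self.peek() == "WHERE":
            self.advance()
            where = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()}")
        return ("select", columns, table, where)


if __name__ == "__main__":
    print(Parser("SELECT SELECT FROM FROM WHERE WHERE AND AND").select())
    # ('select', [('ident', 'SELECT')], ('ident', 'FROM'), ('AND', ('ident', 'WHERE'), ('ident', 'AND')))
```

Reading it back: select the column named `SELECT` from the table named `FROM`, keeping rows where column `WHERE` AND column `AND` are both true. A real dialect also has aliases, function calls, and many more clauses, so deciding what each word means is much harder than this sketch suggests.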
Here's how I did it and what I learned along the way.
Why does PostHog even have an SQL parser?
PostHog lets you access your data directly with SQL. We transpile your SQL to raw ClickHouse SQL because:
We want to present a logical view of your data which is independent of the physical layout in the database.
This lets us change things at the database layer without breaking existing queries.
We can also add a bunch of performance optimizations and access controls.
Most PostHog tools (e.g. product analytics, session replay, error tracking) run queries written in SQL, and those queries go through the exact same transpilation process. Before we can transpile anything, though, a parser has to turn the SQL into an AST (abstract syntax tree), which is then transpiled into ClickHouse SQL.
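To make the AST-to-ClickHouse step concrete, here is a minimal sketch of transpiling a logical AST into physical SQL. Every name in it is a hypothetical illustration, not PostHog's actual schema or code. That includes the node classes, the `events` to `events_sharded` table mapping, the `browser` field stored inside a JSON `properties` column, and the `team_id` guard. It shows the two ideas above: logical names are rewritten to physical expressions, so the storage layout can change without breaking queries, and an access-control filter is injected no matter what the user wrote. `JSONExtractString` is a real ClickHouse function.

```python
from dataclasses import dataclass
from typing import Optional, Union


# --- Minimal logical AST a parser might produce (hypothetical node set) ---
@dataclass
class Field:
    name: str


@dataclass
class Constant:
    value: Union[str, int]


@dataclass
class Compare:
    op: str  # e.g. "=", ">"
    left: "Expr"
    right: "Expr"


@dataclass
class And:
    exprs: list


Expr = Union[Field, Constant, Compare, And]


@dataclass
class Select:
    columns: list
    table: str
    where: Optional[Expr] = None


# --- Hypothetical mapping from the logical schema to the physical one ---
PHYSICAL_TABLES = {"events": "events_sharded"}
PHYSICAL_FIELDS = {
    "event": "event",
    # A logical column can physically live inside a JSON blob.
    "browser": "JSONExtractString(properties, 'browser')",
}


def print_expr(e: Expr) -> str:
    if isinstance(e, Field):
        if e.name not in PHYSICAL_FIELDS:
            raise KeyError(f"unknown field {e.name!r}")
        return PHYSICAL_FIELDS[e.name]
    if isinstance(e, Constant):
        if isinstance(e.value, int):
            return str(e.value)
        # Escape backslashes and quotes for a ClickHouse string literal.
        escaped = e.value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(e, Compare):
        return f"{print_expr(e.left)} {e.op} {print_expr(e.right)}"
    if isinstance(e, And):
        return " AND ".join(f"({print_expr(x)})" for x in e.exprs)
    raise TypeError(f"unsupported node {e!r}")


def transpile(query: Select, team_id: int) -> str:
    table = PHYSICAL_TABLES[query.table]
    # Access control: every query is scoped to the caller's team.
    guard = f"team_id = {int(team_id)}"
    where = guard if query.where is None else f"{guard} AND ({print_expr(query.where)})"
    cols = ", ".join(print_expr(c) for c in query.columns)
    return f"SELECT {cols} FROM {table} WHERE {where}"


if __name__ == "__main__":
    query = Select(
        columns=[Field("event"), Field("browser")],
        table="events",
        where=Compare("=", Field("browser"), Constant("Firefox")),
    )
    print(transpile(query, team_id=2))
    # SELECT event, JSONExtractString(properties, 'browser') FROM events_sharded WHERE team_id = 2 AND (JSONExtractString(properties, 'browser') = 'Firefox')
```

Because users only ever see `browser`, moving that field from JSON into a dedicated column would only change one entry in the mapping, not any saved query.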