Tech News

I rewrote PostHog's SQL parser, 70x faster, while barely looking at the code

Why This Matters

This project shows how AI coding tools such as Claude can speed up complex engineering work like rewriting a parser, and how much performance such a rewrite can recover. Faster query parsing means quicker data analysis and a better user experience on analytics platforms such as PostHog. It also suggests that AI-assisted development can be used to optimize core infrastructure components efficiently.

Key Takeaways

After the success of using agents to improve query performance through autoresearch, I wanted to try something more ambitious.

I rewrote PostHog's SQL parser using multiple long-running Claude Code sessions in parallel. The result was 16K lines of "hand"-rolled parser code, 5K lines of tooling, a few more K of tests, and a ~70x speedup.

The new parser is equivalent to the previous one for all realistic queries, only differing for a tiny subset of queries written by an evil trickster deity (there’s a test for SELECT SELECT FROM FROM WHERE WHERE AND AND which is completely valid SQL).
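One common way to check that two parsers agree is differential testing: run both over the same corpus of queries and compare what comes back. The sketch below illustrates that idea only. `old_parse` and `new_parse` are hypothetical stand-ins, not PostHog's parsers, and the tuple-based AST is an assumption made so that results can be compared with `==`.

```python
from typing import Callable, Iterable

# Hypothetical AST representation: nested tuples, so equality is structural.
Ast = tuple


def old_parse(sql: str) -> Ast:
    """Stand-in for a reference parser (hypothetical, not PostHog's)."""
    return ("query", tuple(sql.split()))


def new_parse(sql: str) -> Ast:
    """Stand-in for a rewritten parser (hypothetical, not PostHog's)."""
    return ("query", tuple(sql.strip().split()))


def outcome(parse: Callable[[str], Ast], sql: str) -> tuple:
    """Run a parser and capture either its AST or the type of error it raised."""
    try:
        return ("ok", parse(sql))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_mismatches(
    queries: Iterable[str],
    reference: Callable[[str], Ast],
    candidate: Callable[[str], Ast],
) -> list[str]:
    """Return every query on which the two parsers disagree."""
    return [q for q in queries if outcome(reference, q) != outcome(candidate, q)]
```

An empty result from `find_mismatches(corpus, old_parse, new_parse)` means the two parsers produced the same AST, or failed with the same error type, on every query in the corpus. It says nothing about queries outside that corpus.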

Here's how I did it and what I learned along the way.

Why does PostHog even have an SQL parser?

PostHog lets you access your data directly with SQL. We transpile your SQL to raw ClickHouse SQL because:

We want to present a logical view of your data which is independent of the physical layout in the database.

This lets us change things at the database layer without breaking existing queries.

We can also add a bunch of performance optimizations and access controls.

The majority of PostHog tools (e.g. product analytics, session replay, error tracking) have queries written in SQL, and they go through the exact same transpilation process. But before we can transpile anything, a parser has to turn the SQL into an AST (Abstract Syntax Tree), which is what then gets transpiled into ClickHouse SQL.
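To make the parse-then-transpile pipeline concrete, here is a minimal sketch for a toy subset of SQL (`SELECT col, col FROM table`). It is not PostHog's parser or AST. The node classes, the `PHYSICAL_TABLES` and `PHYSICAL_COLUMNS` mappings, and names such as `events_v2` and `pid` are all hypothetical, chosen only to show a logical name being rewritten to a physical one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class SelectQuery:
    columns: tuple[Field, ...]
    table: str


TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|,)")

# Hypothetical logical-to-physical mappings, for illustration only.
PHYSICAL_TABLES = {"events": "events_v2"}
PHYSICAL_COLUMNS = {"person_id": "pid"}


def tokenize(sql: str) -> list[str]:
    """Split the input into identifier/keyword tokens and commas."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(sql: str) -> SelectQuery:
    """Recursive-descent parse of `SELECT col [, col ...] FROM table` into an AST."""
    tokens = tokenize(sql)
    pos = 0

    def expect_keyword(word: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].upper() != word:
            raise SyntaxError(f"expected {word}")
        pos += 1

    def identifier() -> str:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] == ",":
            raise SyntaxError("expected identifier")
        pos += 1
        return tokens[pos - 1]

    expect_keyword("SELECT")
    columns = [Field(identifier())]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1  # consume the comma
        columns.append(Field(identifier()))
    expect_keyword("FROM")
    table = identifier()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return SelectQuery(tuple(columns), table)


def to_physical_sql(query: SelectQuery) -> str:
    """Print the AST back out as SQL, swapping logical names for physical ones."""
    cols = ", ".join(PHYSICAL_COLUMNS.get(c.name, c.name) for c in query.columns)
    table = PHYSICAL_TABLES.get(query.table, query.table)
    return f"SELECT {cols} FROM {table}"
```

With these assumed mappings, `to_physical_sql(parse("SELECT event, person_id FROM events"))` returns `SELECT event, pid FROM events_v2`. Because the rewrite happens on the AST rather than on the query text, the physical names can change without the user-facing query changing.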
