
The pitfalls of partitioning Postgres yourself


(not a hatchet job on Postgres partitioning - we do actually like it)

Alexander Belanger, Co-Founder · Hatchet

If you dive deep into Hatchet's codebase these days, you might get the impression that we don't know what we're doing. In particular, for a seasoned Postgres user, this method might stand out to you:

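In rough shape, it looks something like this (a minimal sketch: the method name, table names, and pgx wiring are illustrative, not our exact code):

package repo

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

type repository struct {
	pool *pgxpool.Pool
}

// analyzeHotTables refreshes planner statistics for a fixed set of
// high-churn tables instead of waiting for autovacuum to get to them.
func (r *repository) analyzeHotTables(ctx context.Context) error {
	// Illustrative table names; the real list is whichever tables churn fastest.
	tables := []string{"task", "task_event", "queue_item"}

	for _, t := range tables {
		// ANALYZE can't take a bind parameter for an identifier, so the
		// table name is concatenated from the hardcoded allowlist above.
		if _, err := r.pool.Exec(ctx, "ANALYZE "+t+";"); err != nil {
			return err
		}
	}
	return nil
}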

This method goes through a handful of our Postgres tables and runs ANALYZE on each, seemingly at random.

Yes, we do have autovacuum enabled on our databases (and we expect users who are self-hosting to have it enabled as well), which means that tables are autoanalyzed after they get autovacuumed. So why are we manually running ANALYZE on our tables all the time?

While there are a bunch of ways in which we don't know what we're doing, this particular bit of code comes from a hard-won set of production incidents, and some nuances of Postgres's autoanalyze implementation.
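If you want to see those mechanics on your own tables, pg_stat_user_tables records when each table was last vacuumed and analyzed, both manually and by the autovacuum daemon. A quick sketch for inspecting it, assuming the same kind of pgx pool as above (the function name is ours for illustration):

package repo

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// lastStatsTimes prints when Postgres last refreshed a table's statistics,
// both via a manual ANALYZE and via the autovacuum daemon.
func lastStatsTimes(ctx context.Context, pool *pgxpool.Pool, table string) error {
	// Pointers so NULL (never vacuumed/analyzed) scans cleanly as nil.
	var lastAnalyze, lastAutoanalyze, lastAutovacuum *time.Time
	err := pool.QueryRow(ctx, `
		SELECT last_analyze, last_autoanalyze, last_autovacuum
		FROM pg_stat_user_tables
		WHERE relname = $1`, table,
	).Scan(&lastAnalyze, &lastAutoanalyze, &lastAutovacuum)
	if err != nil {
		return err
	}
	fmt.Printf("%s: analyze=%v autoanalyze=%v autovacuum=%v\n",
		table, lastAnalyze, lastAutoanalyze, lastAutovacuum)
	return nil
}

If last_autoanalyze lags far behind your write rate, the planner is working from stale row counts.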

For some background, Hatchet is a durable queue built on top of Postgres, which offers useful features for running complex background jobs, like workflows and durable execution. As a result, we store up to hundreds of millions of tasks per day in a set of Postgres tables. In the first version of Hatchet, we assigned each task a UUID and stored runs in a single table. This worked well until we hit ~200 million rows and started to see our indexes and tables bloat.
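To make that first design concrete, the shape was roughly this (illustrative DDL, not our actual schema):

package repo

// Illustrative shape of the original single-table design: every run lands
// in one table, keyed by a random UUID.
const legacySchema = `
CREATE TABLE task_run (
    id         UUID PRIMARY KEY,  -- random UUIDs scatter writes across the index
    status     TEXT NOT NULL,
    payload    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX task_run_created_at_idx ON task_run (created_at);
`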

Even worse was deleting old tasks to keep storage costs down; we had to run a chunked deletion process almost 24/7, and because of various foreign keys and constraints, this wasn't always fast enough to keep up with the ever-growing size of the Postgres table.
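A chunked deletion job like that generally boils down to something like the following sketch (illustrative, not our actual retention code): delete a bounded batch of expired rows, then repeat until nothing matches.

package repo

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// deleteOldTasks removes expired rows in bounded batches so each DELETE
// holds locks briefly and vacuum has a chance to keep up.
func deleteOldTasks(ctx context.Context, pool *pgxpool.Pool, cutoff time.Time, batchSize int) error {
	for {
		tag, err := pool.Exec(ctx, `
			DELETE FROM task_run
			WHERE id IN (
				SELECT id FROM task_run
				WHERE created_at < $1
				LIMIT $2
			)`, cutoff, batchSize)
		if err != nil {
			return err
		}
		// Stop once a pass deletes nothing; new rows will expire later.
		if tag.RowsAffected() == 0 {
			return nil
		}
	}
}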

So we started looking into time-based partitioning of our tables. For those unaware, Postgres has built-in support for declarative partitioning, including range-based, list-based, and hash-based schemes.
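Range partitioning is the interesting one for time-series-shaped data like task runs: the parent table is split on a timestamp column, and expiring a day of data becomes a cheap drop of one child table rather than millions of row deletes. A sketch of the DDL (names illustrative):

package repo

// Illustrative range-partitioned version of the same table: the parent is
// split on created_at, and each day is its own child table that can be
// detached and dropped in one cheap operation.
const partitionedSchema = `
CREATE TABLE task_run (
    id         UUID NOT NULL,
    status     TEXT NOT NULL,
    payload    JSONB,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id, created_at)  -- the partition key must be part of the PK
) PARTITION BY RANGE (created_at);

CREATE TABLE task_run_2025_01_01 PARTITION OF task_run
    FOR VALUES FROM ('2025-01-01') TO ('2025-01-02');
`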
