
Is Postgres read heavy or write heavy?


When someone asks about Postgres tuning, I always say “it depends”. What “it” is can vary widely, but one major factor is the read and write traffic of a Postgres database. Today let’s dig into how to tell whether your Postgres database is read heavy or write heavy.

Of course, write heavy or read heavy can largely be inferred from your business logic. Social media app: read heavy. IoT logger: write heavy. But many of us have mixed-use applications. Knowing your read and write load can help you prioritize other tuning and architecture decisions across your Postgres fleet.

Understanding whether a Postgres database is read-heavy or write-heavy is paramount for effective database administration and performance tuning. For example, a read-heavy database might benefit more from extensive indexing, query caching, and read replicas, while a write-heavy database might require optimizations like faster storage, efficient WAL (Write-Ahead Log) management, table design choices (such as fill factor and autovacuum tuning), and careful attention to transaction isolation levels.
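To make the write-heavy side concrete, here is a minimal sketch of per-table storage tuning, assuming a hypothetical sensor_readings table; the specific values are illustrative starting points to measure against, not recommendations from this article:

-- Hypothetical table; the values below are starting points, not rules.
ALTER TABLE sensor_readings SET (
    fillfactor = 85,                        -- leave free space in each page for HOT updates
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum after 5% of rows change instead of the 20% default
    autovacuum_vacuum_cost_delay = 2        -- cost delay in ms; lower lets vacuum work faster
);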

By reviewing a detailed read/write estimation, you can gain valuable insights into the underlying workload characteristics, enabling informed decisions for optimizing resource allocation and improving overall database performance.

The challenge in profiling Postgres this way is that reads and writes are not really equal.

Postgres reads data in whole 8 kB units, called blocks on disk or pages once they’re loaded into shared memory. The cost of a read is much lower than that of a write. Since the most frequently used data generally resides in the shared buffers or the OS cache, many queries never need additional physical IO and can return results straight from memory.
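You can see how often reads are served from memory with the cumulative statistics views; note that blks_read counts blocks requested from the kernel, some of which the OS page cache serves without touching disk:

-- Fraction of block requests served from shared buffers, per database.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS buffer_hit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_hit + blks_read DESC;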

Postgres writes, by comparison, are a little more complicated. When changing an individual tuple, Postgres first writes a record to the WAL describing the change. If this is the first write to the page after a checkpoint, that record can include a copy of the full data page. The change can also involve writing additional WAL for any index changes, TOAST table changes, or TOAST table index changes. This is the direct write cost of a single database change, and it is paid before the commit is acknowledged. There is also the IO cost of writing out dirty page buffers, though this is generally handled in the background by the background writer. On top of these write IO costs, the data pages need to be in memory in order to be changed, so every write operation carries potential read overhead as well.
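Several of these write paths are exposed directly by the cumulative statistics views; as a sketch (pg_stat_wal requires PostgreSQL 14 or later):

-- WAL volume since the stats were last reset; wal_fpi counts the
-- full-page images written for the first change to a page after a checkpoint.
SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal;

-- Dirty shared buffers flushed by the background writer.
SELECT buffers_clean, maxwritten_clean FROM pg_stat_bgwriter;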

That being said, I’ve worked on a query using internal table statistics that loosely estimates read load and write load.

This query leverages Postgres’ internal metadata to provide an estimation of the number of disk pages (or blocks) that have been directly affected by changes to a given number of tuples (rows). This estimation is crucial for understanding the read/write profile of a database, which in turn can inform optimization strategies (see below).

The query's logic is broken down into several Common Table Expressions (CTEs) to enhance readability and modularity:
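The full query isn’t reproduced here, but as a rough sketch of the idea (my own simplification using pg_stat_user_tables, not the article’s actual CTE structure), a tuple-level read/write split can be estimated like this:

-- Simplified sketch, not the article's query: estimate the tuple-level
-- read/write split from cumulative per-table counters.
WITH per_table AS (
    SELECT relname,
           seq_tup_read + coalesce(idx_tup_fetch, 0) AS tuples_read,
           n_tup_ins + n_tup_upd + n_tup_del         AS tuples_written
    FROM pg_stat_user_tables
),
totals AS (
    SELECT sum(tuples_read) AS reads, sum(tuples_written) AS writes
    FROM per_table
)
SELECT reads,
       writes,
       round(reads::numeric / nullif(reads + writes, 0), 2) AS read_fraction
FROM totals;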
