RegreSQL: Regression Testing for PostgreSQL Queries

TL;DR - RegreSQL brings PostgreSQL's regression testing methodology to your application queries, catching both correctness bugs and performance regressions before production.

As puzzling as it might seem, the common problem with production changes is the ever-present "AHA" moment when things start slowing down or crashing straight away. Testing isn't easy as it is, but there's a widespread practice gap when it comes to testing SQL queries. Some might pretend to "fix it" by using ORMs to abstract away the problem. Others treat SQL as "just glue code" that doesn't deserve systematic testing. Most settle for integration tests that verify the application layer works, never actually testing whether their queries will survive the next schema change or index modification.

For PostgreSQL development itself, the project has a robust regression test suite that has been preventing disasters in core development for decades. The database itself knows how to test SQL systematically - we just don't use those same techniques for our own queries. Enter RegreSQL, a tool originally created by Dimitri Fontaine for The Art of PostgreSQL book (which is excellent for understanding and mastering PostgreSQL as a database system), designed to bring the same regression testing framework to our application queries.

I've been trying to use it for some time, but due to missing features and limitations gave up several times. Until now. I decided to fork the project and spend the time needed to take it to the next level.

The RegreSQL promise starts with the biggest strength and perceived weakness of SQL queries. They are just strings. And unless you use something like sqlc (for Go), PG'OCaml or Rust's SQLx toolkit giving you compile-time checking, your queries are validated only when they are executed. Which in better case mean either usually slow-ish test suite or integration tests, in worst scenario only when deployed. ORMs are another possibility - completely abstracting away SQL (but more on that later).

But even with compile-time checking, you are only checking for one class of problems: schema mismatches. What about behavior changes after schema migration or performance regressions? What about understanding whether your optimization actually made things faster or just moved the problem elsewhere?

This is where RegreSQL comes in. Rather than trying to turn SQL into something else, RegreSQL embraces "SQL as strings" reality and applies the same testing methodology PostgreSQL itself uses: regression testing. You write (or generate - continue reading) your SQL queries, provide input data, and RegreSQL verifies that future changes don't break those expectations.

The features don't stop there though - it tracks performance baselines, detects common query plan regressions (like sequential scans), and gives you framework for systematic experimentation with the schema changes and query change management.

Basic regression testing#

Enough with theory. Let's jump in straight into the action and see what a sample run of RegreSQL looks like

... continue reading