Jacob O'Bryant | 28 Jan 2025
I've been making some progress on rewriting Yakread (a fancy reading app) from ~scratch and open-sourcing it in the process. Along the way I'm experimenting with potential new features for Biff, my Clojure web framework, which Yakread is built with. In particular I'm working on approaches for keeping Biff apps more manageable as the codebase grows: the original Yakread codebase was about 10k lines and was already getting pretty crufty. I've also learned some things from contributing to our ~85k-line Clojure codebase at work.
I thought it'd be worth going over the main new architectural approaches in Yakread for anyone interested in poking around the code, and as a preview of what to expect in Biff later on. The open-source repo has only a sliver of the production app's functionality so far, but it has examples of all the approaches described below.
Materialized views
"Old Yakread" has a lot of slow queries. For example, loading the subscriptions page on my account takes more than 10 seconds: for each of my hundreds of subscriptions, it has to run a query to figure out how many unread posts there are and when the most recent post was published. This is currently done the dumb way, i.e. Yakread queries for every single post and then computes the aggregate data.
The traditional way to solve this would be to denormalize the data model (add fields for “# unread items” and “last published at” to the subscription model) and keep those fields up to date manually (updating them whenever a new post is published, whenever the user reads a post, etc.). However, this approach can get out of hand.
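To illustrate why, here's a sketch of the manual bookkeeping this implies (hypothetical attribute names, not code from Yakread):

```clojure
;; Every code path that touches posts must also remember to update the
;; aggregate fields on the subscription document. Attribute names are
;; hypothetical.
(defn on-new-post [sub post]
  (let [published (:post/published-at post)]
    (-> sub
        (update :sub/n-unread (fnil inc 0))
        (update :sub/last-published-at
                (fn [current]
                  (if (or (nil? current) (pos? (compare published current)))
                    published
                    current))))))

(defn on-post-read [sub]
  (update sub :sub/n-unread dec))

;; ...and likewise for unsubscribing, deleting posts, marking posts unread,
;; backfills, and so on. Miss one call site and the counts silently drift.
```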
I’ve addressed this in a cleaner way by implementing materialized views for XTDB. I store them in a dedicated RocksDB instance. For each piece of denormalized data you need, you define a pure "denormalizer" function* which takes in an item from XTDB’s transaction log along with the current RocksDB state and returns a map of key-value pairs that will be written back to RocksDB. Biff handles everything else: setting up RocksDB, running your denormalizer functions whenever there’s a new XTDB transaction, and providing a RocksDB snapshot for querying that’s consistent with your current XTDB snapshot (we retain XTDB's database-as-a-value semantics).
*Still deciding on the name... the codebase calls them "indexer" functions currently, but I decided "materialized views" are a clearer/more accurate term than "indexes."
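As a rough sketch of what one of these functions might look like, here's a hypothetical denormalizer for a "last published at" view. The shape of the argument map (:doc for a document from the new transaction, get-kv for reading current RocksDB state) is an assumption for illustration, not Biff's finalized API:

```clojure
;; Pure function: one item from the transaction log + current RocksDB state
;; in, key-value pairs to write back out. Returns nil when there's nothing
;; to update.
(defn sub-last-published-at [{:keys [doc get-kv]}]
  (when-some [sub-id (:post/subscription doc)]
    (let [current   (get-kv [:sub/last-published-at sub-id])
          published (:post/published-at doc)]
      (when (and published
                 (or (nil? current) (pos? (compare published current))))
        {[:sub/last-published-at sub-id] published}))))
```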
This is a lower-level approach than something like Materialize, which lets you write regular SQL queries instead of defining these denormalizer functions (i.e. with Materialize you define a function of (current DB state) -> materialized view, instead of (new transaction, current materialized view) -> new materialized view). However, when I experimented with Materialize several years ago, I found that its memory overhead made it untenable for my use case. I’m sure it’s much better for, say, aggregating metrics from large real-time systems, even if it sadly didn’t work out for the simplify-random-guy’s-RSS-reader use case. (I’d also like to look into other things in this space, like Rama and Feldera.)
Writing the incremental view maintenance logic by hand is somewhat tedious, but the testing approach I'm using makes it really not bad. I’ve written code with Fugato that can take the database schema for your app and generate test data for use with test.check (Clojure’s property-based testing library). All you have to do is write an “oracle” function that takes a database snapshot and computes what the materialized view should look like for that snapshot. For example, for the “subscription last published at” materialized view, the oracle function simply gets all the posts for each subscription and finds the one with the latest published-at date. Then the testing code ensures that the materialized view computed by your denormalizer function matches the oracle's output.
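Sketched out with test.check, the property looks something like this. The generators below stand in for the Fugato-derived ones, and view-for (left declared) stands in for running the denormalizer over the generated data; all of these names are placeholders, not real Biff or Fugato API:

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.properties :as prop]
         '[clojure.test.check.generators :as gen])

;; Stand-in generators; in Yakread these come from Fugato + the app's schema.
(def gen-post
  (gen/hash-map :post/subscription (gen/elements [:sub-a :sub-b])
                :post/published-at (gen/fmap #(java.util.Date. (long %)) gen/nat)))

(def gen-db-snapshot
  (gen/hash-map :posts (gen/vector gen-post)))

;; The oracle: recompute the view the slow, obvious way from a snapshot.
(defn oracle-last-published-at [posts]
  (into {}
        (for [[sub-id sub-posts] (group-by :post/subscription posts)
              :let [dates (keep :post/published-at sub-posts)]
              :when (seq dates)]
          [sub-id (apply max-key inst-ms dates)])))

;; Placeholder for the function under test: runs the denormalizer over the
;; snapshot and returns the resulting materialized view.
(declare view-for)

(def view-matches-oracle
  (prop/for-all [{:keys [posts] :as snapshot} gen-db-snapshot]
    (= (view-for snapshot)
       (oracle-last-published-at posts))))

;; (tc/quick-check 100 view-matches-oracle)
```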