Pushing and Pulling: Three reactivity algorithms

It’s looking like I’m going to need to build a reactive engine for work, so I’m going to prepare for that by writing down what I know about them. I want to look at three ways of building reactive engines: push reactivity, pull reactivity, and the hybrid push/pull combination that is used in a bunch of web frameworks.

The simplest way to visualise reactivity, in my opinion, is as a spreadsheet. You have a series of input cells, containing your initial data, and a series of output cells containing the final results. In between are many more cells containing intermediate computations that need to be done to figure out the final results.

When one of the input cells changes, all the cells that depend on it need to react to that change — the aforementioned reactivity. In practice, though, this is the bare minimum requirement. When building reactive systems, there are usually some additional requirements we impose that make the problem harder:

Each cell is recalculated at most once. We don’t do any calculations that are immediately discarded. (“Efficient”) We only update cells that actually need to be updated. Any cells that aren’t affected by the input are left untouched. (“Fine-grained”) Whenever cells change, all the cells change at the same time. There’s never a moment when an intermediate computed value has updated, but the output cell is still showing a result based on the previous input. (“Glitchless”) Each time a node is updated, it may dynamically add or remove dependencies (“Dynamic”)

The first two requirements (“Efficient” and “Fine-grained”) are important for performance. Imagine a large spreadsheet with millions of cells containing formulas — it would be a huge waste of resources to recalculate every single cell, every time any input changes. Similarly, you don’t want to calculate the value of a cell multiple times if you can help it. In general, we want to do the minimum amount of work possible.

The third requirement (“Glitchless”) is important for correctness. We don’t want intermediate states to be observable — if this were possible, then we can end up with invalid states. Consider two neighbouring cells, one that contains a country’s ISO country code (UK, DE, BE, etc), and another that contains the full name of that country. We don’t want to be able to observe the state where the two cells are out-of-sync with each other.

The fourth requirement (“Dynamic”) ensures that we only create dependencies when they are actually needed. This is easiest to see with conditional formulas, so something like IF(<condition>, slow_calculation(B1)) . If the condition is true, this formula returns the value of (and therefore depends on) the cell B1. But if the condition is false, the formula returns nothing — and if B1 changes, this cell should not be updated. This is a dynamic dependency — the dependency only exists if <condition> is true.

These requirements will hopefully become more clear as we start trying out different algorithms, and seeing examples of their successes and failure modes. Before we get too deep in the weeds, though, I want to emphasise that not all reactive systems are the same, and some don’t need all of these requirements. For example, lots of simple reactive systems work just fine with static dependencies only, trading off some efficiency wins for implementation simplicity. Similarly, glitches are only important if they are actually observed, so some reactive systems will be glitch-y by default, but provide tools for syncing nodes together if the user actually needs them to be in sync.

But for the sake of this article, let’s assume we need all these things and look at some approaches to implementing reactivity.

In push-based reactivity, when a node updates, it notifies (and updates) all of its dependents. We can visualise this as the update being pushed down the chain of dependencies, until it reaches the final node to be updated.

... continue reading