A Catalog of Side Effects

Optimizing compilers like to keep track of each IR instruction’s effects. An instruction’s effects vary wildly from having no effects at all, to writing a specific variable, to completely unknown (writing all state).

This post can be thought of as a continuation of What I talk about when I talk about IRs, specifically the section talking about asking the right questions. When we talk about effects, we should ask the right questions: not what opcode is this? but instead what effects does this opcode have?

Different compilers represent and track these effects differently. I’ve been thinking about how to represent these effects all year, so I have been doing some reading. In this post I will give some summaries of the landscape of approaches. Please feel free to suggest more.

Some background

Internal IR effect tracking is similar to the programming language notion of algebraic effects in type systems, but internally, compilers keep track of finer-grained effects. Effects such as “writes to a local variable”, “writes to a list”, or “reads from the stack” indicate what instructions can be re-ordered, duplicated, or removed entirely.

For example, consider the following pseodocode for some made-up language that stands in for a snippet of compiler IR:

# ... v = some_var [ 0 ] another_var [ 0 ] = 5 # ...

The goal of effects is to communicate to the compiler if, for example, these two IR instructions can be re-ordered. The second instruction might write to a location that the first one reads. But it also might not! This is about knowing if some_var and another_var alias—if they are different names that refer to the same object.

We can sometimes answer that question directly, but often it’s cheaper to compute an approximate answer: could they even alias? It’s possible that some_var and another_var have different types, meaning that (as long as you have strict aliasing) the Load and Store operations that implement these reads and writes by definition touch different locations. And if they look at disjoint locations, there need not be any explicit order enforced.

Different compilers keep track of this information differently. The null effect analysis gives up and says “every instruction is maximally effectful” and therefore “we can’t re-order or delete any instructions”. That’s probably fine for a first stab at a compiler, where you will get a big speed up purely based on strength reductions. Over-approximations of effects should always be valid.

... continue reading