CF Bolz-Tereick wrote some excellent posts in which they introduce a small IR and optimizer and extend it with allocation removal. We also did a live stream together in which we did some more heap optimizations.
In this blog post, I’m going to write a small abtract interpreter for the Toy IR and then show how we can use it to do some simple optimizations. It assumes that you are familiar with the little IR, which I have reproduced unchanged in a GitHub Gist.
Abstract interpretation is a general framework for efficiently computing properties that must be true for all possible executions of a program. It’s a widely used approach both in compiler optimizations as well as offline static analysis for finding bugs. I’m writing this post to pave the way for CF’s next post on proving abstract interpreters correct for range analysis and known bits analysis inside PyPy.
Before we begin, I want to note a couple of things:
The Toy IR is in SSA form, which means that every variable is defined exactly once. This means that abstract properties of each variable are easy to track.
The Toy IR represents a linear trace without control flow, meaning we won’t talk about meet/join or fixpoints. They only make sense if the IR has a notion of conditional branches or back edges (loops).
Alright, let’s get started.
Welcome to abstract interpretation
Abstract interpretation means a couple different things to different people. There’s rigorous mathematical formalism thanks to Patrick and Radhia Cousot, our favorite power couple, and there’s also sketchy hand-wavy stuff like what will follow in this post. In the end, all people are trying to do is reason about program behavior without running it.
In particular, abstract interpretation is an over-approximation of the behavior of a program. Correctly implemented abstract interpreters never lie, but they might be a little bit pessimistic. This is because instead of using real values and running the program—which would produce a concrete result and some real-world behavior—we “run” the program with a parallel universe of abstract values. This abstract run gives us information about all possible runs of the program.
... continue reading