I have a lot of thoughts about the design of compiler intermediate representations (IRs). In this post I’m going to try and communicate some of those ideas and why I think they are important.
The overarching idea is being able to make decisions with only local information.
That comes in a couple of different flavors. We’ll assume that we’re compiling a method at a time, instead of a something more trace-like (tracing, tracelets, basic block versioning, etc).
Control-flow graphs
A function will normally have some control-flow: if , while , for , any amount of jumping around within a function. Let’s look at an example function in a language with advanced control-flow constructs:
int sumto ( int n ) { int result = 0 ; for ( result = 0 ; n >= 0 ; n -= 1 ) { result += n ; } return result ; }
Most compilers will deconstruct this for , with its many nested expressions, into simple comparisons and jumps. In order to resolve jump targets in your compiler, you may have some notion of labels (in this case, words ending with a colon):
int sumto(int n) { entry: int result = 0 header: if n < 0, jumpto end result = result + n n = n - 1 jumpto header end: return result }
This looks kind of like a pseudo-assembly language. It has its high-level language features decomposed into many smaller instructions. It also has implicit fallthrough between labeled sections (for example, entry into header ).
I mention these things because they, like the rest of the ideas in this post, are points in an IR design space. Representing code this way is an explicit choice, not a given. For example, one could make the jumps explicit by adding a jumpto header at the end of entry .
... continue reading