Type-based alias analysis in the Toy Optimizer

Another entry in the Toy Optimizer series.

Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time!

We were careful to mind object aliasing: we separated our heap information into alias classes based on what offset the reads/writes referenced. This way, if we didn’t know if object a and b aliased, we could at least know that different offsets would never alias (assuming our objects don’t overlap and memory accesses are on word-sized slots). This is a coarse-grained heuristic.

Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We’ll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post.

Representing types

We return once again to Fil Pizlo land, specifically How I implement SSA form. We’re going to be using the hierarchical heap effect representation from the post in our implementation, but you can use your own type representation if you have one already.

This representation divides the heap into disjoint regions by type. Consider, for example, that Array objects and String objects do not overlap. A LinkedList pointer is never going to alias an Integer pointer. They can therefore be reasoned about separately.

But sometimes you don’t have perfect type information available. If you have in your language an Object base class of all objects, then the Object heap overlaps with, say, the Array heap. So you need some way to represent that too—just having an enum doesn’t work cleanly.

Here is an example simplified type hierarchy:

Any Object Array String Other

... continue reading