jank now has its own custom IR
Good news, everyone! jank has a new custom intermediate representation (IR) and we're using it to optimize jank to compete with the JVM. We'll dive into more of that today, but first I want to say thank you to my GitHub sponsors and to Clojurists Together for sponsoring me this whole year. You all are helping a great deal. I am still searching for a way to continue working on jank full-time with an income which will cover rent and groceries, so if you've not yet chipped in a sponsorship, now's a great time!
What is an intermediate representation (IR)?
Compilers often represent programs as a more abstract set of instructions than a target CPU's instruction set provides. This has a few benefits. Firstly, the program can be represented in a way which could later be lowered to different CPU architectures, such as x86_64 or arm64. Since intermediate representations are often higher level than CPU architectures, they can generally be more portable. Secondly, IRs can be specifically designed to represent the program in a way which makes writing certain optimizations easier, such as static single assignment (SSA) form. Finally, IR designers get to choose the level of abstraction of the IR to match the semantics they're aiming to represent, which can either make an IR more general or more specific to a certain language.
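To make SSA form a bit more concrete, here is a minimal sketch in Python. It is not jank's IR; the `Assign` type, the `to_ssa` function, and the `name.N` versioning scheme are all made up for illustration. It only handles straight-line code, so it never needs the phi nodes that real SSA construction inserts where control flow merges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """One hypothetical IR instruction: target = op(args...)."""
    target: str   # variable being assigned
    op: str       # operator name, e.g. "add"
    args: tuple   # operand variable names or integer literals


def to_ssa(program):
    """Rename straight-line code so every variable is assigned exactly once."""
    version = {}  # variable name -> latest version number
    out = []
    for inst in program:
        # Operands refer to the most recent version of each variable.
        # Literals and never-assigned names are left untouched.
        args = tuple(
            f"{a}.{version[a]}" if isinstance(a, str) and a in version else a
            for a in inst.args
        )
        # Each assignment creates a fresh version of its target.
        version[inst.target] = version.get(inst.target, 0) + 1
        out.append(Assign(f"{inst.target}.{version[inst.target]}", inst.op, args))
    return out


# x is assigned twice here; after renaming, each name has one definition.
program = [
    Assign("x", "const", (1,)),
    Assign("x", "add", ("x", 2)),
    Assign("y", "mul", ("x", "x")),
]
ssa = to_ssa(program)
for inst in ssa:
    print(inst.target, "=", inst.op, *inst.args)
```

Once every name has exactly one definition, an optimizer can ask "where does this value come from?" and get a single answer, which is what makes passes like constant propagation simpler to write.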
There are many popular IRs, such as the JVM's bytecode, the CLR's Common Intermediate Language (CIL), GCC's GIMPLE, LLVM's IR, and so on. Some compilers may move the program through multiple IRs during compilation.
Custom IR rationale
Historically, jank has not been an optimizing compiler. We've delegated basically all of that work to LLVM, based on the C++ or LLVM IR which we would generate. However, LLVM IR works at a very low level, compared to Clojure. It has no concept of Clojure's vars, transients, persistent data structures, lazy sequences, and so on. Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.
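As a rough illustration of why that indirection gets in the way, here is a toy model in Python. It is not jank's runtime; the `Var` class and the `user/combine` name are hypothetical stand-ins for a Clojure var and a user-defined function.

```python
class Var:
    """A toy stand-in for a Clojure var: a named, rebindable box."""

    def __init__(self, name, root):
        self.name = name
        self.root = root

    def deref(self):
        # Every use goes through this lookup, since the root may change.
        return self.root


# Hypothetical var holding a function, as if defined with (defn combine ...).
combine = Var("user/combine", lambda a, b: a + b)


def call_combine():
    # Equivalent in spirit to (combine 1 2): load the var's current
    # value, then make an indirect call through it.
    return combine.deref()(1, 2)


before = call_combine()
combine.root = lambda a, b: a * b  # the var is redefined at runtime
after = call_combine()
print(before, after)
```

A low-level optimizer looking at `call_combine` sees only a memory load followed by an indirect call. It has no way to know that the load is a var deref with specific rebinding rules, so it cannot safely fold the call into a constant or inline the callee.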
The optimization work done previously on jank helped optimize its runtime, and the compiler itself, but less so the code being compiled by the compiler. In the past two months, I have sought to change this.
I wanted an IR which operated at the level of Clojure's semantics. This would be much higher level than LLVM IR and even much higher level than the JVM's bytecode. Since I'm not building a general virtual machine (VM) or compiler platform, I don't need to generalize the IR for different languages. I can make jank's IR specifically tailored to jank, which gives us even more power for optimizations. As far as I know, no Clojure dialects have taken this step.
Custom IR details
... continue reading