In this post, we’ll dig deep into how TileIR works, from how it generates instructions to what each of its passes does. We’ll trace how a Mixture-of-Experts (MoE) kernel written in CuTile is compiled down through cuda_tile → nv_tileaa → nv_tileas → NVVM → LLVM → SASS.
Here’s what to expect:
What is CuTile? - The tile-centric programming model
Running Example - An MoE kernel we’ll trace through every stage
The Dialects - From cuda_tile through nv_tileaa and nv_tileas to NVVM/LLVM
The Passes - TileIR passes: what they do and when they run
This post is based on CUDA 13.1. Some of the details described here are undocumented and may change in future releases.
What is CuTile?
CuTile is NVIDIA’s new “tile-centric” programming model for modern NVIDIA GPUs. The abstraction is powerful: the programmer thinks in terms of tiles rather than threads, while the compiler handles the complexity of coordinating hundreds of threads, each of which holds only a fragment of the data. A single CuTile line such as ct.mma(a, b, acc) can be lowered to many tensor core instructions.
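To make the tile-centric idea concrete, here is a minimal pure-Python sketch of a tiled matrix multiply. It does not use CuTile itself: `load_tile`, `mma_tile`, and `tiled_matmul` are hypothetical helpers, and we assume that `ct.mma(a, b, acc)` means "return a @ b + acc" on whole tiles. Each output tile is built from a loop of tile-sized multiply-accumulates, and at no point does the code reason about individual threads.

```python
# Pure-Python illustration of tile-centric thinking (not the CuTile API).
# Assumption: ct.mma(a, b, acc) computes a @ b + acc on whole tiles;
# mma_tile below is a hypothetical stand-in for that operation.
from typing import List

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def load_tile(m: Matrix, row: int, col: int, th: int, tw: int) -> Matrix:
    """Copy the (up to) th x tw tile whose top-left corner is (row, col)."""
    return [r[col:col + tw] for r in m[row:row + th]]


def mma_tile(a: Matrix, b: Matrix, acc: Matrix) -> Matrix:
    """Tile-level multiply-accumulate: returns a @ b + acc."""
    k = len(b)
    return [
        [acc[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def tiled_matmul(a: Matrix, b: Matrix, tm: int, tn: int, tk: int) -> Matrix:
    """C = A @ B, computed one (tm x tn) output tile at a time.

    Each (bi, bj) iteration plays the role of one tile-level program
    instance; edge tiles are simply smaller when sizes don't divide evenly.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = zeros(m, n)
    for bi in range(0, m, tm):
        for bj in range(0, n, tn):
            acc = zeros(min(tm, m - bi), min(tn, n - bj))
            # Walk the reduction dimension in tk-sized steps.
            for bk in range(0, k, tk):
                a_t = load_tile(a, bi, bk, tm, tk)
                b_t = load_tile(b, bk, bj, tk, tn)
                acc = mma_tile(a_t, b_t, acc)
            # Write the finished tile back into C.
            for i, row in enumerate(acc):
                c[bi + i][bj:bj + len(row)] = row
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(tiled_matmul(a, b, tm=1, tn=1, tk=2))  # [[58.0, 64.0], [139.0, 154.0]]
```

On a GPU, the interesting work hides inside each `mma_tile` call: the compiler, not the programmer, decides how that one tile-level operation is split across threads, register fragments, and tensor core instructions.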
What is TileIR?