TileIR Internals

In this post, we’ll dig into how TileIR works, from how it generates instructions to what each of its passes does. We’ll trace a Mixture-of-Experts (MoE) kernel written in CuTile as it is compiled down through cuda_tile → nv_tileaa → nv_tileas → NVVM → LLVM → SASS.

Here’s what to expect:

What is CuTile? The tile-centric programming model

Running Example: An MoE kernel we’ll trace through every stage

The Dialects: From cuda_tile through nv_tileaa and nv_tileas to NVVM/LLVM

The Passes: What the TileIR passes do and when they run

Based on CUDA 13.1. Some details are undocumented and may change in future releases.

What is CuTile?

CuTile is NVIDIA’s new “tile-centric” programming model for modern NVIDIA GPUs. The abstraction is powerful because the programmer thinks in terms of tiles rather than threads, while the compiler handles the complexity of coordinating hundreds of threads across fragmented data. For example, a single CuTile line such as ct.mma(a, b, acc) can be lowered to many Tensor Core instructions.
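To get a feel for why one tile-level operation fans out into many hardware instructions, here is a minimal sketch in plain Python. It does not use the CuTile API. It only counts how many fixed-shape MMA instructions would be needed to cover a single tile multiply-accumulate. The function name, the tile sizes, and the m16n8k16 instruction shape are assumptions chosen for illustration; the shapes TileIR actually picks depend on the data type and the target architecture.

```python
# Illustrative only: plain Python, not the CuTile API.
# Counts how many fixed-shape MMA instructions cover one tile-level MMA.

def mma_instruction_count(tile_m, tile_n, tile_k, inst_m=16, inst_n=8, inst_k=16):
    """Return the number of inst_m x inst_n x inst_k MMA ops in one tile MMA.

    The default instruction shape (m16n8k16) is an assumption made for this
    sketch, not something taken from TileIR's output.
    """
    for tile_dim, inst_dim in ((tile_m, inst_m), (tile_n, inst_n), (tile_k, inst_k)):
        if tile_dim % inst_dim != 0:
            raise ValueError("tile dimensions must be multiples of the instruction shape")
    # Each instruction covers one inst_m x inst_n output block for one
    # inst_k-wide slice of the reduction dimension.
    return (tile_m // inst_m) * (tile_n // inst_n) * (tile_k // inst_k)


# Hypothetical tile sizes: a 128x128 accumulator tile with a K extent of 64.
count = mma_instruction_count(128, 128, 64)
print(count)  # 8 * 16 * 4 = 512
```

Under these assumed sizes, one tile-level MMA corresponds to 512 instruction-sized operations spread across the threads of the block, which is the bookkeeping the tile model hands to the compiler.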

What is TileIR?
