

“The G in GPU is for Graphics damnit!”: Adventures in Triton Kernels, Profiling, Parallelism and More

Background

As is true for everything, a lot of things need to happen for anything to happen, and that holds for this blog post too. Of all the things that needed to happen, three stand out:

The first professor I worked with was a graphics researcher who, in our first meeting, opened with a short rant about how his GPUs were being hogged for ML workloads, how students no longer approached him for graphics research, and how “The G in GPU is for Graphics”. I then had to tell him that I, too, had approached him for an ML project. The happy compromise between our interests was to work on NeRFs. A fun piece of trivia: the vision and graphics lab I worked in for this project is funded by the only BITSian to receive an Oscar.

In my first year, I developed a short-term obsession and a long-term admiration for “code art” and found some beautiful work, such as: fronkonstin, Sage Jenson, MoMA’s Code and Art exhibition, Casey Reas, Zachary Lieberman, etc. I wish I had archived these better, because I am sure I am forgetting some very good ones. Also check out Processing and TouchDesigner if this interests you. I tried my hand at it for a bit, but eventually got busy with the more important efforts of loafing around in college. These attempts can be found on my GitHub.

In 2017, Philippe Tillet, a PhD student at Harvard, started working on a DSL for CUDA programming called Triton, which was publicly released in 2019. In 2020, OpenAI hired Philippe and began developing and maintaining Triton, announcing a more usable prototype in 2021. In August of this year, I joined Microsoft Research India for a semester for my Bachelor’s thesis, where I have been working on (and mounting the accompanying learning curve of) systems and GPU optimisations. A lot of what makes up this blog is what I have learnt in the previous ~2 months.

Note. This blog is not a tutorial because better content exists elsewhere, some of which I have linked where appropriate. For reproducibility: all experiments are run on NVIDIA RTX A6000 GPUs (courtesy MSR, thank you Mr. Nadella), using Triton 3.4.0 and Torch 2.8.0.

The Model

Physarum, as I understand it, is a type of mold/fungus (which I go long back with) whose growth can be modelled fairly accurately with two superimposed grids: the agent and pheromone fields, wherein the agents move in the direction of the highest concentration of pheromones and leave a pheromone trail as they move. This is better explained in this blog by Antonio, and this one by Sage Jenson. A more detailed explanation is in this paper by Jeff Jones.
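To make the two-grid idea concrete, here is a minimal NumPy sketch of one sense–turn–move–deposit–decay tick. All parameter values (grid size, sensor angle, decay rate, and so on) are illustrative assumptions of mine, not numbers from the references above:

```python
import numpy as np

# Illustrative parameters, not taken from Jones's paper.
SIZE = 64                  # pheromone grid is SIZE x SIZE (toroidal)
N_AGENTS = 256
SENSE_ANGLE = np.pi / 4    # left/right sensor offset from heading
SENSE_DIST = 3.0           # how far ahead an agent samples the field
TURN_ANGLE = np.pi / 8
STEP = 1.0                 # distance moved per tick
DEPOSIT = 1.0              # pheromone left behind each tick
DECAY = 0.95               # multiplicative fade of old trails

rng = np.random.default_rng(0)
pos = rng.uniform(0, SIZE, size=(N_AGENTS, 2))      # agent field
heading = rng.uniform(0, 2 * np.pi, size=N_AGENTS)
field = np.zeros((SIZE, SIZE))                      # pheromone field

def sample(field, pos, heading, offset):
    """Pheromone concentration at a sensor offset from each agent's heading."""
    a = heading + offset
    p = pos + SENSE_DIST * np.stack([np.cos(a), np.sin(a)], axis=1)
    ij = np.floor(p).astype(int) % SIZE             # wrap around the torus
    return field[ij[:, 0], ij[:, 1]]

def tick(field, pos, heading):
    # 1. Sense the field at left / forward / right sensors.
    left = sample(field, pos, heading, +SENSE_ANGLE)
    fwd = sample(field, pos, heading, 0.0)
    right = sample(field, pos, heading, -SENSE_ANGLE)
    # 2. Turn toward the highest concentration (conditions are exclusive).
    heading = np.where((left > fwd) & (left > right), heading + TURN_ANGLE, heading)
    heading = np.where((right > fwd) & (right > left), heading - TURN_ANGLE, heading)
    # 3. Move one step along the (possibly updated) heading.
    pos = (pos + STEP * np.stack([np.cos(heading), np.sin(heading)], axis=1)) % SIZE
    # 4. Deposit a pheromone trail at each agent's new cell.
    ij = np.floor(pos).astype(int)
    np.add.at(field, (ij[:, 0], ij[:, 1]), DEPOSIT)
    # 5. Decay the field so stale trails fade out.
    field *= DECAY
    return field, pos, heading

for _ in range(100):
    field, pos, heading = tick(field, pos, heading)
```

Real implementations usually also diffuse (blur) the pheromone field each tick, which is what makes the trails merge into the characteristic vein-like networks; that is omitted here for brevity.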

For brevity, I will borrow Antonio and Sage’s explanations of the design:
