Tech News
← Back to articles

This Year in LLVM (2025)

read original related products more articles

It’s 2026, so it’s time for my yearly summary blog post. I’m a bit late, but at least it’s still January! As usual, this summary is about my own work, and only covers the more significant / higher-level items.

Previous years: 2024, 2023, 2022

ptradd

I have been making slow progress on the ptradd migration over the last three years. The goal of this change is to move away from the type-based getelementptr (GEP) representation, towards a ptradd instruction, which just adds an integer offset to a pointer.

The state at the start of the year was that constant-offset GEP instructions were canonicalized to the form getelementptr i8, ptr %p, i64 OFFSET , which is equivalent to a ptradd .

The progress this year was to canonicalize all GEP instructions to have a single offset. For example, getelementptr [10 x i32], ptr %p, i64 %a, i64 %b gets split into two instructions now. This moves us closer to ptradd , which only accepts a single offset argument. However, the change is also independently useful, because it allows CSE of common GEP prefixes.

This work happened in multiple phases, first splitting multiple variable indices, then splitting off constant indices as well and finally removing leading zero indices.

As usual, the bulk of the work was not in the changes themselves, but in mitigating resulting regressions. Many transforms were extended to work on chains of GEPs rather than only a single one. Once again, this is also useful independently of the ptradd migration, as chained GEPs were already very common beforehand.

There are still some major remaining pieces of work to complete this migration. The first one is to decide whether we want ptradd to support a constant scaling factor, or require it to be represented using a separate multiplication. There are good arguments in favor of both options.

The second one is to move from mere canonicalization towards requiring the new form. This would probably involve first making IRBuilder emit it, and then actually preventing construction of the type-based form. That would be the point where we’d actually introduce the ptradd instruction.

... continue reading