Partial inlining
Written by me, proof-read by an LLM.
Details at end.
We’ve learned how important inlining is to optimisation, but also that it might sometimes cause code bloat. Inlining doesn’t have to be all-or-nothing!
Let’s look at a simple function that has a fast path and a slow path, and then see how the compiler handles it.
In this example we have a process function with a really trivial fast case for numbers below 100. For other numbers it does something more expensive. Then compute calls process twice (making it less appealing to inline all of process).
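A minimal sketch of code with this shape might look like the following. Only the `value < 100` check and the doubling come from the assembly output; the slow-path loop and the way compute combines the two results are assumptions made up for illustration.

```cpp
// Fast path for small values, something more expensive otherwise.
unsigned process(unsigned value) {
    if (value < 100) {
        return value * 2;  // trivial fast case
    }
    // Assumed slow path: any body big enough that inlining it twice is unappealing.
    unsigned result = value;
    for (unsigned i = 0; i < 32; ++i) {
        result = result * 31u + (result >> 3) + i;
    }
    return result;
}

// Calls process twice; adding the two results is an assumption.
unsigned compute(unsigned a, unsigned b) {
    return process(a) + process(b);
}
```

Whether a particular compiler splits this exact function depends on its heuristics and optimisation level, so treat it as a starting point to experiment with rather than a guaranteed reproduction.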
Looking at the assembly output, we see what’s happened: the compiler has split process into two functions. One is process (.part.0), which does only the expensive part. The other is process itself, rewritten as a quick check against 100 that returns double the value if it is less than 100. If not, it jumps to the (.part.0) function:
process(unsigned int):
  cmp edi, 99                          ; less than or equal to 99?
  jbe .L7                              ; skip to fast path if so
  jmp process(unsigned int) (.part.0)  ; else jump to the expensive path
.L7:
  lea eax, [rdi+rdi]                   ; return `value * 2`
  ret
This first step - extracting the cold path into a separate function - is called function outlining. The original process becomes a thin wrapper handling the hot path, delegating to the outlined process (.part.0) when needed. This split sets up the real trick: partial inlining. When the compiler later inlines process into compute, it inlines just the wrapper whilst keeping calls to the outlined cold path. External callers can still call process and have it work correctly for all values.
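To make the two steps concrete, here is roughly what the transformation would look like if we did it by hand in source. This is a sketch, not compiler output: process_cold is a hypothetical name standing in for the compiler-generated process (.part.0), and its body and the addition in compute are assumed for illustration.

```cpp
// Outlined cold path: plays the role of `process (.part.0)`.
// Hypothetical name and assumed body.
unsigned process_cold(unsigned value) {
    unsigned result = value;
    for (unsigned i = 0; i < 32; ++i) {
        result = result * 31u + (result >> 3) + i;
    }
    return result;
}

// Thin wrapper: the hot-path check, then a call into the cold part.
// External callers still get correct results for every value.
unsigned process(unsigned value) {
    if (value < 100) {
        return value * 2;
    }
    return process_cold(value);
}

// Partial inlining by hand: only the wrapper's check and fast path are
// copied into compute; the expensive body stays behind a call.
unsigned compute(unsigned a, unsigned b) {
    unsigned ra = (a < 100) ? a * 2 : process_cold(a);
    unsigned rb = (b < 100) ? b * 2 : process_cold(b);
    return ra + rb;
}
```

The point of the split is that compute pays for two cheap comparisons inline, while the expensive code exists only once in the binary.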
Let’s see this optimisation in action in the compute function: