
Why is calling my asm function from Rust slower than calling it from C?

This is a follow-up to making the rav1d video decoder 1% faster, where we compared profiler snapshots of rav1d (the Rust implementation) and dav1d (the C baseline) to find specific functions that were slower in the Rust implementation.

Today, we are going to pay off a small debt from that post: since dav1d and rav1d share the same hand-written assembly functions, we used them as anchors to navigate the different implementations - they, at least, should match exactly! And they did. Well, almost all of them did.

This, dear reader, is the story of the one function that didn’t.

An Overview

We’ll need to ask - and answer! - three ‘Whys’ today:

Using the same techniques from last time, we’ll see that a specific assembly function is, indeed, slower in the Rust version.

1. But why? ➡️ Because loading data in the Rust version is slower, which we discover using samply’s special asm view.
2. But why? ➡️ Because the Rust version stores much more data on the stack, which we find by playing with some arguments and looking at the generated LLVM IR.
3. But why? ➡️ Because the compiler cannot optimize away a specific Rust abstraction across function pointers!

Which we fix by switching to a more compiler-friendly version (PR).
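To make the third "why" a bit more concrete before diving in, here is a minimal sketch of the general effect. It is not the actual rav1d code: `CheckedBuf`, `sum_wrapped`, `sum_raw` and the two `call_*` helpers are hypothetical names invented for this illustration. The idea is that when a call goes through a function pointer, the compiler generally cannot see the callee, so it has to build every argument exactly as the signature and ABI demand, even the parts the callee never reads.

```rust
// Hypothetical illustration only: none of these names come from rav1d or dav1d.

/// A wrapper that carries extra bookkeeping next to the raw pointer.
#[repr(C)]
pub struct CheckedBuf {
    pub ptr: *const u8,
    pub len: usize,
    pub stride: usize,   // never read by the callee below
    pub generation: u64, // never read by the callee below
}

/// Signature taking the whole wrapper by value. A 32-byte struct like this
/// is typically passed in memory rather than in registers.
pub type SumWrappedFn = unsafe extern "C" fn(buf: CheckedBuf) -> u32;

/// Lean signature: just the two values the callee actually needs.
pub type SumRawFn = unsafe extern "C" fn(ptr: *const u8, len: usize) -> u32;

pub unsafe extern "C" fn sum_raw(ptr: *const u8, len: usize) -> u32 {
    // The caller promises that `ptr` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

pub unsafe extern "C" fn sum_wrapped(buf: CheckedBuf) -> u32 {
    // Only `ptr` and `len` are used; the other fields are dead weight.
    unsafe { sum_raw(buf.ptr, buf.len) }
}

/// Calls through a function pointer with the wrapper. Because `f` is opaque
/// at this call site, all four fields have to be materialized.
pub fn call_wrapped(f: SumWrappedFn, data: &[u8]) -> u32 {
    let buf = CheckedBuf {
        ptr: data.as_ptr(),
        len: data.len(),
        stride: data.len(),
        generation: 0,
    };
    unsafe { f(buf) }
}

/// Calls through a function pointer with the lean signature.
pub fn call_raw(f: SumRawFn, data: &[u8]) -> u32 {
    unsafe { f(data.as_ptr(), data.len()) }
}
```

Both helpers compute the same sum, for example `call_wrapped(sum_wrapped, &[1, 2, 3, 4])` and `call_raw(sum_raw, &[1, 2, 3, 4])`. The difference only shows up in the generated code: with a direct, inlinable call the optimizer could drop the unused fields, but through a function pointer it usually cannot, so the wrapped version has more to store before the call.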

Side note: again, we’ll be running all these benchmarks on a MacBook, so our tools are a tad limited and we’ll have to resort to some guesswork. Leave a comment if you know more - or, even better, write an article about profiling on macOS 🍎💨.
