Improving the performance of the original dav1d video decoder
Published on: 2025-06-19 09:24:56
@another, @mstorsjo, @gramner
Introduction
I noticed a very clickbaity bounty, and I realized from the start that the company's real goal was not to get the Rust port ahead of the C implementation, but to advertise that Rust is only 5% slower than C. Whether it actually pays out is another matter. The main thing for Prossimo was to make noise about the current rav1d implementation being only 5% slower, so that the general public would conclude the two languages are equally fast.
I also came across the blog of a contributor who tried to optimize rav1d, but he didn't get past 1%. Actually, I solved his problem, and his number came out at 0%. CHICKEN JOCKEY
Well, the first thing I decided to do was look at how dav1d's memory is laid out across CPU cache lines, and I noticed that dav1d really does use large structures. Structures should ideally be 64 bytes or smaller (the typical cache-line size on x86-64), since they are easier for a C/C++/C# compiler to handle. Because reworking the structures would be very difficult, I solved the problem more simply.
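To make the 64-byte rule of thumb concrete, here is a minimal C11 sketch that checks structure sizes against a cache line at compile time and counts how many lines an object touches. It is a generic illustration, not the approach described in this article: the types `SmallBlock`, `LargeBlock` and `AlignedSmallBlock` and the helpers `lines_spanned` and `lines_spanned_ptr` are hypothetical (not dav1d code), and `CACHE_LINE = 64` is an assumption that holds on typical x86-64 CPUs.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines, typical for x86-64; verify for your target CPU. */
#define CACHE_LINE 64

/* Hypothetical block-info structures for illustration only; not dav1d's real types. */
typedef struct {
    int x, y;                 /* block position: 8 bytes */
    short mv[2][2];           /* two motion vectors: 8 bytes */
    unsigned char ref[2];     /* reference frame indices */
    unsigned char mode;
    unsigned char flags;
} SmallBlock;                 /* 20 bytes with 4-byte int and 2-byte short */

typedef struct {
    int coef[32];             /* 128 bytes on its own */
    void *ctx;
} LargeBlock;                 /* already more than two cache lines */

/* The same small struct, forced to start on a cache-line boundary.
 * Its size is padded up to 64, so each array element owns exactly one line. */
typedef struct {
    alignas(CACHE_LINE) SmallBlock b;
} AlignedSmallBlock;

/* Compile-time checks: the build fails if these layout assumptions break. */
static_assert(sizeof(SmallBlock) <= CACHE_LINE, "SmallBlock no longer fits in one cache line");
static_assert(sizeof(AlignedSmallBlock) == CACHE_LINE, "AlignedSmallBlock should occupy exactly one line");

/* Number of cache lines touched by an object of `size` bytes that starts
 * `offset` bytes into a line-aligned buffer. */
static size_t lines_spanned(size_t offset, size_t size) {
    if (size == 0)
        return 0;
    size_t first = offset / CACHE_LINE;
    size_t last = (offset + size - 1) / CACHE_LINE;
    return last - first + 1;
}

/* Same question for an actual address in memory. */
static size_t lines_spanned_ptr(const void *p, size_t size) {
    return lines_spanned((size_t)((uintptr_t)p % CACHE_LINE), size);
}

/* Print size, alignment and line usage for each hypothetical type. */
static void print_layout(void) {
    printf("SmallBlock:        size %zu, align %zu\n",
           sizeof(SmallBlock), alignof(SmallBlock));
    printf("LargeBlock:        size %zu, lines from offset 0: %zu\n",
           sizeof(LargeBlock), lines_spanned(0, sizeof(LargeBlock)));
    printf("AlignedSmallBlock: size %zu, align %zu\n",
           sizeof(AlignedSmallBlock), alignof(AlignedSmallBlock));
}
```

One thing the sketch makes visible is that staying under 64 bytes is not enough on its own. In a plain `SmallBlock` array that starts on a line boundary, element 3 begins at byte 60, and `lines_spanned(60, 20)` returns 2, so that element straddles two lines. The `alignas` wrapper avoids this, at the cost of 44 bytes of padding per element.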
At first, out of habit, I aligned, but I
...