๐๏ธ when rust โ performance. a lesson in devex. getting 20x speedup by removing code.
I love working at oxen because we strive to be the fastest data versioning tool in the market. Which is why we often do multi terabyte benchmarks on each commands that we support , (add, commit, etc.)
All staged files, their location and the file hashes get stored in a rocksdb when oxen add command is used, then during oxen commit we create new nodes for new files/dirs and commit.
I was noting that our add times were pretty quick. ~1 minute for 1M files. But, commit was taking >50 minutes. This was interesting to me because commit is an O(n) algorithm with n = number of directories. we create n directory nodes and 1 commit node, that is all. even with a million files that is not that many directories.
As is usual for all performance work. I profiled the commit command with, samply which is a wildly convenient tool to profile binaries and view it on a nice UI.
If you notice closely, >90% of the time was being spent just getting the lock on the staging RocksDB. This was because all our parallel workers are graduating the files from staged to committed by creating the respective nodes for the directories and creating a complete view of the repository at that commit.
but because we have this data (file name, etc.) passed back and forth between different layers in the code. we were doing lots of .clone() and db.open() at different layers to fetch the same data. The interesting part for me was that, this change reduced our
The final PR looked like this to improve performance by 20X
With performance sensitive application. Often less is more and simple is better.
I often think about this quote from Fabien .
... continue reading