Tech News
← Back to articles

Optimizing My Disk Usage Program

read original related products more articles

In my previous post, Maybe the Fastest Disk Usage Program on macOS, I wrote about dumac. A very fast alternative to du -sh that uses a macOS-specific syscall getattrlistbulk to be much faster than the next leading disk usage program.

I received some great technical feedback in the Lobsters thread. After implementing some of the suggestions, I was able to increase performance by ~28% on my large benchmark.

hyperfine --warmup 3 --min-runs 5 './before temp/deep' './after temp/deep' Benchmark 1: ./before temp/deep Time (mean ± σ): 910.4 ms ± 10.1 ms [User: 133.4 ms, System: 3888.5 ms] Range (min … max): 894.5 ms … 920.0 ms 5 runs Benchmark 2: ./after temp/deep Time (mean ± σ): 711.9 ms ± 10.5 ms [User: 73.9 ms, System: 2705.6 ms] Range (min … max): 700.1 ms … 725.0 ms 5 runs Summary ./after temp/deep ran 1.28 ± 0.02 times faster than ./before temp/deep

The main performance gain came from reducing thread scheduling overhead, while minor gains were from optimizing access to the inode hash-set shards.

Better Parallelism

The previous version of dumac used Tokio to spawn a task for each directory.

// Process subdirectories concurrently if ! dir_info . subdirs . is_empty ( ) { let futures : Vec < _ > = dir_info . subdirs . into_iter ( ) . map ( | subdir | { let subdir_path = Path :: new ( & root_dir ) . join ( & subdir ) . to_string_lossy ( ) . to_string ( ) ; tokio :: spawn ( async move { calculate_size ( subdir_path ) . await } ) } ) . collect ( ) ; // Collect all results for future in futures { match future . await { Ok ( Ok ( size ) ) => total_size += size , // .. } } }

And then in the middle of this task, calling getattrlistbulk required a blocking call:

// In the middle of the async task // .. let dir_info = tokio :: task :: spawn_blocking ( { let root_dir = root_dir . clone ( ) ; move | | get_dir_info ( & root_dir ) // Calls getattrlistbulk } ) . await . map_err ( | _ | "task join error" . to_string ( ) ) ? ? ;

Tokio runs many tasks on a few threads by swapping the running task at each .await . Blocking threads are spawned on demand to avoid blocking the core threads which are handling tasks. I learned this after shipping the first version when I read CPU-bound tasks and blocking code.

... continue reading