
Gzip decompression in 250 lines of Rust

Why This Matters

This article highlights the importance of understanding gzip decompression, a fundamental component of the modern web and data storage infrastructure. By creating a simple Rust implementation, it demystifies the core mechanics behind a ubiquitous compression format, empowering developers to better optimize and troubleshoot their systems.

Key Takeaways

i wanted to have a deeper understanding of how compression actually works, so i wrote a gzip decompressor from scratch. the result is about 250 lines of rust that can decompress gzip from a file or stdin.

why bother?

gzip is everywhere. it compresses your web traffic, your log files, and your documentation / man pages, and it is the format commoncrawl uses to store 300 billion pages and hundreds of terabytes of web archives. it sits invisibly between disks, networks, and CPUs, quietly shaving off bytes at planetary scale, so boringly reliable that the browser won't bother telling you it’s there at all. it's a fundamental tool in the software ecosystem. so let's read it. the first thing to do is clone zlib and start reading through the source code, right? well, let's first see how long it is:

/mnt/g/repos/zlib ❯ fd -e c -0 | xargs -0 cat | wc -l
25569

twenty-five thousand lines of pure C, not counting CMake files. (and whenever working with C always keep in mind that C stands for CVE).

ok maybe we can find a smaller implementation. perhaps zlib-rs is more digestible:

/mnt/g/repos/zlib-rs ❯ fd -e rs -0 | xargs -0 cat | wc -l
36003

thirty-six thousand lines of rust. that's a bit more than i wanted to read through. actually, there is a smaller implementation called miniz, which is only 1261 lines of C if you combine the header and source:

/mnt/g/repos/miniz ❯ cat miniz.c miniz.h | wc -l
1261

the problem is that just looking at miniz is not going to give you a good understanding of how gzip works. it's a single file, but it's still about 1200 lines of code with a lot of optimizations and C preprocessor string substitutions stitched together. i wanted something even simpler, something that just implements the core ideas without worrying about checksums or features that aren't used in practice.
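to make "without worrying about checksums" a bit more concrete, here is a minimal sketch of the gzip framing as laid out in RFC 1952. this is not the code from the original post: `deflate_payload` is a hypothetical helper name, and it assumes the whole input is in memory and holds exactly one gzip member with nothing after it. it checks the magic bytes and compression method, steps over the optional header fields, and treats the last 8 bytes (CRC32 and ISIZE) as a trailer it never verifies. what is left is the raw DEFLATE stream, which is where the actual decompression work lives.

```rust
// sketch of gzip member framing (RFC 1952), not the article's code.
// `deflate_payload` is a hypothetical name used for illustration.

// FLG bits that announce optional header fields
const FHCRC: u8 = 1 << 1; // 2-byte header CRC
const FEXTRA: u8 = 1 << 2; // length-prefixed extra field
const FNAME: u8 = 1 << 3; // NUL-terminated original file name
const FCOMMENT: u8 = 1 << 4; // NUL-terminated comment

/// Returns the raw DEFLATE bytes of a single-member gzip buffer.
/// The 8-byte trailer (CRC32 + ISIZE) is skipped, not verified.
pub fn deflate_payload(data: &[u8]) -> Result<&[u8], String> {
    // 10 fixed header bytes + 8 trailer bytes is the smallest possible member
    if data.len() < 18 {
        return Err("too short to be gzip".into());
    }
    if data[0] != 0x1f || data[1] != 0x8b {
        return Err("bad magic bytes".into());
    }
    if data[2] != 8 {
        return Err("unsupported compression method (only 8 = deflate)".into());
    }
    let flags = data[3];
    // bytes 4..10 are MTIME (4), XFL (1), OS (1); ignored in this sketch
    let mut pos = 10usize;

    if flags & FEXTRA != 0 {
        // 2-byte little-endian length, then that many bytes of extra data
        let len = data.get(pos..pos + 2).ok_or("truncated FEXTRA")?;
        pos += 2 + u16::from_le_bytes([len[0], len[1]]) as usize;
    }
    for flag in [FNAME, FCOMMENT] {
        if flags & flag != 0 {
            // skip up to and including the terminating NUL byte
            let rest = data.get(pos..).ok_or("truncated header")?;
            let nul = rest
                .iter()
                .position(|&b| b == 0)
                .ok_or("unterminated header string")?;
            pos += nul + 1;
        }
    }
    if flags & FHCRC != 0 {
        pos += 2;
    }

    let end = data.len() - 8; // start of the CRC32 + ISIZE trailer
    if pos > end {
        return Err("truncated header".into());
    }
    Ok(&data[pos..end])
}
```

slicing off the last 8 bytes only works under the single-member assumption above. a streaming decoder would instead let the DEFLATE data signal its own end and read the trailer after that.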
