Posted on: September 24, 2025
If you’ve ever worked with massive datasets, you know that memory usage can quickly become a bottleneck. While developing succinct data structures, I found myself needing to store large arrays of integers—values with no monotonicity or other exploitable patterns, but which I knew came from a universe much smaller than their type’s theoretical capacity.
In this post, we will explore the engineering challenges involved in implementing an efficient vector-like data structure in Rust that stores integers in a compressed, bit-packed format. We will focus on achieving O(1) random access performance while minimizing memory usage. We will try to mimic the ergonomics of Rust’s standard Vec<T> as closely as possible, including support for mutable access and zero-copy slicing.
All the code can be found on GitHub: compressed-intvec
This is also published as a crate on crates.io: compressed-intvec
Memory Waste in Standard Vectors
In Rust, the contract of a Vec<T> (where T is a primitive integer type like u64 or i32) is simple: O(1) random access in exchange for a memory layout that is tied to the static size of T. This is a good trade-off, until it isn’t. When the dynamic range of the stored values is significantly smaller than the type’s capacity, this memory layout leads to substantial waste.
Consider storing the value 5 within a Vec<u64>. Its 8-byte in-memory representation is:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000101
Only 3 bits are necessary to represent the value, leaving 61 bits as zero-padding. The same principle applies, albeit less dramatically, when storing 5 in a u32 or u16. At scale, this overhead becomes prohibitive. A vector of one billion u64 elements consumes 10^9 * std::mem::size_of::<u64>(), or approximately 8 GB of memory, even if every element could fit within a single byte.
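As a quick sanity check of the arithmetic above, here is a small sketch that computes the minimal bit width of a value and compares the payload of a billion-element Vec<u64> against a hypothetical bit-packed layout at that width (the packing itself is what the rest of the post builds; this only does the accounting):

```rust
fn main() {
    let v: u64 = 5;
    // Minimal bits needed to represent 5: the bit length, i.e. 64 minus the leading zeros.
    let bits_needed = 64 - v.leading_zeros() as usize; // 3 bits

    // A Vec<u64> allocates 8 bytes per element regardless of the stored values.
    let n: usize = 1_000_000_000;
    let vec_bytes = n * std::mem::size_of::<u64>(); // 8_000_000_000 bytes, ~8 GB

    // Bit-packed at 3 bits per element, the same data fits in ~375 MB.
    let packed_bytes = (n * bits_needed + 7) / 8; // round up to whole bytes

    println!("bits needed for {}: {}", v, bits_needed);
    println!("Vec<u64> payload:  {} bytes", vec_bytes);
    println!("bit-packed payload: {} bytes", packed_bytes);
}
```

At 3 bits per element, the packed representation is more than 21 times smaller than the Vec<u64> payload, which is the gap the compressed vector aims to close.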