Neoclassical C++: segmented iterators revisited

Introduction to segmented iterators

The legendary STL library has been an inspiration in the development of the C++ language and libraries. The understanding of the concepts and code developed by Stepanov, Lee, Musser, Austern, etc. is a treasure trove for any C++ programmer.

Stepanov’s core claim was that abstraction should have near-zero overhead. He argued that generic programming, done correctly, should not impose a noticeable runtime cost (abstraction penalty) compared to hand-written specialized code. This was not just a hope but a design principle of the STL.

Following this philosophy, in 2000 Matt Austern published the paper Segmented Iterators and Hierarchical Algorithms. The paper’s core argument was that many data structures are naturally segmented (e.g. deque), and generic algorithms that ignore that feature — treating every data structure as a uniform range of elements — are unnecessarily inefficient. A new kind of iterator abstraction, in which segmentation is explicit, could make possible to write hierarchical algorithms that exploit segmentation and reduce the abstraction penalty associated with standard iterators.

The classic motivating example is std::deque, which is internally a sequence of fixed-size arrays. Every ++it on a deque iterator may cross a block boundary, so inplicitly std::find or std::fill has to evaluate, on every step, whether the current pointer has reached the end of the current block, an indirection that makes deque iteration measurably slower and also defeats most compilers’ auto-vectorisers.

A segmented iterator could be a two-level structure, allowing an algorithm to operate on each contiguous segment efficiently (for instance, calling memset or a tight inner loop on each chunk) and only handle the segment-to-segment transitions at the outer level.

This paper remained influential but its ideas were never adopted into the C++ standard. On the other hand, some libraries have internally developed those core ideas. Some examples::

Austern’s paper explained

A traditional iterator presents a flat view: increment, decrement, dereference. A segmented iterator additionally admits being decomposed into two coordinates:

a segment iterator that walks the outer sequence of segments (the blocks of a deque, the chunks of a chunk-list, the buckets of a chained hash table, etc.). This iterator is not dereferenceable. a local iterator that walks the inner range inside one segment. This iterator is dereferenceable.

... continue reading