The curious case of adjacent difference

If you have ever tried using the std::adjacent_difference algorithm in c++, I'm sure it left you puzzled. As the name suggests, this algorithm computes differences between adjacent elements of the input sequence, but it does one more thing: it copies the first element of the input sequence into the output sequence unmodified. The following example demonstrates how to apply the algorithm to delta-compress a posting list of document identifiers that contain a search term (the example is contrived since Google developed much more sophisticated posting list compression techniques).

Delta-compressing a posting list using the std::adjacent_difference algorithm. The compressed version might require less memory when encoded using variable-length integers.
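A minimal sketch of such a program follows; the concrete document identifiers are made up for illustration.

#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    // A sorted posting list: identifiers of documents containing a search term.
    std::vector<std::uint64_t> postings = {1000, 1003, 1007, 1012, 1030};

    // Delta-compress the list: each output element is the difference with its
    // predecessor; the first element is copied through unmodified.
    std::vector<std::uint64_t> deltas(postings.size());
    std::adjacent_difference(postings.begin(), postings.end(), deltas.begin());

    for (auto d : deltas) std::cout << d << ' ';  // prints: 1000 3 4 5 18
    std::cout << '\n';
}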
Three central problems of calculus

According to Steven Strogatz (Infinite Powers, p. 144), calculus has three central problems:

given a curve, find its slope everywhere (the forward problem),
given a curve's slope everywhere, find the curve (the backward problem),
given a curve, find the area under it (the area problem).

The fundamental theorem of calculus connects all three problems and states that the area under the slope of a curve on an interval is the difference of the curve's heights evaluated at the ends of that interval:

\int_a^b f'(x) \, dx = f(b) - f(a)

Calculus deals with continuous curves, but we will tackle these problems through the lens of discrete value sequences. We'll view a sequence f as a function over the natural numbers and denote the discrete derivative of f as Df and its antiderivative as Sf, where

Df(i) = f(i) - f(i - 1)
Sf(i) = \sum_{k=0}^{i} f(k)

Df corresponds to differences between adjacent elements of the source sequence, and Sf corresponds to partial sums. The discrete variant of the fundamental theorem becomes:

\sum_{i=k}^{n} Df(i) = f(n) - f(k - 1)

These formulas have an off-by-one issue: at i = 0 or k = 0, we have to access the minus first element, so, for convenience, we define f(-1) = 0.

I like how Kyne Santos explained this result:

Imagine you're hiking up a mountain, going from point A to point B, and you want to find out the overall change in elevation. So let's say that point A, the starting point, is 500 meters above sea level. In the first hour, you ascend 100 meters. In the second hour, you descend 50 meters, and in the third and final hour you ascend 200 meters to arrive at point B, which is 750 meters above sea level. The question is, what's the overall change in elevation? Well, there's two ways to go about it. You can find the final elevation, which is 750 meters, and just subtract the starting elevation, which was 500, and the difference between 750 and 500 is 250 meters. Or you can add up the little changes along the way. So in the first hour, we climbed 100 meters, and then we descended 50, and then we climbed another 200, so 100 minus 50 plus 200 is 250 meters. And these two approaches represent the two sides of the equation in the fundamental theorem of calculus.

Kyne Santos, episode 95 of the My Favorite Theorem podcast

Finding slopes with adjacent differences

The slope between two discrete values is their difference, so taking the difference between neighboring points of a sequence solves the forward problem, yielding N - 1 slopes for a sequence of N elements.

sequence  1 3 5 7 9
slopes      2 2 2 2

This operation loses information: we cannot recover the original sequence from its slopes. In calculus, taking a derivative is also a lossy operation since functions f(x) and g(x) = f(x) + c (where c is an arbitrary constant) have the same derivative f'(x).
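Here is one way to compute just the N - 1 slopes with the standard library; a small sketch using the values from the table above.

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> seq = {1, 3, 5, 7, 9};

    // The lossy forward problem: N - 1 slopes for N elements,
    // slopes[i] = seq[i + 1] - seq[i].
    std::vector<int> slopes(seq.size() - 1);
    std::transform(seq.begin() + 1, seq.end(), seq.begin(),
                   slopes.begin(), std::minus<>{});

    for (int s : slopes) std::cout << s << ' ';  // prints: 2 2 2 2
    std::cout << '\n';
}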
Recovering the original with partial sums

The backward problem is the most interesting of the three. Since the sequence of slopes has one less item than the source sequence, there is not enough information to recover the latter. We know the changes between points, but not the first value to apply the changes to. Once we know it, we can compute the rest using partial sums.

slopes        2   2   2   2
sequence  C  C+2  C+4  C+6  C+8

The lossiness of differentiation is the reason for introducing an arbitrary constant C when computing an indefinite integral in calculus:

\int f'(x) \, dx = f(x) + C

In the discrete case, this constant corresponds to the first value in the original sequence; that's why std::adjacent_difference preserves the first element in its output.

Finding areas with partial sums

The area under a sequence is the sum of its elements. The sequence of partial sums solves this problem because it allows us to sum the items of any subinterval of the original sequence in a single step:

\sum_{i=k}^{n} f(i) = Sf(n) - Sf(k - 1)

(as I previously mentioned, we define Sf(-1) to be zero).

sequence      1 3 5  7  9
partial sums  1 4 9 16 25

Partial sums don't introduce or lose any information. We can always recover the original sequence using the stl definition of adjacent differences:

sequence       1 3 5  7  9
partial sums   1 4 9 16 25
adjacent diff  1 3 5  7  9
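A small sketch of this round trip with the standard library, using the values from the tables above.

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> seq = {1, 3, 5, 7, 9};

    // Area problem: partial sums let us compute any interval sum in one step.
    std::vector<int> sums(seq.size());
    std::partial_sum(seq.begin(), seq.end(), sums.begin());  // 1 4 9 16 25

    // Sum of seq[1..3] = Sf(3) - Sf(0) = 16 - 1 = 15.
    std::cout << sums[3] - sums[0] << '\n';

    // Backward problem: adjacent differences undo the partial sums exactly,
    // thanks to the copied first element.
    std::vector<int> back(sums.size());
    std::adjacent_difference(sums.begin(), sums.end(), back.begin());  // 1 3 5 7 9
    std::cout << std::boolalpha << (back == seq) << '\n';  // prints: true
}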
Symmetry vs pragmatism

The connection between std::partial_sum and std::adjacent_difference is aesthetically pleasing, but I find the design of the latter algorithm unfortunate: std::adjacent_difference is significantly less generic than its lossy version (i.e., a version that doesn't copy the first element) would be, as it forces the output element type to match the input element type. The few times I needed to compute pairwise differences, the semantics of std::adjacent_difference stood in the way, and I ended up writing a custom loop. I'm not alone. Luckily, the pairwise_transform adapter from c++23 doesn't make the extra copy (a short sketch follows at the end of this section).

Derivatives in calculus are lossy, so the definition of std::adjacent_difference doesn't exactly correspond to a derivative. Forcing the symmetry between discrete algorithms breaks the symmetry with their continuous counterparts. I disagree with Stepanov's design choice, but I'm glad it made me question his intentions and find these hidden connections. api design is hard, especially when you try to express novel ideas, such as efficient generic programming, in a new programming language, as c++ was at the time. Just like Einstein's cosmological constant, Stepanov's extra copy turned out to be useful after all.
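A minimal sketch of that c++23 route, assuming a standard library that ships std::views::pairwise_transform; the posting list values are illustrative.

#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> postings = {1000, 1003, 1007, 1012, 1030};

    // Lossy pairwise differences: yields N - 1 values and allows the result
    // type to differ from the input type.
    auto deltas = postings | std::views::pairwise_transform(
                                 [](int a, int b) { return b - a; });

    for (int d : deltas) std::cout << d << ' ';  // prints: 3 4 5 18
    std::cout << '\n';
}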
Appendix: deltas in q

Just like std::adjacent_difference, the deltas function from the q programming language preserves the first item of its input:

deltas 1 4 9 16
1 3 5 7

However, the deltas function operates slightly differently from its c++ cousin: instead of copying the first item verbatim, it prepends a seed value of zero to the sequence before computing pairwise differences.

deltas 1 4 9 16
== (1 - 0) (4 - 1) (9 - 4) (16 - 9)
== 1 3 5 7

q defines deltas in terms of a more general Each Prior operator as -':. Applied without a left argument, the derived form picks the seed based on the operation (zero for subtraction, one for multiplication, etc.); the caller can override the default through the left argument.

(*':) 2 3 4               / 1 is the identity for *.
2 6 12

1950 -': 1952 1954 1960   / Use 1950 as the seed instead of zero.
2 2 6

This design preserves symmetry with partial sums but avoids the type mismatch between the input and output sequences, achieving both elegance and pragmatism.
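For comparison, here is a rough c++ sketch of the same idea; the name deltas_with_seed and the signed output type are my illustrative choices, not part of any standard library.

#include <cstdint>
#include <iostream>
#include <vector>

// A q-style deltas: subtracts each element's predecessor, starting from an
// explicit seed instead of copying the first element, so the output type
// may differ from the input type.
std::vector<std::int64_t> deltas_with_seed(const std::vector<std::uint64_t>& in,
                                           std::uint64_t seed = 0) {
    std::vector<std::int64_t> out;
    out.reserve(in.size());
    std::uint64_t prev = seed;
    for (std::uint64_t x : in) {
        out.push_back(static_cast<std::int64_t>(x) - static_cast<std::int64_t>(prev));
        prev = x;
    }
    return out;
}

int main() {
    for (auto d : deltas_with_seed({1, 4, 9, 16}))
        std::cout << d << ' ';  // prints: 1 3 5 7
    std::cout << '\n';
}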