The provenance memory model for C

In this article, I will try to explain what this is all about, namely on how a provenance model for pointers interferes with alias analysis of modern compilers. For those that are not fluent with the terminology or the concept we have a short intro what pointer aliasing is all about , a review of existing tools to help the compiler and inherent difficulties and then the proposed model itself . At the end there is a brief takeaway that explains how to generally avoid complications and loss of optimization opportunities that could result from mis-guided aliasing analysis.

This work has finally resulted in the publication of an international standard, Technical Specification ISO/IEC TS 6010 (edited by Henry Kleynhans , Bloomberg, UK). With the goal of making modern information systems safer and more secure, this official technical specification provides direction to all stakeholders in the industry such that they can converge their platforms and tools.

A years-long effort led by Kayvan Memarian and Peter Sewell from Cambridge University, UK, Martin Uecker from Graz University of Technology, Austria, and myself (from ICube/Inria, France) has guided the C community to accept a common vision of so-called pointer provenance, defining how to trace the origins of pointer values through program execution. Our provenance-aware memory object model for C provides a precise mathematical specification, in place of the ambiguity of these aspects of the current C standard . It has also stimulated and informed discussion of provenance in the broader C, C++, Rust, and compiler communities.

Pointer aliasing and program optimization

We say that two pointer values p and q during the execution of a program alias if they point to the same object in memory. To see that the question if two pointers alias has an influence on the optimization of code, let’s consider the following simple example of an iterative function.

It implements an approximation algorithm for the reciprocal r := 1/a of the value a := *aₚ which doesn’t use division.

// Reciprocal approximation // // This interface is horrific, don't use it. // This just serves as an artificial example // for this article. constexpr double ε = 0x1P-24; constexpr double Π⁻ = 1.0 - ε; constexpr double Π⁺ = 1.0 + ε; void recip(double* aₚ, double* řₚ) { for (;;) { register double Π = (*aₚ)*(*řₚ); if ((Π⁻ < Π) && (Π < Π⁺)) { break; } else { (*řₚ) *= (2.0 - Π); } } }

The function receives a pointer to a second value ř := *řₚ with a rough approximation for r . It then iteratively approaches r within a chosen precision ε : the current values a and ř are multiplied into a value Π and if that value is sufficiently close to 1.0 the iteration stops. If it is not, ř is corrected and the loop continues.

What is interesting for our context of aliasing is that this function has two pointer arguments that both point to a value of the same type double . One of these pointer targets *řₚ is loaded from memory, modified and stored at each iteration. In total, the non-optimized function as specified above in each iteration has

3 load and 1 store operations,

... continue reading