Converting floating-point numbers to their shortest decimal representations has long been a surprisingly challenging problem. For decades, developers relied on algorithms such as Dragon4 that used multi-precision arithmetic. More recently, a new generation of provably correct and high-performance algorithms emerged, including Ryu, Schubfach and Dragonbox.
In my earlier post, double to string conversion in 150 lines of code, I explored how compact a basic implementation can be while still maintaining correctness. The Schubfach algorithm pushes this idea much further: it enables fully proven, high-performance dtoa with an amount of code comparable to the naïve algorithm. This post gives a brief overview of Schubfach and then walks through the minimal C++ implementation in https://github.com/vitaut/schubfach.
A brief overview of the Schubfach algorithm
The Schubfach algorithm, introduced by Raffaello Giulietti, is a method to convert binary floating-point numbers into the shortest, correctly rounded decimal string representation with a round-trip guarantee. It is based on two key insights:
1. The non-iterative search guided by the pigeonhole principle
The algorithm is founded on the Schubfachprinzip (the pigeonhole principle), which gives the algorithm its name, to efficiently determine the correct decimal exponent ($k$) for the optimal shortest decimal representation ($d_v$).
Instead of iteratively searching for the shortest decimal representation, Schubfach determines a unique integer exponent $k$ based on comparing the distance between adjacent decimal values ($10^k$) in a given scale ($10_k$) against the width of the floating-point number’s rounding interval ($\Vert R_v \Vert$).
The exponent $k$ is calculated such that $10^k \leq \Vert R_v \Vert < 10^{k+1}$. This comparison guarantees that the set of decimals rounding back to the original floating-point value $v$ ($R_k$) contains at least one element, and the set at the next finest resolution ($R_{k+1}$) contains at most one element.
This approach makes the algorithm non-iterative.
2. Guaranteed accuracy through fixed-precision integer estimates
... continue reading