
Formally verifying a floating-point division routine with Gappa – part 1


We have recently released a set of optimized assembly-language routines for basic floating-point arithmetic, in Arm Optimized Routines, under an open-source license.

These functions perform the same operations as hardware floating-point instructions, for example addition, multiplication, and division. However, they are implemented using only 32-bit integer Arm instructions, for Arm CPUs without hardware floating point. Existing open-source libraries such as libgcc and compiler-rt offer similar functions. Our optimized versions were previously part of the Arm Compiler for Embedded toolchain, which has reached end of life. We are now making them available as open source.

Preparing these functions for publication mainly involved polishing the code and rewriting the comments to improve clarity. However, one of them was much more interesting: the double-precision division function, “ddiv” as we call it. Polishing it turned into an adventure in formal verification, and ended up uncovering a bug in the original version.

In this two-part blog, I’ll tell the story, and share my learning experiences using Gappa and the Rocq prover.

What is “ddiv” and how does it work?

The “ddiv” function implements double-precision floating-point division in accordance with the IEEE 754 standard. It accepts two (64-bit) floating-point numbers as input and gives their quotient as output. Since it only uses integer arithmetic, it treats the inputs and outputs as 64-bit integers. Each integer is divided into three fields: sign, exponent, and mantissa.

Floating-point functions require handling many special cases such as NaNs, infinities, denormalized numbers, overflow, underflow, and signed zeroes. Division also requires handling division by zero. However, this discussion focuses on the ordinary cases, where nothing strange happens.

In these ordinary cases, the signs and exponents of the input numbers are easy to deal with. The primary challenge is the mantissas. Each input mantissa is effectively a 53-bit binary number, made by putting a 1 bit on the front of the 52 bits stored in the input value. Given two such mantissas, n (numerator) and d (denominator), the goal is to compute their quotient and round it to the nearest 53-bit binary number, efficiently.

There are many ways to approach this challenge. The most obvious is the binary analogue of ordinary pencil-and-paper long division: loop round 53 times, generating one output bit every time. However, we can go significantly faster than that.

The approach we took in “ddiv” is to use an approximate method. We start by calculating an approximation to 1 / d (the reciprocal of the denominator). Then we multiply that by n, to get an approximation to the quotient n / d. This strategy can be implemented a lot faster than the bit-by-bit loop, because the CPU’s multiplication instruction does a lot more of the work for us.
