Circuit Transformations, Loop Fusion, and Inductive Proof

Written with Samuel Coward

Are Datapath Circuit Transformations ≈ \approx ≈ Compiler Optimisations?

During a visit to Cornell, Sam presented a bunch of hardware optimisations to fuse arithmetic operators. Being from a compiler background, Nate asked whether we could view these as more traditional compiler transformations.

TLDR: Yes, you can, but it gets pretty ugly…

In this blog, we’re going to show how we answered this question for a specific example, namely discovering carry-save addition (a hardware optimisation) via loop fusion (a compiler optimisation).

Carry-Save Addition (by Sam)

Consider the problem of adding up three n-bit bitvectors (x+y+z) . Assuming we know how to add two bitvectors, the simplest solution is to add x and y , then add z to the result. However, each time we add two bitvectors, we have to propagate the carries all the way from the least-significant to the most significant bit.

The time to perform this computation unfortunately scales at least logarithmically with the width of the bitvectors. In hardware we can go much faster and compute (x+y+z) in the same time as (x+y) plus a small constant overhead. To achieve this, we use a full-adder which takes three individual bits of equal weight ( x[i], y[i], z[i] ) and produces two bits s and c such that: 2*c + s = x[i] + y[i] + z[i] . In terms of gates:

c = MAJ(x[i], y[i], z[i]) = (x[i] AND y[i]) XOR (x[i] AND z[i]) XOR (y[i] AND z[i]) s = x[i] XOR y[i] XOR z[i]

Using n of these full-adders operating in parallel on bits of equal weight, we can take our three addends and reduce them down to two. Consider this example:

... continue reading