Skip to content
Tech News
← Back to articles

Circuit Transformations, Loop Fusion, and Inductive Proof

read original more articles
Why This Matters

This article explores how hardware optimizations like carry-save addition can be viewed through the lens of compiler transformations such as loop fusion. Understanding these parallels can lead to more efficient hardware design and optimization techniques, ultimately benefiting both hardware engineers and software developers by bridging the gap between hardware and compiler optimization strategies.

Key Takeaways

Written with Samuel Coward

Are Datapath Circuit Transformations ≈ \approx ≈ Compiler Optimisations?

During a visit to Cornell, Sam presented a bunch of hardware optimisations to fuse arithmetic operators. Being from a compiler background, Nate asked whether we could view these as more traditional compiler transformations.

TLDR: Yes, you can, but it gets pretty ugly…

In this blog, we’re going to show how we answered this question for a specific example, namely discovering carry-save addition (a hardware optimisation) via loop fusion (a compiler optimisation).

Carry-Save Addition (by Sam)

Consider the problem of adding up three n-bit bitvectors (x+y+z) . Assuming we know how to add two bitvectors, the simplest solution is to add x and y , then add z to the result. However, each time we add two bitvectors, we have to propagate the carries all the way from the least-significant to the most significant bit.

The time to perform this computation unfortunately scales at least logarithmically with the width of the bitvectors. In hardware we can go much faster and compute (x+y+z) in the same time as (x+y) plus a small constant overhead. To achieve this, we use a full-adder which takes three individual bits of equal weight ( x[i], y[i], z[i] ) and produces two bits s and c such that: 2*c + s = x[i] + y[i] + z[i] . In terms of gates:

c = MAJ(x[i], y[i], z[i]) = (x[i] AND y[i]) XOR (x[i] AND z[i]) XOR (y[i] AND z[i]) s = x[i] XOR y[i] XOR z[i]

Using n of these full-adders operating in parallel on bits of equal weight, we can take our three addends and reduce them down to two. Consider this example:

... continue reading