There's been a ton of attention lately on massive neural networks with billions of parameters, and rightly so. By combining huge parameter counts with powerful architectures like transformers and diffusion models, neural networks are capable of accomplishing astounding feats.
However, even small networks can be surprisingly effective - especially when they're designed for a specialized use case. As part of some previous work, I was training small (<1,000-parameter) networks to learn sequence-to-sequence mappings and perform other simple logic tasks. I wanted the models to be as small and simple as possible, with the goal of building little interactive visualizations of their internal states.
After finding good success on very simple problems, I tried training neural networks to perform binary addition. The networks would receive the bits of two 8-bit unsigned integers as input (with each bit converted to a float: -1 for binary 0 and +1 for binary 1) and would be expected to produce the bits of the correctly added sum as output, with overflows wrapping around.
Training example in binary:

    01001011 + 11010110 -> 00100001

As input/output vectors for NN training:

    input:  [-1, 1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, -1, 1, 1, -1]
    output: [-1, -1, 1, -1, -1, -1, -1, 1]
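For illustration, here is one way to build such a training pair in Python. The ±1 encoding and the overflow wrapping follow the description above; the helper names and the use of NumPy are my own placeholder choices, not taken from the original code.

```python
import numpy as np

def to_bits(n: int, width: int = 8) -> list[int]:
    # Most-significant bit first, e.g. 75 -> [0, 1, 0, 0, 1, 0, 1, 1]
    return [(n >> i) & 1 for i in range(width - 1, -1, -1)]

def encode(bits: list[int]) -> np.ndarray:
    # Map binary 0 -> -1.0 and binary 1 -> +1.0
    return np.array([1.0 if b else -1.0 for b in bits], dtype=np.float32)

def make_example(a: int, b: int) -> tuple[np.ndarray, np.ndarray]:
    x = encode(to_bits(a) + to_bits(b))   # 16 inputs: both operands' bits
    y = encode(to_bits((a + b) % 256))    # 8 outputs: the sum, wrapping on overflow
    return x, y

x, y = make_example(0b01001011, 0b11010110)
print(x)  # [-1.  1. -1. -1.  1. -1.  1.  1.  1.  1. -1.  1. -1.  1.  1. -1.]
print(y)  # [-1. -1.  1. -1. -1. -1. -1.  1.]
```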
What I hoped/imagined the network would learn internally is something akin to a binary adder circuit:
I expected that it would identify the relationships between different bits in the input and output, route them around as needed, and use the neurons as logic gates - which I'd seen happen in the past for other problems I tested.
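For reference, this is the kind of gate-level logic a single stage of that adder circuit implements - a standard one-bit full adder, written here in Python purely for clarity rather than taken from the post:

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built from the usual gate primitives."""
    sum_bit = a ^ b ^ carry_in                  # XOR of all three inputs
    carry_out = (a & b) | (carry_in & (a ^ b))  # carry generated or propagated
    return sum_bit, carry_out

# Truth-table spot check: 1 + 1 with carry-in 1 -> sum 1, carry-out 1
print(full_adder(1, 1, 1))  # (1, 1)
```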
Training the Network
To start out, I created a network with a pretty generous architecture that had 5 layers and several thousand parameters. However, I wasn't sure even that was enough. The logic circuit diagram above for the binary adder only handles a single bit; adding 8 bits to 8 bits would require a much larger number of gates, and the network would have to model all of them.
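As a rough sketch of what a network in that size range looks like, here is a 5-layer fully-connected model in PyTorch. The layer widths, activation, and framework are my own placeholder choices and not necessarily what the original experiment used; the point is just the order of magnitude.

```python
import torch.nn as nn

# Hypothetical stand-in for the "generous" 5-layer architecture described above:
# 16 inputs (two 8-bit operands) -> hidden layers -> 8 outputs (the sum bits).
model = nn.Sequential(
    nn.Linear(16, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 8), nn.Tanh(),  # outputs in [-1, 1], matching the bit encoding
)

param_count = sum(p.numel() for p in model.parameters())
print(param_count)  # a few thousand parameters, in the ballpark mentioned above
```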
Additionally, I wasn't sure how the network would handle long chains of carries. When adding 11111111 + 00000001, for example, the result wraps and produces an output of 00000000. In order for that to happen, the carry from the least-significant bit needs to propagate all the way through the adder to the most-significant bit. I thought there was a good chance the network would need at least 8 layers to facilitate this kind of behavior.
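To make that carry-chain concern concrete, here is a ripple-carry trace in Python (my own sketch, not the author's code) showing that the carry produced at bit 0 of 11111111 + 00000001 has to pass through every one of the 8 stages before the wrapped result 00000000 comes out:

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two bit lists (LSB first), returning the sum bits and the carry at each stage."""
    carry, sum_bits, carries = 0, [], []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
        carries.append(carry)
    return sum_bits, carries  # the final carry is discarded, i.e. the result wraps

# 11111111 + 00000001, least-significant bit first
a = [1] * 8
b = [1] + [0] * 7
sum_bits, carries = ripple_carry_add(a, b)
print(sum_bits)  # [0, 0, 0, 0, 0, 0, 0, 0] -> 00000000 after wrapping
print(carries)   # [1, 1, 1, 1, 1, 1, 1, 1] -> the carry ripples through all 8 positions
```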
Even though I wasn't sure if it was going to be able to learn anything at all, I started off training the model.