The Hardware Lottery

Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail).

This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is universally superior to alternative research directions. History tells us that hardware lotteries can obfuscate research progress by casting successful ideas as failures and can delay signaling that some research directions are far more promising than others.

These lessons are particularly salient as we move into a new era of closer collaboration between hardware, software and machine learning research communities. After decades of treating hardware, software and algorithms as separate choices, the catalysts for closer collaboration include changing hardware economics, a “bigger is better” race in the size of deep learning architectures and the dizzying requirements of deploying machine learning to edge devices.

Closer collaboration has centered on a wave of new-generation hardware that is "domain specific" to optimize for commercial use cases of deep neural networks. While domain specialization creates important efficiency gains, it arguably makes it even more costly to stray off the beaten path of research ideas. While deep neural networks have clear commercial use cases, there are early warning signs that the path to true artificial intelligence may require an entirely different combination of algorithm, hardware and software.

This essay begins by acknowledging a crucial paradox: machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed. What has incentivized the development of software, hardware and algorithms in isolation? What follows is part position paper, part historical review that attempts to answer the question, "How does tooling choose which research ideas succeed and fail, and what does the future hold?"

Separate Tribes

For the creators of the first computers, the program was the machine. Early machines were single use and were not expected to be repurposed for a new task because of both the cost of the electronics and a lack of cross-purpose software. Charles Babbage’s Difference Engine was intended solely to compute polynomial functions (1817). Mark I was a programmable calculator (1944). Rosenblatt’s perceptron machine computed a step-wise single layer network (1958). Even the Jacquard loom, which is often thought of as one of the first programmable machines, was in practice so expensive to re-thread that it was typically threaded once to support a pre-fixed set of input fields (1804).

Early computers such as the Mark I were single use and were not expected to be repurposed. While Mark I could be programmed to compute different calculations, it was essentially a very powerful reprogrammable calculator and could not run the variety of programs that we expect of our modern-day machines.

The specialization of these early computers was out of necessity and not because computer architects thought one-off customized hardware was intrinsically better. However, it is worth pointing out that our own intelligence is both algorithm and machine. We do not inhabit multiple brains over the course of our lifetime. Instead, the notion of human intelligence is intrinsically associated with the physical 1400g of brain tissue and the patterns of connectivity between an estimated 85 billion neurons in your head.
