
How and When the Memory Chip Shortage Will End


If it feels these days as if everything in technology is about AI, that’s because it is. And nowhere is that more true than in the market for computer memory. Demand, and profitability, for the type of DRAM used to feed GPUs and other accelerators in AI data centers is so huge that it’s diverting supply away from memory for other uses and causing prices to skyrocket. According to Counterpoint Research, DRAM prices have risen 80 to 90 percent so far this quarter.

The largest AI hardware companies say they have secured their chips as far out as 2028, but that leaves everybody else—makers of PCs, consumer gizmos, and everything else that needs to temporarily store a billion bits—scrambling to deal with scarce supply and inflated prices.

How did the electronics industry get into this mess, and more importantly, how will it get out? IEEE Spectrum asked economists and memory experts to explain. They say today’s situation is the result of a collision between the DRAM industry’s historic boom and bust cycle and an AI hardware infrastructure build-out that’s without precedent in its scale. And, barring some major collapse in the AI sector, it will take years for new capacity and new technology to bring supply in line with demand. Prices might stay high even then.

To understand both ends of the tale, you need to know the main culprit in the supply-and-demand swing: high-bandwidth memory, or HBM.

What is HBM?

HBM is the DRAM industry’s attempt to short-circuit the slowing pace of Moore’s Law by using 3D chip packaging technology. Each HBM chip is made up of as many as 12 thinned-down DRAM chips called dies. Each die contains a number of vertical connections called through-silicon vias (TSVs). The dies are piled atop each other and connected by arrays of microscopic solder balls aligned to the TSVs. This DRAM tower—well, at about 750 micrometers thick, it’s more of a brutalist office block than a tower—is then stacked atop what’s called the base die, which shuttles bits between the memory dies and the processor.
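To make the stack arithmetic concrete, here is a minimal sketch. Only the 12-die maximum and the roughly 750-micrometer total height come from the description above; the per-die capacity and the die and base-die thicknesses are illustrative assumptions.

```python
# Minimal sketch of the HBM stack arithmetic described above. Only the 12-die
# maximum and the roughly 750-micrometer total height come from the article;
# the per-die capacity and thickness figures are illustrative assumptions.

def hbm_stack(dram_dies: int = 12,
              gigabits_per_die: int = 24,
              die_thickness_um: float = 55.0,
              base_die_um: float = 90.0) -> dict:
    """Rough capacity and height of one HBM stack."""
    capacity_gigabytes = dram_dies * gigabits_per_die / 8   # gigabits -> gigabytes
    height_um = dram_dies * die_thickness_um + base_die_um
    return {"capacity_GB": capacity_gigabytes, "height_um": height_um}

print(hbm_stack())
# {'capacity_GB': 36.0, 'height_um': 750.0}
```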

This complex piece of technology is then set within a millimeter of a GPU or other AI accelerator, to which it is linked by as many as 2,048 micrometer-scale connections. HBMs are attached on two sides of the processor, and the GPU and memory are packaged together as a single unit.

The idea behind such a tight, highly connected squeeze with the GPU is to knock down what’s called the memory wall. That’s the barrier, in energy and time, to moving the terabytes per second of data that large language models need into the GPU. Memory bandwidth is a key limiter to how fast LLMs can run.
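A back-of-the-envelope calculation shows why that bandwidth caps LLM speed. The sketch below uses the 2,048-bit interface implied by the connection count above, but the per-pin data rate, number of stacks, and model size are assumptions chosen for illustration: when every weight must stream in from memory for each generated token, the token rate can be no higher than bandwidth divided by model size.

```python
# Back-of-the-envelope sketch of the memory wall described above. The 2,048-bit
# interface reflects the connection count in the article; the per-pin data
# rate, number of stacks, and model size are assumptions for illustration.

def stack_bandwidth_gb_s(interface_bits: int = 2048,
                         gbit_s_per_pin: float = 8.0) -> float:
    """Peak bandwidth of one HBM stack, in gigabytes per second."""
    return interface_bits * gbit_s_per_pin / 8   # bits/s -> bytes/s

def decode_tokens_per_second(model_params_billions: float = 70.0,
                             bytes_per_param: int = 2,
                             stacks: int = 8) -> float:
    """Upper bound on single-stream generation speed when every weight must be
    read from memory once per generated token."""
    model_gigabytes = model_params_billions * bytes_per_param
    total_bandwidth = stacks * stack_bandwidth_gb_s()
    return total_bandwidth / model_gigabytes

print(f"{stack_bandwidth_gb_s():,.0f} GB/s per stack")            # 2,048 GB/s
print(f"~{decode_tokens_per_second():.0f} tokens/s upper bound")  # ~117 tokens/s
```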

As a technology, HBM has been around for more than 10 years, and DRAM makers have been busy boosting its capability.

As the size of AI models has grown, so has HBM’s importance to the GPU. But that’s come at a cost. SemiAnalysis estimates that HBM generally costs three times as much as other types of memory and constitutes 50 percent or more of the cost of the packaged GPU.
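To put those ratios in rough dollar terms, here is a toy calculation. The package cost is a hypothetical figure; only the roughly threefold cost multiple and the 50 percent share come from the SemiAnalysis estimate above.

```python
# Toy arithmetic for the cost shares cited above. The package cost is
# hypothetical; only the ~3x cost multiple and the 50 percent share come
# from the SemiAnalysis estimate.

package_cost = 10_000                   # assumed cost of one packaged GPU, in dollars
hbm_cost = package_cost * 0.50          # HBM at 50 percent or more of the package
conventional_equivalent = hbm_cost / 3  # same bits in ~3x-cheaper conventional DRAM

print(f"HBM portion of the package:      ${hbm_cost:,.0f}")
print(f"Same capacity in ordinary DRAM: ~${conventional_equivalent:,.0f}")
```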
