Although the performance of high-bandwidth memory (HBM) has increased by an order of magnitude since its inception around a decade ago, many elements have remained fundamentally unchanged between HBM1 and HBM3E. But as the demands of bandwidth-hungry applications evolve, the technology must also change to accommodate them.
In new information revealed at TSMC's European OIP forum in late November, HBM4 and HBM4E will offer four major changes. HBM4 will receive a 2,048-bit interface and base dies produced using advanced logic technologies. Meanwhile, HBM4E will be able to utilize customizable base dies, which can be controlled with custom interfaces. These are dramatic shifts, which will have a big impact sooner than you might think.
HBM4, HBM4E, and C-HBM4E are on track to hit the market in 2026 and 2027, boasting the aforementioned 2,048-bit standard interface with data transfer rates of up to 12.8 GT/s. Additionally, the customizable base dies will be able to use advanced logic technologies up to 3nm-class. This offers higher area efficiency, which TSMC claims translates into a 2.5 times increase in performance.
HBM4: The next big thing
HBM4 — whose specification was officially published earlier this year — is the standard that sets the stage for a number of upcoming innovations in the AI and HPC memory market.
Each HBM4 memory stack features a 2,048-bit interface that officially supports data transfer rates of up to 8 GT/s, though controllers from specialists like Rambus and HBM4 stacks from leading DRAM vendors already support speeds of 10 GT/s or higher, as implementers want to leave some headroom for added peace of mind.
A stack with a 2,048-bit interface operating at 8 GT/s can deliver bandwidth of 2 TB/s, so an AI accelerator with eight HBM4 stacks will have access to potential bandwidth of 16 TB/s. And 8 GT/s could be just the beginning: Cadence is already offering an HBM4E physical interface (PHY) with 12.8 GT/s support.
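For readers who want to check the math, peak per-stack bandwidth is just the interface width in bytes multiplied by the transfer rate. A minimal Python sketch of that arithmetic (the function name and the eight-stack configuration are purely illustrative):

```python
def stack_bandwidth_tbps(interface_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth of one HBM stack in TB/s: (bits / 8) bytes per transfer x GT/s."""
    return (interface_bits / 8) * data_rate_gtps / 1000  # GB/s -> TB/s

# HBM4 at the official 8 GT/s: 2,048 bits -> 256 bytes per transfer
per_stack = stack_bandwidth_tbps(2048, 8.0)        # ~2.05 TB/s
eight_stacks = 8 * per_stack                        # ~16.4 TB/s for an eight-stack accelerator

# The same interface at the 12.8 GT/s supported by Cadence's HBM4E PHY
per_stack_hbm4e = stack_bandwidth_tbps(2048, 12.8)  # ~3.3 TB/s

print(f"{per_stack:.2f} TB/s per stack, {eight_stacks:.1f} TB/s across eight stacks")
```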
Internally, HBM4 doubles concurrency to 32 independent channels per stack (each split into two pseudo-channels), which reduces bank conflicts and improves effective throughput under highly parallel access patterns.
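To see where that concurrency comes from, divide the interface across the channels. This is a rough sketch assuming the 2,048-bit interface splits evenly, which is how the channel counts above work out:

```python
interface_bits = 2048
channels = 32
pseudo_channels_per_channel = 2

channel_width = interface_bits // channels                            # 64 bits per channel
pseudo_channel_width = channel_width // pseudo_channels_per_channel   # 32 bits per pseudo-channel
independent_streams = channels * pseudo_channels_per_channel          # 64 independently addressable streams

print(channel_width, pseudo_channel_width, independent_streams)
```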
HBM4 stacks also support 24 Gb and 32 Gb DRAM devices and offer configurations for 4-Hi, 8-Hi, 12-Hi, and 16-Hi stacks, thus enabling capacities of up to 64 GB, which makes it possible to build accelerators for next-generation AI models with trillions of parameters. Micron expects 64 GB stacks to become common with HBM4E sometime after late 2027, which aligns with Nvidia's plans to equip its Rubin Ultra GPU with 1 TB of HBM4E memory.
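The capacity ceiling follows directly from die density and stack height. A short sketch of that arithmetic (the helper name is illustrative, not from any vendor tooling):

```python
def stack_capacity_gb(die_density_gbit: int, stack_height: int) -> float:
    """Capacity of one HBM stack in GB, given DRAM die density (Gbit) and die count (x-Hi)."""
    return die_density_gbit * stack_height / 8  # Gbit -> GB

print(stack_capacity_gb(32, 16))  # 32 Gb dies in a 16-Hi stack -> 64.0 GB
print(stack_capacity_gb(24, 12))  # 24 Gb dies in a 12-Hi stack -> 36.0 GB
```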