Google reportedly books Intel for packaging more than 3 million TPUs in 2028 — SK hynix is testing Intel's EMIB packaging for HBM integration

Google has placed an order for Intel to build more than 3 million of its TPUs in 2028 after months of testing Intel's advanced packaging, according to The Information, citing four people familiar with the matter. The same sources say that Nvidia is evaluating Intel to build a future processor that fuses four GPU dies into one unit, tied to its Feynman architecture due in 2028, and that SK hynix is testing whether its high-bandwidth memory works reliably with Intel's packaging.

Specifically, SK hynix needs to know whether Intel can run packaging to the standard that AI accelerators demand. TSMC’s CoWoS is the industry-standard packaging process for those accelerators and has been oversubscribed for more than two years. Intel’s embedded multi-die interconnect bridge, or EMIB, is the only alternative AI chip makers can realistically qualify at volume before the end of the decade.

This isn’t a first for Intel: Google and Amazon were reported to be in active discussions over their custom AI processors back in April. The new sourcing turns those “discussions” into a firm unit figure and production timeline, and adds an SK hynix qualification effort that would ultimately determine whether any of it reaches Nvidia accelerators.


CoWoS bottlenecked

TSMC's leading-edge wafer lines and its CoWoS packaging are both at capacity. At the company's annual shareholders' meeting in Hsinchu on June 4th, CEO C.C. Wei said, "It will be a long time before we can meet customer demand," and told shareholders that the company will not be able to satisfy American customer demand for years, even as it builds out U.S. capacity. He had already told the Semiconductor Industry Association last November that TSMC's advanced-node capacity falls "about three times short" of demand.

The queue for CoWoS is concentrated among a handful of buyers. Nvidia is expected to account for the majority of global CoWoS demand (about 60% this year), with Broadcom and AMD absorbing another 26% between them, leaving custom-ASIC designers and smaller AI-chip makers waiting behind the largest GPU order book in the industry. The industry can’t wait, though: smaller players and hyperscalers with multimillion-unit roadmaps alike need to qualify a second packaging solution rather than hold out for capacity that TSMC says will be short for years.
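To put the concentration in numbers, here is a minimal Python sketch of the arithmetic, using only the two share figures quoted above. The function name `remaining_share` is made up for illustration, and the result is a simple remainder, not a reported figure.

```python
# Shares of global CoWoS demand quoted in the article, in percent.
nvidia_share = 60          # Nvidia, "about 60% this year"
broadcom_amd_share = 26    # Broadcom and AMD combined

def remaining_share(*shares):
    """Return the percentage of demand left after the named buyers (hypothetical helper)."""
    return 100 - sum(shares)

# Whatever is left is what every other CoWoS customer competes for.
rest = remaining_share(nvidia_share, broadcom_amd_share)
print(f"Share left for all other customers: about {rest}%")
```

On those figures, roughly 14% of CoWoS demand is spread across every other customer, custom-ASIC designers included.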

EMIB and CoWoS solve the same problem in opposite ways. CoWoS mounts every die on a large silicon interposer that all signals and power must cross, and the interposer scales with package size, so reticle-class designs waste silicon at the edges. EMIB, meanwhile, embeds small silicon bridges in the organic substrate only where two dies need to connect, with no interposer at all. Intel cites package utilization near 90% for EMIB against roughly 60% for interposer-class packaging, because small bridges tile efficiently while large interposers don’t.

Bernstein analysts estimate EMIB packaging costs a few hundred dollars per chip against $900 to $1,000 for CoWoS on a Rubin-class processor, though the firm flags a “lack of an external production track record” as a caveat to that estimate. As always, there’s a trade-off: standard EMIB routes power around the bridge through the substrate in long, resistive paths. That might have been acceptable for Sapphire Rapids and Ponte Vecchio, but not for HBM4-class accelerators that draw more current.

EMIB-T closes that gap by adding through-silicon vias to the bridge die for vertical power delivery, and it’s set to begin rolling out in production fabs this year. Intel has said EMIB-T supports HBM3, HBM3E, HBM4, and future HBM5 stacks and scales to a 120mm x 180mm package carrying more than 38 bridges and over 12 reticle-sized dies. Jaguar Shores, the successor to the canceled Falcon Shores accelerator, is the likely first product to use it.
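For a sense of scale, the package dimensions can be turned into rough area figures. The sketch below assumes a reticle field of 26mm x 33mm, the commonly cited lithography limit; that number is not from Intel's statement, and "over 12" dies is treated as exactly 12, so the result is a floor rather than a spec. It counts compute dies only and ignores HBM stacks and bridges, so it is a different quantity from the utilization figures Intel cites.

```python
# Package size quoted by Intel for EMIB-T, in millimetres.
PACKAGE_MM = (120, 180)

# ASSUMPTION: a standard 26mm x 33mm reticle field; not stated in the article.
RETICLE_MM = (26, 33)

# "Over 12 reticle-sized dies" is treated as a lower bound of 12.
DIE_COUNT = 12

package_area = PACKAGE_MM[0] * PACKAGE_MM[1]   # total package area in mm^2
reticle_area = RETICLE_MM[0] * RETICLE_MM[1]   # area of one reticle-sized die in mm^2
die_area = DIE_COUNT * reticle_area            # combined area of the 12 dies in mm^2
fraction = die_area / package_area             # share of the package they cover

print(f"Package area: {package_area} mm^2")
print(f"Area of {DIE_COUNT} reticle-sized dies: {die_area} mm^2")
print(f"Dies cover about {fraction:.0%} of the package")
```

Under those assumptions, 12 reticle-sized dies occupy a little under half of a 120mm x 180mm package, leaving the remainder for memory stacks, routing, and keep-out areas.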
