Nvidia has broken its silence following reports that Meta is in advanced discussions to spend billions of dollars on Google’s custom Tensor Processing Units (TPUs), a move that would mark a rare shift in the company's AI infrastructure strategy. Nvidia, which saw its stock dip last week as Alphabet’s rose, issued a pointed statement in response on Tuesday.
“We’re delighted by Google’s success — they’ve made great advances in AI and we continue to supply to Google,” Nvidia wrote. “NVIDIA is a generation ahead of the industry — it’s the only platform that runs every AI model and does it everywhere computing is done. NVIDIA offers greater performance, versatility, and fungibility than ASICs, which are designed for specific AI frameworks or functions.”
The response highlights Nvidia’s awareness of what’s at stake. While Meta’s reported plan involves an initial rental phase and phased purchases starting in 2027, any serious pivot away from Nvidia hardware would reverberate throughout the AI ecosystem. Google’s TPU architecture, once used solely in-house, is now part of an aggressive bid to capture hyperscaler business from Nvidia’s dominant platform.
ASIC acceleration vs GPU versatility
Google’s TPUs are application-specific integrated circuits tuned for the high-throughput matrix operations at the heart of large language model training and inference. The TPU v5p, for example, pairs 95 gigabytes of high-bandwidth memory with a peak bfloat16 throughput of more than 450 TFLOPS per chip. TPU v5p pods can contain nearly 9,000 chips and are designed to scale efficiently inside Google Cloud’s infrastructure.
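Taken at face value, those figures imply roughly 450 TFLOPS × 9,000 chips ≈ 4 exaFLOPS of aggregate bfloat16 compute per fully populated pod, a back-of-envelope estimate that ignores interconnect and utilization overheads.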
Crucially, Google owns the TPU architecture, instruction set, and software stack. Broadcom acts as Google’s silicon implementation partner, turning Google’s architecture into a manufacturable ASIC layout; it also supplies high-speed SerDes, power management, and packaging, and handles post-fabrication testing. The chips themselves are fabricated by TSMC.
By contrast, Nvidia’s Hopper-based H100 GPU packs 80 billion transistors and 80 gigabytes of HBM3 memory, and delivers up to 4 PFLOPS of AI performance at FP8 precision. Its Blackwell-based successor, the GB200, increases HBM capacity to 192 gigabytes and peak throughput to around 20 PFLOPS, and pairs its Blackwell GPUs with Grace CPUs in a hybrid configuration, expanding Nvidia’s presence in both the cloud and emerging local compute nodes.
(Image credit: Nvidia)
TPUs are programmed via Google’s XLA compiler stack, which serves as the backend for frameworks like JAX and TensorFlow. While the XLA-based approach offers performance portability across CPU, GPU, and TPU targets, it typically requires model developers to adopt specific libraries and compilation patterns tailored to Google’s runtime environment.
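As a rough illustration of that programming model, a minimal JAX sketch might look like the following; jax.jit, jax.numpy, and jax.devices are standard JAX APIs, while the function name and array shapes are arbitrary choices for the example rather than anything TPU-specific.

```python
import jax
import jax.numpy as jnp

@jax.jit  # trace the Python function and hand it to the XLA compiler
def attention_scores(q, k):
    # A single bfloat16 matrix multiply, the kind of op TPUs are tuned for.
    return jnp.dot(q, k.T)

# Hypothetical toy inputs; the shapes are arbitrary, for illustration only.
q = jnp.ones((128, 64), dtype=jnp.bfloat16)
k = jnp.ones((128, 64), dtype=jnp.bfloat16)

print(jax.devices())                 # TPU devices on a Cloud TPU VM, CPU (or GPU) elsewhere
print(attention_scores(q, k).shape)  # (128, 128)
```

The same script runs unchanged on CPU, GPU, or TPU backends; what changes is the target XLA lowers the traced computation to. That is the performance-portability argument in practice, with the caveat that production TPU workloads typically also adopt Google-tuned libraries and sharding patterns.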