Nvidia has announced a $20 billion deal to acquire Groq’s intellectual property. The deal stops short of buying the company outright, but Nvidia will absorb key members of Groq’s engineering team, including founder Jonathan Ross, a former Google engineer, and president Sunny Madra, making this Nvidia’s largest AI-related transaction since the Mellanox acquisition.
Nvidia’s purchase of Groq’s LPU IP focuses not on training, the space Nvidia already dominates, but on inference, the computational process that turns trained AI models into real-time services. Groq’s core product is the LPU, or Language Processing Unit, a chip optimized to run large language models at ultra-low latency.
Where GPUs excel at large-batch parallelism, Groq’s statically scheduled architecture and SRAM-based memory design enable consistent performance for single-token inference workloads. That makes it particularly well-suited for applications like chatbot hosting and real-time agents, exactly the type of products that cloud vendors and startups are racing to scale.
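To see why that distinction matters, here is a rough sketch in Python using made-up round numbers; the batch sizes and step times below are illustrative assumptions, not measured figures for any real chip. Large batches maximize aggregate tokens per second, while a single-stream setup minimizes how long each individual user waits for a reply.

```python
# Illustrative comparison of aggregate throughput vs. single-stream latency.
# All numbers below are hypothetical round figures, not benchmark results.

def per_request_latency_ms(tokens_out: int, step_ms: float) -> float:
    """Latency one user experiences: one decode step per generated token."""
    return tokens_out * step_ms

# Regime A: large-batch serving. Many requests share each decode step,
# so aggregate throughput is high, but each step takes longer.
batch_size_a, step_ms_a = 64, 40.0               # 64 requests advance 1 token every 40 ms
throughput_a = batch_size_a * 1000 / step_ms_a   # tokens/s summed across all users

# Regime B: single-stream serving tuned for latency.
batch_size_b, step_ms_b = 1, 4.0                 # 1 request advances 1 token every 4 ms
throughput_b = batch_size_b * 1000 / step_ms_b

print(f"A: {throughput_a:.0f} tok/s aggregate, "
      f"{per_request_latency_ms(200, step_ms_a) / 1000:.1f} s for a 200-token reply")
print(f"B: {throughput_b:.0f} tok/s aggregate, "
      f"{per_request_latency_ms(200, step_ms_b) / 1000:.1f} s for a 200-token reply")
```

The batched regime wins on total tokens served, but the single-stream regime is what a chatbot user actually feels.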
The deal is structured as a non-exclusive license of Groq’s technology alongside a broad hiring initiative, allowing Nvidia to avoid triggering a full regulatory merger review while still acquiring de facto control over the startup’s roadmap. GroqCloud, the company’s public inference API, will continue to operate independently for now.
Buying time
Groq’s primary selling point is the simplicity of its architecture. Unlike general-purpose GPUs, the company’s chips use a single massive core and hundreds of megabytes of on-die SRAM, paired with a static execution model: the compiler pre-plans the entire program path and guarantees cycle-level determinism. The result is predictable latency, with no cache misses or stalls.
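As a loose mental model of what static scheduling means, one can imagine the compiler emitting a fixed, cycle-indexed plan that the hardware simply replays, with no runtime arbitration to introduce jitter. The toy below is purely conceptual and does not reflect Groq’s actual compiler, instruction set, or hardware.

```python
# Conceptual toy of static scheduling: the "compiler" assigns every operation
# to a fixed cycle ahead of time, so execution involves no runtime decisions.
# This illustrates the idea only; it is not Groq's toolchain.

from typing import Callable

Op = tuple[str, Callable[[dict], None]]

def compile_schedule(ops: list[Op]) -> list[tuple[int, str, Callable[[dict], None]]]:
    """Assign each op a fixed cycle. A real compiler would model functional
    units and SRAM ports; the key point is the plan is fully known upfront."""
    return [(cycle, name, fn) for cycle, (name, fn) in enumerate(ops)]

def run(schedule: list[tuple[int, str, Callable[[dict], None]]], state: dict) -> None:
    """Replay the schedule cycle by cycle; behavior is identical on every run."""
    for cycle, name, fn in schedule:
        fn(state)
        print(f"cycle {cycle}: {name} -> {state}")

program: list[Op] = [
    ("load_weights", lambda s: s.update(w=2)),
    ("load_activations", lambda s: s.update(x=3)),
    ("matmul", lambda s: s.update(y=s["w"] * s["x"])),
    ("store_result", lambda s: s.update(out=s["y"])),
]

run(compile_schedule(program), {})
```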
In a benchmark of the 70B-parameter Llama 2 model, Groq’s LPU sustained 241 tokens per second, and the company has internally reported even higher speeds on newer silicon. That throughput comes not from scaling up batch size but from optimizing single-stream performance, a major distinction for workloads that depend on real-time response rather than aggregate throughput.
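For a sense of what that figure means in practice, a quick back-of-the-envelope conversion turns 241 tokens per second into per-token latency and end-to-end response time; the 300-token reply length is an arbitrary assumption.

```python
# Convert the quoted single-stream figure into per-token latency and
# response time. The 241 tok/s number comes from the benchmark above;
# the 300-token reply length is an assumed example value.

tokens_per_second = 241
per_token_ms = 1000 / tokens_per_second           # ~4.1 ms between tokens
reply_tokens = 300
reply_seconds = reply_tokens / tokens_per_second  # ~1.2 s for the full reply

print(f"{per_token_ms:.1f} ms/token, {reply_seconds:.2f} s for a {reply_tokens}-token reply")
```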
Nvidia’s GPUs, including the upcoming Rubin series, rely on high-bandwidth external memory (GDDR7 or HBM3) and a highly parallel core layout. They scale efficiently for training and large-batch inference, but per-stream performance drops at batch size one, where each generated token requires streaming the model’s weights from external memory and decoding becomes bound by bandwidth rather than compute. Software optimization can mitigate some of this, but Groq’s approach sidesteps the problem entirely by removing external memory latency from the loop.
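A rough bandwidth calculation shows why. At batch size one, tokens per second is capped by memory bandwidth divided by the bytes of weights read per token. The figures below, FP16 weights and roughly 3.35 TB/s of HBM bandwidth on an H100-class part, are illustrative assumptions rather than Rubin specifications, with the parameter count matching the Llama 2 70B example above.

```python
# Rough model of why batch-1 decode is memory-bound on a GPU: every generated
# token requires streaming the full set of weights from external memory.
# Bandwidth and precision figures are illustrative assumptions.

params = 70e9                 # 70B parameters (matches the Llama 2 70B example)
bytes_per_param = 2           # FP16 weights (assumption)
hbm_bandwidth = 3.35e12       # ~3.35 TB/s, roughly H100-class (assumption)

weight_bytes = params * bytes_per_param        # ~140 GB moved per token
batch1_ceiling = hbm_bandwidth / weight_bytes  # tokens/s upper bound per stream

print(f"~{batch1_ceiling:.0f} tokens/s ceiling at batch size 1")
```

Larger batches amortize the same weight traffic across many requests, which is why GPUs recover throughput as batch size grows; keeping weights in on-die SRAM avoids the external-memory round trip altogether.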
The acquisition grants Nvidia access to Groq’s entire hardware stack, encompassing the compiler toolchain and silicon design. More importantly, it brings in Groq’s engineering leadership, including founder Jonathan Ross, whose work on Google’s original TPU helped define the modern AI accelerator landscape. With this deal, Nvidia effectively compresses several years of inference-focused R&D into a single integration step.
Strategic containment