
ZAYA1-8B matches DeepSeek-R1 on math with less than 1B active parameters

Why This Matters

The ZAYA1-8B model's ability to match leading AI benchmarks using less than 1 billion active parameters and trained exclusively on AMD hardware signifies a major shift in AI development. It demonstrates that high-performance, frontier-level models can be built without relying on NVIDIA's dominant ecosystem, opening new avenues for cost-effective and diverse infrastructure options in the industry.


Zyphra just dropped a model that’s doing something most people will scroll past without understanding why it’s interesting.

ZAYA1-8B matches DeepSeek-R1 on math benchmarks. Stays competitive with Claude Sonnet 4.5 on reasoning. Closes in on Gemini 2.5 Pro on coding. These are frontier model comparisons, the kind of numbers that usually come with billions of parameters and serious hardware requirements.

This one runs on less than 1 billion active parameters. And it was trained entirely on AMD hardware, which almost no serious model can say.
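"Active parameters" refers to mixture-of-experts (MoE) routing: the model stores many expert sub-networks but runs only a few per token. The sketch below is a minimal, hypothetical illustration of standard top-k MoE routing in NumPy; all sizes and names are illustrative and are not ZAYA1's actual configuration.

```python
import numpy as np

# Illustrative top-k mixture-of-experts routing. "Active parameters" means
# only the chosen experts' weights are used for a given token, so compute
# per token is a fraction of the total parameter count.
# All sizes here are made up for the example, not ZAYA1's real config.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# Each expert is a small feed-forward weight matrix; the router scores experts.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                 # indices of the k highest-scoring experts
    w = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen experts only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(active_params / total_params)  # 0.25: only a quarter of expert weights run per token
```

With 8 experts and top-2 routing, each token touches only 25% of the expert weights, which is how a model's "active" parameter count can sit far below its total size.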

Built on AMD Instead of NVIDIA.

Every model you’ve heard of was trained on NVIDIA hardware, mostly H100s, A100s, and GB200s. The entire open-source AI ecosystem has been built on a de facto NVIDIA monopoly, and most labs don’t even mention the hardware because there’s nothing to mention. It’s always NVIDIA.

Zyphra trained ZAYA1-8B end to end on AMD Instinct MI300X GPUs. Pretraining, midtraining, supervised fine-tuning, all of it ran on a 1,024-node AMD cluster built with IBM, using the AMD Pensando Pollara interconnect.

That detail matters for two reasons. First, it proves the AMD stack can produce frontier-competitive results at this scale, which matters to anyone weighing infrastructure that isn’t locked into NVIDIA pricing.

Second, it means Zyphra had to solve real engineering problems that most labs never encounter because they default to CUDA. The fact that the model performs this well coming off that stack says something about both the hardware and the team. It’s a proof of concept for an alternative path the industry needs.

Less than 1B active parameters.
