Gemma 3 QAT Models: Bringing AI to Consumer GPUs
Published on: 2025-04-25 02:22:06
To make Gemma 3 even more accessible, we are announcing new versions optimized with Quantization-Aware Training (QAT) that dramatically reduces memory requirements while maintaining high quality. This enables you to run powerful models like Gemma 3 27B locally on consumer-grade GPUs like the NVIDIA RTX 3090.
Last month, we launched Gemma 3, our latest generation of open models. Delivering state-of-the-art performance, Gemma 3 quickly established itself as a leading model capable of running on a single high-end GPU like the NVIDIA H100 using its native BFloat16 (BF16) precision.
Understanding performance, precision, and quantization
The chart above shows the performance (Elo score) of recently released large language models. Higher bars mean better performance in comparisons as rated by humans viewing side-by-side responses from two anonymous models. Below each bar, we indicate the estimated number of NVIDIA H100 GPUs needed to run that model using the BF16 data type.
Why BFloat16
... Read full article.