We announce NVIDIA Nemotron 3, the most efficient family of open models with leading accuracy for agentic AI applications. The Nemotron 3 family consists of three models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities.
Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance.
We are releasing the Nemotron 3 Nano model along with its technical report. The Super and Ultra releases will follow in the coming months.
Nemotron 3 technologies
Hybrid MoE: The Nemotron 3 family of models uses a hybrid Mamba-Transformer MoE architecture that delivers best-in-class throughput with accuracy on par with or better than standard Transformers. (A minimal sketch of this layer pattern follows the list below.)
Latent MoE: Super and Ultra use Latent MoE, a novel hardware-aware expert design for improved accuracy.
Multi-Token Prediction: Super and Ultra incorporate MTP layers for more efficient long-form text generation and better model quality. (A generic MTP head sketch also follows the list.)
NVFP4: Super and Ultra are trained in NVFP4 precision.
Long Context: Nemotron 3 models support context lengths of up to 1M tokens.
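To make the hybrid Mamba-Transformer MoE idea concrete, here is a minimal, hypothetical PyTorch sketch of such a stack: mostly Mamba-style sequence-mixing layers interleaved with occasional attention layers, each followed by a sparsely-gated (top-k) MoE feed-forward block. The dimensions, expert counts, routing scheme, and interleaving ratio are illustrative assumptions rather than the actual Nemotron 3 configuration, and the Mamba mixer is stubbed out to keep the sketch self-contained.

```python
# Illustrative sketch only: layer sizes, expert counts, and the Mamba/attention
# interleaving ratio below are assumptions, not the Nemotron 3 configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Sparsely-gated feed-forward block: each token is routed to k experts."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to individual tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        gates = F.softmax(self.router(tokens), dim=-1)        # (n_tokens, n_experts)
        weights, idx = gates.topk(self.k, dim=-1)             # keep k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True) # renormalize gate weights
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


class HybridBlock(nn.Module):
    """One layer: a sequence mixer (Mamba-style or attention) + a sparse MoE MLP."""

    def __init__(self, d_model: int, use_attention: bool):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_attention = use_attention
        if use_attention:
            self.mixer = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        else:
            # Placeholder for a Mamba/SSM mixer; a real implementation would go
            # here. A linear layer keeps this sketch runnable.
            self.mixer = nn.Linear(d_model, d_model)
        self.moe = TopKMoE(d_model, d_ff=4 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        if self.use_attention:
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        x = x + h
        return x + self.moe(self.norm2(x))


# Hypothetical interleaving: mostly Mamba-style layers, attention every 6th layer.
layers = nn.ModuleList(HybridBlock(d_model=512, use_attention=(i % 6 == 5)) for i in range(12))
x = torch.randn(2, 128, 512)
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([2, 128, 512])
```

The intuition behind the design is that Mamba-style layers scale linearly with sequence length while sparse experts add parameters without adding per-token compute, which is where the throughput advantage over dense Transformers comes from.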
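This announcement does not spell out the specific MTP design used in Super and Ultra, so the sketch below follows the common multi-token prediction pattern: auxiliary heads on top of the backbone's hidden states predict tokens two or more positions ahead, and their losses are added to the standard next-token loss during training. All names and shapes here are illustrative assumptions.

```python
# Generic multi-token prediction (MTP) heads; an illustrative sketch, not the
# Nemotron 3 implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MTPHeads(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        # One lightweight projection + output head per extra future position.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, vocab_size))
            for _ in range(n_future)
        )
        self.n_future = n_future

    def loss(self, hidden: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # hidden:  (batch, seq, d_model) final hidden states from the backbone
        # targets: (batch, seq) token ids, where position t holds the token at t
        total = hidden.new_zeros(())
        for k, head in enumerate(self.heads, start=2):
            # Head k predicts the token k positions ahead of each position.
            logits = head(hidden[:, :-k])   # (batch, seq - k, vocab)
            labels = targets[:, k:]         # tokens k steps ahead
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
            )
        return total / self.n_future


mtp = MTPHeads(d_model=512, vocab_size=32000, n_future=2)
hidden = torch.randn(2, 64, 512)
targets = torch.randint(0, 32000, (2, 64))
print(mtp.loss(hidden, targets))  # scalar auxiliary loss added to the main LM loss
```

At inference time, heads like these can also serve as draft predictors for speculative decoding, which is one way MTP can translate into faster long-form generation.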