Small models are having a moment. On the heels of MIT spinoff Liquid AI releasing an AI vision model small enough to fit on a smartwatch, and Google releasing a model small enough to run on a smartphone, Nvidia is joining the party today with a new small language model (SLM) of its own, Nemotron-Nano-9B-V2. The model attained the highest performance in its class on selected benchmarks, and it lets users toggle AI "reasoning" (self-checking before outputting an answer) on and off.
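For a sense of how that toggle works in practice, here is a minimal sketch using the Hugging Face transformers library. The repo id and the "/think" / "/no_think" system-prompt switch are assumptions based on conventions Nvidia has used in other Nemotron releases; check the model card for the exact control mechanism.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # hybrid architectures often ship custom model code
)

def ask(question: str, reasoning: bool) -> str:
    # Assumed convention: a control string in the system turn switches the
    # model's self-checking "reasoning" trace on or off before the answer.
    system = "/think" if reasoning else "/no_think"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Reasoning on: the model emits an internal trace before the final answer.
print(ask("What is 37 * 24?", reasoning=True))
# Reasoning off: the model answers directly, trading self-checking for latency.
print(ask("What is 37 * 24?", reasoning=False))
```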
While 9 billion parameters is larger than some of the multimillion-parameter small models VentureBeat has covered recently, Nvidia notes it is a meaningful reduction from the model's original 12 billion parameters and is designed to fit on a single Nvidia A10 GPU.
As Oleksii Kuchiaev, Nvidia Director of AI Model Post-Training, said on X in response to a question I submitted to him: “The 12B was pruned to 9B to specifically fit A10 which is a popular GPU choice for deployment. It is also a hybrid model which allows it to process a larger batch size and be up to 6x faster than similar sized transformer models.”
For context, many leading LLMs are in the 70-billion-plus parameter range. (Recall that parameters are the internal settings governing a model's behavior; more parameters generally mean a larger, more capable, but more compute-intensive model.)
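Rough arithmetic shows why the pruning matters for the A10, which has 24 GB of memory. Assuming the weights are stored at 16-bit precision:

```python
# Back-of-envelope check that a 9B-parameter model fits on one A10 (24 GB).
bytes_per_param = 2  # 16-bit (bf16/fp16) weights
print(f"9B weights:  ~{9e9 * bytes_per_param / 1e9:.0f} GB")   # ~18 GB
print(f"12B weights: ~{12e9 * bytes_per_param / 1e9:.0f} GB")  # ~24 GB
# At 16-bit precision, the original 12B model would consume the A10's entire
# 24 GB before accounting for KV cache and activations, which is why pruning
# to 9B leaves usable headroom on that GPU.
```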
The model handles multiple languages, including English, German, Spanish, French, Italian, and Japanese, with extended descriptions covering Korean, Portuguese, Russian, and Chinese. It is suitable for both instruction following and code generation.
Nemotron-Nano-9B-V2 and its pre-training datasets are available right now on Hugging Face and through the company's model catalog.