Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what’s happening in their world, predict what’s likely to happen next, and generate actions for specific environments, embodiments, and tasks.
NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning, world generation, and action generation within a single open model.
NVIDIA is open-sourcing Cosmos 3 models, training scripts, deployment tools, and datasets to make physical AI development more open and reproducible. This blog post covers the fundamentals of Cosmos 3, highlights key concepts from the technical report, walks through technical workflows, and shows how teams building robotic manipulation systems, autonomous vehicles, and warehouse monitoring solutions can get started.
Figure 1. A clip of a video generated by Cosmos 3 for the autonomous driving domain.
Figure 2. A video generated using Cosmos 3 for warehouse safety data.
Key highlights of this release include:
NVIDIA Cosmos 3 Nano and NVIDIA Cosmos 3 Super model checkpoints on Hugging Face with code on GitHub.
Open datasets for physical AI applications like robotics and autonomous driving.
Open post-training scripts for adapting Cosmos 3 to your domain.