NVIDIA Cosmos 3: Physical AI Research for Robotics and AVs

NVIDIA has launched Cosmos 3, which it describes as an open world foundation model for physical AI built on a mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction in a single system.
The company describes Cosmos 3 as “the world’s first fully open omnimodel” that can natively understand and generate text, images, video, ambient sound and actions with leading physics accuracy, reducing physical AI training times.
McKinsey says that robotics is ready to cross the gap from simulation to reality, adding that robots are now operating in dynamic, unpredictable settings where adaptability and autonomy are essential.
NVIDIA’s Cosmos 3
NVIDIA says Cosmos 3 enables robots, autonomous vehicles (AVs) or vision agents to generalise in the real world despite limited training data and fragmented simulation stacks.
The model’s mixture-of-transformers architecture pairs a reasoning transformer with “an expert generation transformer”, enabling Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories.
The Cosmos platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety and spatial reasoning, as well as new physical AI agent skills for neural scene reconstruction, defect-image generation and video augmentation.
Deloitte says that with greater integration of AI capabilities in robotic systems and the emergence of specialised foundational models, robots can permeate multiple industries and applications, including smart factories.
Deloitte predicts that the global cumulative installed capacity of industrial robots could reach 5.5 million by 2026.
The big bang of physical AI
Jensen Huang, founder and CEO of NVIDIA, says: “The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.
“The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, AVs and vision AI that perceive, reason, plan and act in the physical world.”
NVIDIA says Cosmos 3 Super, one model in the lineup, is intended for post-training robotics and AV models that need the highest physics accuracy and generation quality.
How Cosmos works in robotics
Cosmos 3 can further help generate synthetic data and scene variations, then support post-training with embodiment-specific behaviour and environment data for tasks ranging from pick-and-place to dexterous manipulation.
Developers can use Cosmos 3 as a vision language model, as the backbone for world action models, or as a world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
Physical AI developers are already building on the Cosmos platform across industries. In robotics, these include Agile Robots, Doosan Robotics, LG Electronics, Samsung Electronics and Skild AI.
Li Auto is using the platform for AVs. Additionally, Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan are using the platform for vision AI agents to power industrial AI and smart space applications.
The Cosmos Coalition
NVIDIA announced Cosmos 3 alongside the NVIDIA Cosmos Coalition, which it describes as a global collaboration between world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI.
NVIDIA says the coalition will advance open world models across industries, enabling members to contribute models, research and evaluation techniques while using Cosmos 3 technologies.

