NVIDIA details sampling controls for Cosmos world foundation models
NVIDIA has published detailed sampling parameters for NIM for Cosmos WFM (World Foundation Models), giving users fine-grained control over AI-generated video. The parameters cover models such as `text2world`, `video2world`, `transfer2.5-2b`, and `cosmos3-generator`, and let users adjust resolution, frame rate, and adherence to the prompt. The documentation is aimed at professionals who need generated video to meet specific output requirements.
Key Takeaways
- NIM for Cosmos WFM supports five aspect ratios: 16:9, 9:16, 4:3, 3:4, and 1:1 (square).
- Cosmos3-Generator handles both Text2Video and Image2Video, with a user-adjustable frame rate from 12 to 40 FPS.
- The cosmos-transfer2.5-2b model supports video-to-video style transfers with raw, VP9, HEVC, and AV1 codec inputs.
- A `guidance_scale` parameter (1.0 to 10.0) lets developers trade off strict prompt adherence against creative diversity.
- A new `prompt_upsampling` feature preprocesses the text prompt before generation so that the model interprets it more reliably.
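As a rough illustration of how the limits listed above might be checked on the client side, the Python sketch below validates a set of sampling parameters before a request is built. Only `guidance_scale` and `prompt_upsampling` are names taken from the takeaways; the `SamplingParams` class, the `aspect_ratio` and `fps` field names, and every default value are assumptions made for this example, not the documented NIM request schema. The FPS bounds are the ones quoted for Cosmos3-Generator and may not apply to the other models.

```python
from dataclasses import dataclass

# Limits quoted in the takeaways above.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}
FPS_MIN, FPS_MAX = 12, 40                # quoted for Cosmos3-Generator
GUIDANCE_MIN, GUIDANCE_MAX = 1.0, 10.0


@dataclass
class SamplingParams:
    """Hypothetical container; field names other than guidance_scale and
    prompt_upsampling are illustrative, not the real NIM schema."""
    prompt: str
    aspect_ratio: str = "16:9"       # assumed field name and default
    fps: int = 24                    # assumed field name and default
    guidance_scale: float = 7.0      # name from the article, default assumed
    prompt_upsampling: bool = False  # name from the article, default assumed

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if none)."""
        errors = []
        if self.aspect_ratio not in ASPECT_RATIOS:
            errors.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if not FPS_MIN <= self.fps <= FPS_MAX:
            errors.append(f"fps must be in [{FPS_MIN}, {FPS_MAX}]: {self.fps}")
        if not GUIDANCE_MIN <= self.guidance_scale <= GUIDANCE_MAX:
            errors.append(
                f"guidance_scale must be in [{GUIDANCE_MIN}, {GUIDANCE_MAX}]: "
                f"{self.guidance_scale}"
            )
        return errors


if __name__ == "__main__":
    params = SamplingParams(prompt="a forklift crossing a warehouse", fps=60)
    for problem in params.validate():
        print(problem)
```

Collecting every problem into a list, rather than raising on the first one, lets a caller report all out-of-range values at once. The real service may apply different or additional checks.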
Why It Matters
NVIDIA is shifting focus from raw generation to professional-grade control interfaces for world simulation. By exposing granular parameters like motion thresholds and 4k+1 frame cadence, the company provides the technical tooling necessary for B2B applications in robotics and autonomous systems where physics accuracy is paramount. This move positions NVIDIA as an infrastructure layer for synthetic data generation, moving beyond the 'black box' approach seen in consumer video AI. Competitively, it bridges the gap between creative prompt engineering and industrial-grade simulation. Look for integration into higher-level robot training pipelines and cinematic pre-visualization workflows as these NIMs move toward broad enterprise adoption.
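The "4k+1 frame cadence" mentioned above can be made concrete with a small check. This sketch assumes the phrase means that the number of frames must equal 4k+1 for some non-negative integer k (1, 5, 9, 13, and so on); that reading, and both function names, are assumptions for illustration rather than something the documentation summary states.

```python
def is_4k_plus_1(num_frames: int) -> bool:
    """True if num_frames can be written as 4*k + 1 with k >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 4 == 0


def snap_to_4k_plus_1(num_frames: int) -> int:
    """Round num_frames down to the nearest 4k+1 value (minimum 1)."""
    if num_frames < 1:
        return 1
    return num_frames - ((num_frames - 1) % 4)


if __name__ == "__main__":
    for n in (93, 120, 121):
        print(n, is_4k_plus_1(n), snap_to_4k_plus_1(n))
```

Under this assumption a 120-frame request would not be valid, and a client could round it down to 117 or up to 121 before submitting.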
Additional Context
The commercial release of sampling controls for Cosmos WFM follows the late May 2026 debut of NVIDIA Cosmos 3 at GTC Taipei. CEO Jensen Huang characterized the platform as a 'generational leap' for physical AI, noting that it was trained on 20 trillion multimodal tokens. Unlike previous iterations, Cosmos 3 uses a mixture-of-transformers architecture that pairs reasoning transformers with expert generation transformers to improve physics accuracy in synthetic environments. Per NVIDIA reporting from June 2026, the system is designed to cut training cycles for autonomous vehicles and robotics from months to days by providing highly controllable temporal consistency.
Alongside the model updates, NVIDIA announced the Cosmos Coalition in June 2026. The partnership includes world-model builders such as Black Forest Labs and Runway, as well as robotics labs like Skild AI and Agile Robots. Per GamesBeat in May 2026, the coalition aims to establish open standards for how foundation models interact with physical hardware. These developments coincide with a broader shift in the video AI market: OpenAI deprecated its original Sora API in April 2026 and redirected its focus toward cinematic production, pushing infrastructure providers like NVIDIA to double down on the 'control surface' of their generative tools.
On the hardware front, the launch is optimized for the Vera Rubin architecture, which NVIDIA introduced in early June 2026 as its first 'agentic' superchip. Per SiliconANGLE from June 2026, the transition to agent-based computing requires models not only to generate pixels but also to predict action trajectories and sound. NVIDIA's simultaneous release of Cosmos3-Generator and 'omnimodels' capable of environmental audio reflects this strategy, which aims to provide a unified stack for what Huang calls 'AI factories', large-scale infrastructure projects currently exceeding $50 billion in capital costs per gigawatt.
Read full article at docs.nvidia.com
