Supported Models#
These models are optimized using PyTorch, NeMo, NVIDIA TensorRT (TRT), and NVIDIA TensorRT-LLM (TRT-LLM).
Cosmos-Predict1-7B-Text2World#
Overview#
Cosmos-Predict1-7B-Text2World is a pre-trained, generative text-to-video model. Cosmos models are ready for commercial use under the NVIDIA Open Model License Agreement.
The 7B model is recommended for users who want to prioritize response speed and have a moderate compute budget.
We recommend at least 100GB of disk space for the container and model.
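Before pulling the container, you can confirm that enough space is free at Docker's storage location; the path below assumes the default Docker data root.

```bash
# Check free space where Docker stores images and containers
# (assumes the default data root of /var/lib/docker; adjust if yours differs).
df -h /var/lib/docker
```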
Throughput-optimized Configurations#
NVIDIA NIM for Cosmos requires NVIDIA GPUs with the Ampere architecture or later. Some profiles are further optimized using TRT and TRT-LLM for specific GPU models.
| GPU | GPU Memory (GB) | Precision | Number of GPUs |
|---|---|---|---|
| H200 | 141 | BF16 | [1, 2, 4, 8] |
| H100 SXM | 80 | BF16 | [1, 2, 4, 8] |
| H100 NVL | 94 | BF16 | [1, 2, 4, 8] |
| H100 PCIe | 80 | BF16 | [1, 2, 4, 8] |
Attempting to deploy on configurations other than those listed above might result in suboptimal performance.
Fallback Configurations#
Other GPU models of the Ampere and Ada generations are supported only via the fallback method (latency profile), provided that the total VRAM across all available GPUs is larger than 100GB and that each individual GPU has at least 48GB of VRAM.
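A quick way to verify these requirements is to list each GPU's memory with `nvidia-smi` and check the per-GPU and combined totals:

```bash
# List each GPU's name and total memory; every GPU should report
# at least 48GB, and the sum across GPUs should exceed 100GB.
nvidia-smi --query-gpu=name,memory.total --format=csv
```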
Profile Selection#
Depending on how many GPUs are exposed to the container (i.e., with the `--gpus` flag), NIM will by default auto-select a profile that best fits that configuration.
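As an illustration, the following `docker run` sketch exposes exactly four GPUs, so NIM would auto-select the CP=4 latency configuration described below. The image name, tag, and port are placeholders rather than documented values, and the `NGC_API_KEY` pass-through assumes the key is already set in your shell.

```bash
# Expose exactly 4 GPUs (devices 0-3) to the container; NIM then
# auto-selects the profile that best fits this configuration.
# The image name and port are placeholders; substitute your actual values.
docker run --rm \
  --gpus '"device=0,1,2,3"' \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/cosmos-predict1-7b-text2world:latest
```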
Two types of profiles are available:
Latency Profiles: These profiles use a parallelized diffusion component, with context parallelism (CP) across exposed GPUs, which lowers inference latency. The latency reduction scales close to linearly with the number of GPUs.
When using the latency profiles, four configurations are available: `CP=1`, `CP=2`, `CP=4`, and `CP=8`:

- `CP=8` configuration: Used when 8 GPUs are available.
- `CP=4` configuration: Used when 4-7 GPUs are available.
- `CP=2` configuration: Used when 2-3 GPUs are available.
- `CP=1` configuration: Used when 1 GPU is available.
NIM will select the configuration with the largest CP that fits the number of GPUs exposed to Docker.
When using the latency profile with an H100 SXM GPU, inference with default parameters can take around 5 minutes to generate 121 frames using the `CP=1` configuration, and less than 1 minute using the `CP=8` configuration.
For optimal resource utilization, we recommend matching the number of exposed GPUs exactly to a profile configuration (8, 4, 2, or 1). For example, if you expose 7 GPUs, the model will use the `CP=4` configuration, effectively utilizing only 4 GPUs and leaving 3 idle.
Throughput Profiles: These profiles use a quantized, TRT-accelerated diffusion component, replicated across exposed GPUs. They serve concurrent requests to increase overall system throughput, which scales close to linearly with the number of GPUs.
Throughput profiles are only available on H200, H100 SXM, H100 PCIe, and H100 NVL GPUs.
Note
TRT-accelerated profiles run diffusion at lower precision. Throughput can reach up to double that of the latency profile (with 8x H100 SXM), but these profiles may produce visual artifacts that are not observed with the latency profile.
Default profile selection prioritizes latency profiles. This behavior can be changed with the `NIM_MODEL_PROFILE` environment variable, which can be set to a specific profile ID, or to a general profile type by setting `NIM_MODEL_PROFILE=throughput` or `NIM_MODEL_PROFILE=latency`.
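For example, to force throughput profiles at launch (the image name is again a placeholder):

```bash
# Override default profile selection with the NIM_MODEL_PROFILE variable
# described above; the image name and port are placeholders.
docker run --rm \
  --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=throughput \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/cosmos-predict1-7b-text2world:latest
```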
Refer to the Configuring a NIM page for details on how to filter and choose a specific profile.