Predict2 Quickstart Guide
This page will walk you through setting up and running inference with the pre-trained Cosmos-Predict2-2B-Video2World model.
Set up the Video2World Model
Ensure you have the necessary hardware and software, as outlined on the Prerequisites page.
Follow the Installation guide to download the Cosmos-Predict2 repo and set up the conda environment.
Generate a Hugging Face access token. Set the access token permission to ‘Read’ (the default permission is ‘Fine-grained’).
Log in to Hugging Face with the access token:
huggingface-cli login
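If you prefer a non-interactive login (for example, in a setup script), you can pass the token directly on the command line; $HF_TOKEN below is a placeholder for your own access token, and huggingface-cli whoami verifies that the login succeeded:
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli whoami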
Review and accept the LlamaGuard-7b terms.
Download the model weights for Cosmos-Predict2-2B-Video2World from Hugging Face:
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 2B --model_types Video2World --checkpoint_dir checkpoints
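As a quick sanity check that the weights downloaded successfully, you can list the contents of the checkpoint directory (the exact subdirectory layout may vary, so treat this as a rough verification rather than an authoritative structure):
ls -lR checkpoints | head -n 25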
Generate a Video from Text and Image Input
Generate a video from text and image input using the Cosmos-Predict2-2B-Video2World model. To do so, create a text prompt and pass it, along with the input image, to the video2world.py script:
Note
The sample text and image inputs used below are provided as the input0.txt/input0.jpg files in the assets/video2world directory.
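Alternatively, instead of defining the prompt inline as shown below, you can load the provided sample prompt directly (a sketch that assumes you are running from the repository root):
PROMPT="$(cat assets/video2world/input0.txt)"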
PROMPT="A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple \
double-decker buses are parked under the glow of overhead lights, with a central bus labeled \"87D\" facing \
forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights \
brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in \
the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its \
position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban \
night scene."
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict2/diffusion/inference/video2world.py \
--checkpoint_dir checkpoints \
--input_image_or_video_path assets/video2world/input0.jpg \
--num_input_frames 1 \
--diffusion_transformer_dir Cosmos-Predict2-2B-Video2World \
--offload_prompt_upsampler \
--disable_prompt_upsampler \
--prompt "${PROMPT}" \
--height 432 --width 768 --num_video_frames 81 \
--num_steps 35 \
--video_save_name video2world_2b
The inference output will be saved as outputs/video2world_2b.mp4, along with the corresponding prompt at outputs/video2world_2b.txt.
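To check the generated clip's duration, resolution, and frame rate without opening a video player, you can inspect it with ffprobe (this assumes FFmpeg is installed on your system):
ffprobe -hide_banner outputs/video2world_2b.mp4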
Next Steps
Follow the Post-Training Guide to post-train a Predict2 model for your physical AI use case, or explore all Predict2 model input/output options in the Predict2 Model Reference.