Diffusion Quickstart Guide
This page will walk you through setting up and running inference with the pre-trained Cosmos-Predict1-7B-Text2World diffusion model.
Set up the Diffusion Model
Ensure you have the necessary hardware and software, as outlined on the Prerequisites page.
Follow the Installation guide to download the Cosmos-Predict1 repo and set up the conda environment.
Generate a Hugging Face access token. Set the access token permission to ‘Read’ (the default permission is ‘Fine-grained’).
Log in to Hugging Face with the access token:
huggingface-cli login
Download the model weights for Cosmos-Predict1-7B-Text2World from Hugging Face:
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 7B --model_types Text2World
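It can help to confirm the setup before generating anything. The following is a minimal sketch and is not part of Cosmos-Predict1. The function names preflight, hf_login, and check_checkpoint are hypothetical. check_checkpoint assumes that the download script places the weights in checkpoints/Cosmos-Predict1-7B-Text2World. This location is inferred from the --checkpoint_dir and --diffusion_transformer_dir arguments used in the next section. hf_login uses the --token option of huggingface-cli login so that the login step can run without a prompt (for example, in CI). It assumes an environment variable HF_TOKEN that holds a ‘Read’ token.

```shell
#!/usr/bin/env bash
# Hypothetical setup checks for this quickstart (not part of Cosmos-Predict1).

# Fail if no conda environment is active or if the current directory is not
# the repository root; the commands on this page rely on $CONDA_PREFIX and $(pwd).
preflight() {
  if [[ -z "${CONDA_PREFIX:-}" ]]; then
    echo "error: no active conda environment (CONDA_PREFIX is unset)" >&2
    return 1
  fi
  if [[ ! -f scripts/download_diffusion_checkpoints.py ]]; then
    echo "error: run this from the Cosmos-Predict1 repository root" >&2
    return 1
  fi
  echo "preflight ok"
}

# Log in without a prompt when HF_TOKEN (an assumed variable) holds a
# 'Read' token; otherwise fall back to the interactive login.
hf_login() {
  if [[ -n "${HF_TOKEN:-}" ]]; then
    huggingface-cli login --token "$HF_TOKEN"
  else
    huggingface-cli login
  fi
}

# Assumes the weights land in <checkpoint_dir>/Cosmos-Predict1-7B-Text2World,
# matching the --checkpoint_dir and --diffusion_transformer_dir arguments
# passed to text2world.py on this page. The default checkpoint_dir is "checkpoints".
check_checkpoint() {
  local dir="${1:-checkpoints}/Cosmos-Predict1-7B-Text2World"
  if [[ -d "$dir" && -n "$(ls -A "$dir")" ]]; then
    echo "found $dir"
  else
    echo "error: $dir is missing or empty" >&2
    return 1
  fi
}
```

Run the functions in this order: preflight, then hf_login, then the download command above, then check_checkpoint. A token passed on the command line may be visible to other users in process listings, so prefer the interactive login on shared machines.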
Generate a Video from Text Input
To generate a video from text with the Cosmos-Predict1-7B-Text2World model, define a text prompt and pass it to the text2world.py script:
PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/text2world.py \
--checkpoint_dir checkpoints \
--diffusion_transformer_dir Cosmos-Predict1-7B-Text2World \
--prompt "${PROMPT}" \
--offload_prompt_upsampler \
--video_save_name diffusion-text2world-7b
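Long prompts can break when they contain double quotes, backticks, or dollar signs, because the shell expands these inside a double-quoted string. One workaround is to save the prompt in a plain-text file and read it from there. This is a minimal sketch under that assumption. load_prompt is a hypothetical helper, not part of the repository, and prompt.txt is a hypothetical file name. The helper replaces line breaks with spaces, squeezes runs of spaces, and trims the trailing space, so the result can be passed as a single --prompt argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a prompt from a text file as one line of text.
# Text read this way is not subject to shell quoting or $-expansion.
load_prompt() {
  # Turn newlines into spaces, squeeze repeated spaces, drop the trailing space.
  tr -s '\n' ' ' < "$1" | sed 's/ *$//'
}

# Example (prompt.txt is a hypothetical file):
# PROMPT="$(load_prompt prompt.txt)"
```

With PROMPT set this way, the text2world.py command above runs unchanged.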
Note
You can also generate worlds from text and image/video input using the Cosmos Video2World diffusion models. Batch generation is also available. Refer to the Diffusion Model Reference for model variants and options.
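To produce several videos without the batch mode, you can loop over a file of prompts in the shell. This technique differs from the batch generation mentioned in the note. Each iteration starts text2world.py anew and so presumably reloads the model, which is likely slower than a built-in batch run. The function generate_from_file, the one-prompt-per-line file format, the numbered save names, and the DRY_RUN variable are all hypothetical. The flags are the ones from the command above.

```shell
#!/usr/bin/env bash
# Plain shell loop over a prompts file (one prompt per line); this is NOT the
# batch-generation mode described in the Diffusion Model Reference.
# Set DRY_RUN=1 to print the commands instead of running them.
generate_from_file() {
  local prompts_file="$1" i=0 prompt
  # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while IFS= read -r prompt || [[ -n "$prompt" ]]; do
    [[ -z "$prompt" ]] && continue   # skip blank lines
    i=$((i + 1))
    local cmd=(python cosmos_predict1/diffusion/inference/text2world.py
      --checkpoint_dir checkpoints
      --diffusion_transformer_dir Cosmos-Predict1-7B-Text2World
      --prompt "$prompt"
      --offload_prompt_upsampler
      --video_save_name "diffusion-text2world-7b-$i")
    if [[ -n "${DRY_RUN:-}" ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      # </dev/null stops the Python process from consuming the remaining prompts.
      CUDA_HOME="$CONDA_PREFIX" PYTHONPATH="$(pwd)" "${cmd[@]}" < /dev/null
    fi
  done < "$prompts_file"
}
```

For example, DRY_RUN=1 generate_from_file prompts.txt prints one command per non-blank line of the file. The save names end in -1, -2, and so on, so each video is written under a distinct name.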
Next Steps
Get started adapting a Diffusion model for your use case with the Diffusion Model Post-Training Guide or explore all Diffusion model input/output options in the Diffusion Model Reference.