Transfer Quickstart Guide#
This page will walk you through setting up and running inference with the pre-trained Transfer model.
Set up Cosmos Transfer1#
Ensure you have the necessary hardware and software, as outlined on the Prerequisites page.
Follow the Installation guide to download the Cosmos-Transfer1 repo and set up the conda environment.
Generate a Hugging Face access token. Set the access token permission to ‘Read’ (the default permission is ‘Fine-grained’).
Log in to Hugging Face with the access token:
huggingface-cli login
Accept the LlamaGuard-7b terms on Hugging Face.
Download the Cosmos-Transfer1 model weights from Hugging Face:
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 7B --model_types Text2World
Note
The model weights require about 300GB of free storage. Not all checkpoints will be used in every generation.
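After the download completes, you can sanity-check that the key checkpoint files are in place before running inference. The snippet below is a minimal sketch, not part of the Cosmos-Transfer1 codebase; the file list mirrors the directory layout shown below, and the checkpoints path is assumed to be the default download location.

```python
from pathlib import Path

# Assumed default download location; adjust if you downloaded elsewhere.
CHECKPOINT_DIR = Path("checkpoints")

# Control checkpoints taken from the directory layout shown below.
EXPECTED = [
    "nvidia/Cosmos-Transfer1-7B/base_model.pt",
    "nvidia/Cosmos-Transfer1-7B/vis_control.pt",
    "nvidia/Cosmos-Transfer1-7B/edge_control.pt",
    "nvidia/Cosmos-Transfer1-7B/seg_control.pt",
    "nvidia/Cosmos-Transfer1-7B/depth_control.pt",
]

def missing_checkpoints(root: Path, expected=EXPECTED) -> list:
    """Return the expected checkpoint files that are not present under root."""
    return [rel for rel in expected if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints(CHECKPOINT_DIR)
    if missing:
        print("Missing checkpoints:", *missing, sep="\n  ")
    else:
        print("All expected checkpoints found.")
```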
The downloaded files will be in the following structure:
checkpoints/
├── nvidia
│   ├── Cosmos-Transfer1-7B
│   │   ├── base_model.pt
│   │   ├── vis_control.pt
│   │   ├── edge_control.pt
│   │   ├── seg_control.pt
│   │   ├── depth_control.pt
│   │   ├── 4kupscaler_control.pt
│   │   ├── config.json
│   │   └── guardrail
│   │       ├── aegis/
│   │       ├── blocklist/
│   │       ├── face_blur_filter/
│   │       └── video_content_safety_filter/
│   │
│   ├── Cosmos-Transfer1-7B-Sample-AV/
│   │   ├── base_model.pt
│   │   ├── hdmap_control.pt
│   │   └── lidar_control.pt
│   │
│   └── Cosmos-Tokenize1-CV8x8x8-720p
│       ├── decoder.jit
│       ├── encoder.jit
│       ├── autoencoder.jit
│       └── mean_std.pt
│
├── depth-anything/...
├── facebook/...
├── google-t5/...
└── IDEA-Research/
Generate a Visual Simulation from Source Video#
Use the Cosmos-Transfer1-7B model to generate a high-quality visual simulation from a low-resolution edge-detect source video. To do so, run the following command:
export CUDA_VISIBLE_DEVICES=0
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_transfer1/diffusion/inference/transfer.py \
--checkpoint_dir $CHECKPOINT_DIR \
--video_save_folder outputs/example1_single_control_edge \
--controlnet_specs assets/inference_cosmos_transfer1_single_control_edge.json \
--offload_text_encoder_model
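The CHECKPOINT_DIR export above uses the shell's ${VAR:=default} expansion, which assigns the default only when the variable is unset or empty, so a value you have already exported is preserved. A minimal illustration:

```shell
# With the variable unset, the default is assigned...
unset CHECKPOINT_DIR
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
echo "$CHECKPOINT_DIR"    # prints ./checkpoints

# ...but an existing value wins over the default.
export CHECKPOINT_DIR=/data/ckpts
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
echo "$CHECKPOINT_DIR"    # prints /data/ckpts
```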
The --controlnet_specs argument specifies the path to a JSON file containing the transfer specifications. In this case, the inference_cosmos_transfer1_single_control_edge.json file contains the following configuration:
{
"prompt": "The video is set in a modern, well-lit office environment with a sleek, minimalist design. ...",
"input_video_path": "assets/example1_input_video.mp4",
"edge": {
"control_weight": 1.0
}
}
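Control specs like the one above are plain JSON, so they can be generated programmatically, for example to sweep control weights across several runs. The helper below is a hypothetical convenience, not part of the Cosmos-Transfer1 codebase; the key names (prompt, input_video_path, and the per-control control_weight) follow the example file above.

```python
import json
from pathlib import Path

def write_edge_spec(path, prompt, input_video, weight=1.0):
    """Write a single-control (edge) spec JSON like the example above and return it."""
    spec = {
        "prompt": prompt,
        "input_video_path": input_video,
        "edge": {"control_weight": weight},
    }
    Path(path).write_text(json.dumps(spec, indent=4))
    return spec

if __name__ == "__main__":
    # Example: generate three specs sweeping the edge control weight.
    for w in (0.5, 1.0, 1.5):
        write_edge_spec(
            f"spec_edge_{w}.json",
            "A modern, well-lit office environment ...",
            "assets/example1_input_video.mp4",
            weight=w,
        )
```

Each generated file can then be passed to transfer.py via --controlnet_specs.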
This is the low-resolution (640x480) edge-detect source video:
This is the higher-resolution (960x704) video generated by the Cosmos-Transfer1-7B model: