Transfer Quickstart Guide#

This page walks you through setting up and running inference with the pre-trained Cosmos-Transfer1 model.

Set up Cosmos Transfer1#

  1. Ensure you have the necessary hardware and software, as outlined on the Prerequisites page.

  2. Follow the Installation guide to download the Cosmos-Transfer1 repo and set up the conda environment.

  3. Generate a Hugging Face access token. Set the access token permission to ‘Read’ (the default permission is ‘Fine-grained’).

  4. Log in to Hugging Face with the access token:

    huggingface-cli login
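
    If you prefer a non-interactive login (for example, from a setup script), the Hugging Face CLI also accepts the token as a flag; replace HF_TOKEN below with your own token:

    huggingface-cli login --token "$HF_TOKEN"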
    
  5. Accept the LlamaGuard-7b terms on its Hugging Face model page.

  6. Download the Cosmos-Transfer1 model weights from Hugging Face:

    CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_checkpoints.py --output_dir checkpoints/
    

    Note

    The model weights require about 300GB of free storage. Not all checkpoints will be used in every generation.

    The downloaded files will be in the following structure:

    checkpoints/
    ├── nvidia
    │   ├── Cosmos-Transfer1-7B
    │   │   ├── base_model.pt
    │   │   ├── vis_control.pt
    │   │   ├── edge_control.pt
    │   │   ├── seg_control.pt
    │   │   ├── depth_control.pt
    │   │   ├── 4kupscaler_control.pt
    │   │   ├── config.json
    │   │   └── guardrail
    │   │       ├── aegis/
    │   │       ├── blocklist/
    │   │       ├── face_blur_filter/
    │   │       └── video_content_safety_filter/
    │   ├── Cosmos-Transfer1-7B-Sample-AV/
    │   │   ├── base_model.pt
    │   │   ├── hdmap_control.pt
    │   │   └── lidar_control.pt
    │   └── Cosmos-Tokenize1-CV8x8x8-720p
    │       ├── decoder.jit
    │       ├── encoder.jit
    │       ├── autoencoder.jit
    │       └── mean_std.pt
    ├── depth-anything/...
    ├── facebook/...
    ├── google-t5/...
    └── IDEA-Research/
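
    To confirm that the download completed, you can list the main Transfer1 checkpoint directory (the path follows the tree above):

    ls checkpoints/nvidia/Cosmos-Transfer1-7B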
    

Generate a Visual Simulation from Source Video#

Use the Cosmos-Transfer1-7B model to generate a high-quality visual simulation from a low-resolution, edge-detected source video. To do so, run the following command:

export CUDA_VISIBLE_DEVICES=0
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_transfer1/diffusion/inference/transfer.py \
    --checkpoint_dir $CHECKPOINT_DIR \
    --video_save_folder outputs/example1_single_control_edge \
    --controlnet_specs assets/inference_cosmos_transfer1_single_control_edge.json \
    --offload_text_encoder_model
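
Note that the CHECKPOINT_DIR export uses shell default-expansion (${CHECKPOINT_DIR:=./checkpoints}), so it falls back to ./checkpoints only when the variable is unset. To point the run at checkpoints stored elsewhere (the path below is a placeholder), set the variable before running the command:

export CHECKPOINT_DIR=/path/to/your/checkpoints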

The --controlnet_specs argument specifies the path to a JSON file containing the transfer specification, and the --offload_text_encoder_model flag offloads the text encoder from GPU memory between uses to reduce peak memory consumption. In this case, the inference_cosmos_transfer1_single_control_edge.json file contains the following configuration:

{
    "prompt": "The video is set in a modern, well-lit office environment with a sleek, minimalist design. ...",
    "input_video_path" : "assets/example1_input_video.mp4",
    "edge": {
        "control_weight": 1.0
    }
}
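
A spec can also combine several control modalities; the checkpoint directory above includes vis, depth, and seg control weights in addition to edge. As an illustrative sketch only (the two-control combination and the weight values below are hypothetical, and the prompt and input path are reused from the example above), such a spec might look like this:

{
    "prompt": "The video is set in a modern, well-lit office environment with a sleek, minimalist design. ...",
    "input_video_path" : "assets/example1_input_video.mp4",
    "edge": {
        "control_weight": 0.6
    },
    "vis": {
        "control_weight": 0.4
    }
}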

This is the low-resolution (640x480) edge-detected source video:

This is the video (960x704) generated by the Cosmos-Transfer1-7B model: