Transfer Quickstart Guide#

This page walks you through setting up and running inference with the pre-trained Cosmos-Transfer1 model.

Set up Cosmos Transfer1#

  1. Ensure you have the necessary hardware and software, as outlined on the Prerequisites page.

  2. Follow the Installation guide to download the Cosmos-Transfer1 repo and set up the conda environment.

  3. Generate a Hugging Face access token. Set the access token permission to ‘Read’ (the default permission is ‘Fine-grained’).

  4. Log in to Hugging Face with the access token:

    huggingface-cli login
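
    If you prefer a non-interactive login (for example, from a setup script), the Hugging Face CLI also accepts the token as a flag; replace HF_TOKEN below with your own token:

    huggingface-cli login --token "$HF_TOKEN"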
    
  5. Accept the LlamaGuard-7b terms on its Hugging Face model page.

  6. Download the Cosmos-Transfer1 model weights from Hugging Face:

    CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_checkpoints.py --output_dir checkpoints/
    

    Note

    The model weights require about 300GB of free storage. Not all checkpoints will be used in every generation.

    The downloaded files will be in the following structure:

    checkpoints/
    ├── nvidia
    │   ├── Cosmos-Transfer1-7B
    │   │   ├── base_model.pt
    │   │   ├── vis_control.pt
    │   │   ├── edge_control.pt
    │   │   ├── seg_control.pt
    │   │   ├── depth_control.pt
    │   │   ├── 4kupscaler_control.pt
    │   │   ├── config.json
    │   │   └── guardrail
    │   │       ├── aegis/
    │   │       ├── blocklist/
    │   │       ├── face_blur_filter/
    │   │       └── video_content_safety_filter/
    │   ├── Cosmos-Transfer1-7B-Sample-AV/
    │   │   ├── base_model.pt
    │   │   ├── hdmap_control.pt
    │   │   └── lidar_control.pt
    │   └── Cosmos-Tokenize1-CV8x8x8-720p
    │       ├── decoder.jit
    │       ├── encoder.jit
    │       ├── autoencoder.jit
    │       └── mean_std.pt
    ├── depth-anything/...
    ├── facebook/...
    ├── google-t5/...
    └── IDEA-Research/
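
    To confirm that the download completed, you can list the main Transfer1 checkpoint directory (the path follows the tree above):

    ls checkpoints/nvidia/Cosmos-Transfer1-7B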
    

Generate a Visual Simulation from Source Video#

Use the Cosmos-Transfer1-7B model to generate a high-quality visual simulation from a low-resolution, edge-detected source video. To do so, run the following command:

export CUDA_VISIBLE_DEVICES=0
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_transfer1/diffusion/inference/transfer.py \
    --checkpoint_dir $CHECKPOINT_DIR \
    --video_save_folder outputs/example1_single_control_edge \
    --controlnet_specs assets/inference_cosmos_transfer1_single_control_edge.json \
    --offload_text_encoder_model
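
Note that the CHECKPOINT_DIR export uses shell default-expansion (${CHECKPOINT_DIR:=./checkpoints}), so it falls back to ./checkpoints only when the variable is unset. To point the run at checkpoints stored elsewhere (the path below is a placeholder), set the variable before running the command:

export CHECKPOINT_DIR=/path/to/your/checkpoints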

The --controlnet_specs argument specifies the path to a JSON file containing the transfer specification, and the --offload_text_encoder_model flag offloads the text encoder from GPU memory between uses to reduce peak memory consumption. In this case, the inference_cosmos_transfer1_single_control_edge.json file contains the following configuration:

{
    "prompt": "The video is set in a modern, well-lit office environment with a sleek, minimalist design. ...",
    "input_video_path" : "assets/example1_input_video.mp4",
    "edge": {
        "control_weight": 1.0
    }
}
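
A spec can also combine several control modalities; the checkpoint directory above includes vis, depth, and seg control weights in addition to edge. As an illustrative sketch only (the two-control combination and the weight values below are hypothetical, and the prompt and input path are reused from the example above), such a spec might look like this:

{
    "prompt": "The video is set in a modern, well-lit office environment with a sleek, minimalist design. ...",
    "input_video_path" : "assets/example1_input_video.mp4",
    "edge": {
        "control_weight": 0.6
    },
    "vis": {
        "control_weight": 0.4
    }
}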

This is the low-resolution (640x480) edge-detected source video:

This is the video (960x704) generated by the Cosmos-Transfer1-7B model: