Auto Multiview Inference Guide#

This page provides instructions for running inference with the Cosmos-Transfer2.5 Auto Multiview model.

Note

Ensure you have completed the steps in the Transfer2.5 Installation Guide before running inference.

Important

Multiview inference requires 8 GPUs.

Example Inference Command#

Use the following command to run multiview inference with the example asset:

export NUM_GPUS=8
torchrun --nproc_per_node=$NUM_GPUS --master_port=12341 -m examples.multiview --params_file assets/multiview_example/multiview_spec.json --num_gpus=$NUM_GPUS

End-to-End Multiview Example#

Follow these steps to perform multiview inference using 3D scene annotations. Scene annotations (object positions, camera calibration, and vehicle trajectory) are rendered into world scenario videos that condition multiview generation. This example uses only rendered control videos, not raw footage.

  1. Download scene annotations:

    mkdir -p datasets && curl -Lf https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/3d_scene_metadata.zip -o temp.zip && unzip temp.zip -d datasets && rm temp.zip
    
  2. Generate world scenario videos:

    # See world_scenario_video_generation.md for detailed instructions
    python scripts/generate_control_videos.py datasets/3d_scene_metadata assets/multiview_example/world_scenario_videos
    

    Refer to the World Scenario Video Generation guide for detailed instructions.

  3. Since this example does not use raw footage, set { "num_conditional_frames": 0 } in the parameter JSON file (in this case, assets/multiview_example/multiview_spec.json).

  4. Run multiview inference:

    export NUM_GPUS=8
    torchrun --nproc_per_node=$NUM_GPUS --master_port=12341 -m examples.multiview --params_file assets/multiview_example/multiview_spec.json --num_gpus=$NUM_GPUS