Auto Multiview Inference Guide

This page provides instructions for running inference with the Cosmos-Transfer2.5 Auto Multiview model.

Note

Ensure you have completed the steps in the Transfer2.5 Installation Guide before running inference.

Important

Multiview inference requires 8 GPUs by default. The number of GPUs must be greater than or equal to the number of active views in your spec (an active view is any camera entry that supplies a control_path). The default spec enables seven views. If you reduce the views in your JSON spec, you can run on fewer GPUs by adjusting --nproc_per_node accordingly.
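
To choose a value for --nproc_per_node, it can help to count the active views in a spec before launching. The following is a minimal sketch, not part of the Cosmos tooling. `count_active_views` is a hypothetical helper, and it assumes only what the note above states: an active view is a JSON object with a non-empty `control_path`. It walks the whole document rather than assuming where camera entries sit, because the exact spec layout is not shown here.

```python
def count_active_views(node) -> int:
    """Count JSON objects that carry a non-empty "control_path" value."""
    if isinstance(node, dict):
        own = 1 if node.get("control_path") else 0
        return own + sum(count_active_views(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_active_views(v) for v in node)
    return 0  # strings, numbers, booleans, and null hold no views


# Illustrative spec only; the camera names and paths are made up.
example_spec = {
    "front_wide": {"control_path": "front_wide.mp4"},
    "rear": {"control_path": "rear.mp4"},
    "rear_left": {},  # no control_path, so not counted as active
}
print(count_active_views(example_spec))  # prints 2
```

For a real spec, load the file with `json.load` and pass the result to the helper. The returned count is the smallest --nproc_per_node value that satisfies the requirement above.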

Example Inference Command

Use the following command to run multiview inference with the example asset:

torchrun --nproc_per_node=8 --master_port=12341 examples/multiview.py -i assets/multiview_example/multiview_spec.json -o outputs/multiview/

By default, the output is a single concatenated video that shows all views side by side. To save an individual MP4 file for each camera view instead, plus a 3×3 tiled grid video that combines all views, set "save_combined_views": false in the params JSON.
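
One way to change this setting without hand-editing is to write a modified copy of the params JSON. The sketch below assumes the JSON root is an object and that `save_combined_views` is a top-level key. `write_spec_override` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def write_spec_override(src: str, dst: str, **overrides) -> dict:
    """Copy a JSON spec to dst with the given top-level keys replaced."""
    spec = json.loads(Path(src).read_text())
    spec.update(overrides)  # assumption: the keys live at the top level
    Path(dst).write_text(json.dumps(spec, indent=2) + "\n")
    return spec
```

For example, `write_spec_override("assets/multiview_example/multiview_spec.json", "assets/multiview_example/multiview_spec_per_view.json", save_combined_views=False)` writes a copy next to the original, which can then be passed to `-i`. Keeping the copy in the same directory is a precaution in case the spec refers to other files by relative path.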

For an explanation of all available parameters, run:

python examples/multiview.py --help

python examples/multiview.py control:view-config --help  # for information specific to view configuration

To generate longer videos using autoregressive mode, run:

torchrun --nproc_per_node=8 --master_port=12341 -m examples.multiview -i assets/multiview_example/multiview_autoregressive_spec.json -o outputs/multiview_autoregressive

End-to-End Multiview Example

Follow these steps to perform multiview inference using 3D scene annotations. Scene annotations (object positions, camera calibration, vehicle trajectory) are rendered into world scenario videos that condition multiview generation. This example uses only rendered control videos, not raw footage.

  1. Download example data:

    wget -P assets https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/multiview_example1.zip && unzip -oq assets/multiview_example1.zip -d assets
    
  2. Generate world scenario videos:

    # See the World Scenario Video Generation guide for detailed instructions
    python scripts/generate_control_videos.py -i assets/multiview_example1/scene_annotations -o outputs/multiview_example1_world_scenario_videos
    


  3. Run multiview inference. Because this example does not use raw footage, set "num_conditional_frames": 0 in the parameter JSON file (in this case, assets/multiview_example1/multiview_spec.json) before running:

    torchrun --nproc_per_node=8 --master_port=12341 -m examples.multiview -i assets/multiview_example1/multiview_spec.json -o outputs/multiview_e2w/
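
The three steps above can be chained in a small driver. The sketch below is a dry run by default: it only prints the commands, which are copied from this page. `e2e_steps` and `run_steps` are hypothetical names. It does not perform the `num_conditional_frames` edit from step 3, so make that change before calling it with `dry_run=False`.

```python
import shlex
import subprocess
from typing import List

ZIP_URL = ("https://github.com/nvidia-cosmos/cosmos-dependencies/"
           "releases/download/assets/multiview_example1.zip")


def e2e_steps(nproc: int = 8) -> List[List[str]]:
    """Return the commands from steps 1-3 as argv lists."""
    return [
        ["wget", "-P", "assets", ZIP_URL],
        ["unzip", "-oq", "assets/multiview_example1.zip", "-d", "assets"],
        ["python", "scripts/generate_control_videos.py",
         "-i", "assets/multiview_example1/scene_annotations",
         "-o", "outputs/multiview_example1_world_scenario_videos"],
        ["torchrun", f"--nproc_per_node={nproc}", "--master_port=12341",
         "-m", "examples.multiview",
         "-i", "assets/multiview_example1/multiview_spec.json",
         "-o", "outputs/multiview_e2w/"],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it as well when dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


run_steps(e2e_steps())  # dry run: prints four commands, runs nothing
```

Passing a smaller `nproc` is only appropriate after the number of active views in the spec has been reduced to match.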