Auto Multiview Inference Guide
This page provides instructions for running inference with the Cosmos-Transfer2.5 Auto Multiview model.
Note
Ensure you have completed the steps in the Transfer2.5 Installation Guide before running inference.
Important
Multiview inference requires 8 GPUs by default. The number of GPUs must be greater than or equal to the number of active views in your spec (an active view is any camera entry that supplies a control_path). The default spec enables seven views. If you reduce the views in your JSON spec, you can run on fewer GPUs by adjusting --nproc_per_node accordingly.
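The GPU requirement above can be checked before launching. The following is a minimal sketch, not part of the Cosmos-Transfer2.5 codebase: the helper names `count_active_views` and `required_gpus` are hypothetical, and because this guide does not show the layout of the spec file, the sketch assumes only that each camera entry is a JSON object and counts every object that carries a non-empty `control_path`.

```python
import json
from pathlib import Path


def count_active_views(node):
    """Recursively count JSON objects that carry a non-empty "control_path".

    Assumption: each active camera view appears as one JSON object with a
    "control_path" key; the real spec layout may differ.
    """
    if isinstance(node, dict):
        own = 1 if node.get("control_path") else 0
        return own + sum(count_active_views(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_active_views(item) for item in node)
    return 0


def required_gpus(spec_path):
    """Return the minimum GPU count implied by the active views in a spec file."""
    spec = json.loads(Path(spec_path).read_text())
    return count_active_views(spec)
```

Calling `required_gpus("assets/multiview_example/multiview_spec.json")` would give a lower bound for `--nproc_per_node` under these assumptions. Compare the result against the spec by hand before relying on it.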
Example Inference Command
Use the following command to run multiview inference with the example asset:
torchrun --nproc_per_node=8 --master_port=12341 examples/multiview.py -i assets/multiview_example/multiview_spec.json -o outputs/multiview/
By default, the output is a single concatenated video showing all views side by side. To instead save an individual MP4 file for each camera view, plus a 3×3 tiled grid video combining all views, set "save_combined_views": false in the parameter JSON file.
For an explanation of all available parameters, run:
python examples/multiview.py --help
python examples/multiview.py control:view-config --help # for information specific to view configuration
To generate longer videos using autoregressive mode:
torchrun --nproc_per_node=8 --master_port=12341 -m examples.multiview -i assets/multiview_example/multiview_autoregressive_spec.json -o outputs/multiview_autoregressive
End-to-End Multiview Example
Follow these steps to perform multiview inference using 3D scene annotations. Scene annotations (object positions, camera calibration, vehicle trajectory) are rendered into world scenario videos that condition multiview generation. This example uses only rendered control videos, not raw footage.
Download example data:
wget -P assets https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/multiview_example1.zip && unzip -oq assets/multiview_example1.zip -d assets
Generate world scenario videos:
# See the World Scenario Video Generation guide for detailed instructions
python scripts/generate_control_videos.py -i assets/multiview_example1/scene_annotations -o outputs/multiview_example1_world_scenario_videos
Refer to the World Scenario Video Generation guide for detailed instructions.
Run multiview inference. Since this example does not use raw footage, set { "num_conditional_frames": 0 } in the parameter JSON file (in this case, assets/multiview_example1/multiview_spec.json) before running:
torchrun --nproc_per_node=8 --master_port=12341 -m examples.multiview -i assets/multiview_example1/multiview_spec.json -o outputs/multiview_e2w/
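The parameter change in the last step can be made by hand or scripted. The following is a minimal sketch using only the Python standard library. The helper name `set_spec_param` is hypothetical, and the sketch assumes that `num_conditional_frames` is a top-level key of the spec file, which this guide suggests but does not confirm.

```python
import json
from pathlib import Path


def set_spec_param(spec_path, key, value):
    """Set one top-level key in a JSON spec file and write the file back in place.

    Assumption: the parameter lives at the top level of the JSON object.
    """
    path = Path(spec_path)
    spec = json.loads(path.read_text())
    spec[key] = value
    path.write_text(json.dumps(spec, indent=2) + "\n")
    return spec
```

For this example, the call would be `set_spec_param("assets/multiview_example1/multiview_spec.json", "num_conditional_frames", 0)`. Note that rewriting the file normalizes its indentation, so keep a copy if the original formatting matters.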