World Scenario Video Generation

This page describes how to render world scenario control videos from 3D scene annotations for use as conditioning inputs to Cosmos-Transfer2.5.

Additional Requirements

In addition to the standard Transfer2.5 Prerequisites, you will need the following:

  • uv (for Python dependency management)

  • A GPU with EGL support (for headless OpenGL rendering)

  • 3D scene annotation data in Parquet format

Install Dependencies

Run the following commands to install the dependencies:

cd packages/cosmos-transfer2
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Generate Control Videos

The following command generates control videos for all seven cameras, which is the default:

python scripts/generate_control_videos.py {input_root}/ {save_root}/

The following command generates control videos for only the camera:front:wide:120fov and camera:cross:right:120fov cameras:

python scripts/generate_control_videos.py {input_root}/ {save_root}/ \
    --cameras "camera:front:wide:120fov,camera:cross:right:120fov"

Command Options

Option      Default  Description
--cameras   all      A comma-separated list of camera names, or "all" for all seven cameras

Available Cameras

  • camera:front:wide:120fov

  • camera:front:tele:sat:30fov

  • camera:cross:right:120fov

  • camera:cross:left:120fov

  • camera:rear:left:70fov

  • camera:rear:right:70fov

  • camera:rear:tele:30fov
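The --cameras semantics described above can be sketched as a small parser that expands "all" and rejects unknown names. This is a hypothetical helper (parse_cameras is not part of generate_control_videos.py). It assumes that whitespace around the commas is ignored, and the real script may be stricter than that.

```python
# Hypothetical sketch of the documented --cameras behavior, not the script's code.
ALL_CAMERAS = [
    "camera:front:wide:120fov",
    "camera:front:tele:sat:30fov",
    "camera:cross:right:120fov",
    "camera:cross:left:120fov",
    "camera:rear:left:70fov",
    "camera:rear:right:70fov",
    "camera:rear:tele:30fov",
]


def parse_cameras(value: str) -> list[str]:
    """Expand "all" or validate a comma-separated list of camera names."""
    if value.strip() == "all":
        return list(ALL_CAMERAS)
    # Split on commas; stripping whitespace is an assumption of this sketch.
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in ALL_CAMERAS]
    if unknown:
        raise ValueError(f"Unknown camera name(s): {', '.join(unknown)}")
    return names


print(parse_cameras("camera:front:wide:120fov,camera:cross:right:120fov"))
```

The check rejects a misspelled name before any rendering starts. This is the same failure the "Invalid camera names" troubleshooting entry covers.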

Data Format

Input Structure

scene_annotations_directory/
├── uuid.obstacle.parquet              (required)
├── uuid.calibration_estimate.parquet  (required)
├── uuid.egomotion_estimate.parquet    (required)
├── uuid.lane.parquet                  (optional)
├── uuid.lane_line.parquet             (optional)
└── ... (other optional parquet files)
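Before you run the script, you can confirm that a scene directory contains the three required files. The sketch below is a hypothetical pre-flight check (missing_required_files is not part of the tool). It matches on file suffixes so that it works without knowing the scene's UUID in advance.

```python
from pathlib import Path

# Hypothetical helper. Each required file is prefixed with the scene UUID
# ("<uuid>.obstacle.parquet"), so match on the suffix only.
REQUIRED_SUFFIXES = (
    "obstacle.parquet",
    "calibration_estimate.parquet",
    "egomotion_estimate.parquet",
)


def missing_required_files(scene_dir: Path) -> list[str]:
    """Return the required suffixes with no matching file in scene_dir."""
    scene_dir = Path(scene_dir)  # also accept a plain string path
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(scene_dir.glob(f"*.{suffix}"))
    ]


missing = missing_required_files(Path("scene_annotations_directory"))
if missing:
    print("Missing required files:", ", ".join(missing))
```

Optional files such as lane.parquet are deliberately not checked. Their absence only means that the corresponding elements are not rendered.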

Output Structure

save_root/
└── uuid/
    ├── uuid.camera_front_wide_120fov.mp4
    ├── uuid.camera_front_tele_sat_30fov.mp4
    ├── uuid.camera_cross_right_120fov.mp4
    ├── uuid.camera_cross_left_120fov.mp4
    ├── uuid.camera_rear_left_70fov.mp4
    ├── uuid.camera_rear_right_70fov.mp4
    └── uuid.camera_rear_tele_30fov.mp4
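The output names in the tree above follow a visible pattern: the colons in the camera name become underscores. The sketch below predicts where a given camera's video lands, for example to locate it for downstream inference. It is a hypothetical helper inferred from the listing, not an API exposed by the script.

```python
from pathlib import Path


def output_video_path(save_root: str, uuid: str, camera: str) -> Path:
    """Hypothetical helper: predict a camera's control video path."""
    # Colons in the camera name become underscores in the file name,
    # e.g. "camera:front:wide:120fov" -> "camera_front_wide_120fov".
    camera_stem = camera.replace(":", "_")
    return Path(save_root) / uuid / f"{uuid}.{camera_stem}.mp4"


print(output_video_path("save_root", "uuid", "camera:rear:tele:30fov").as_posix())
# save_root/uuid/uuid.camera_rear_tele_30fov.mp4
```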

Rendered Elements

The following elements are always rendered:

  • 3D bounding boxes for vehicles and pedestrians (from the required obstacle.parquet file)

The following elements are rendered only if the corresponding optional Parquet file is present:

  • Lane lines, lanes, road boundaries

  • Crosswalks, poles, road markings, wait lines

  • Traffic lights and signs

Troubleshooting

ModernGL/EGL errors

  • Install the GPU drivers and the EGL and OpenGL libraries (libEGL.so.1, libGL.so.1). On Ubuntu/Debian: sudo apt install libegl1-mesa-dev libgl1-mesa-dri
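To check whether the EGL and OpenGL libraries are visible to Python at all, you can use the standard library's ctypes.util.find_library. That function returns a library name (for example libEGL.so.1 on Linux) or None. This is a minimal diagnostic sketch. It confirms only that the libraries can be found, not that the GPU driver supports headless rendering.

```python
import ctypes.util
from typing import Optional


def find_gl_libraries() -> dict[str, Optional[str]]:
    """Locate the EGL and OpenGL shared libraries used for headless rendering."""
    # find_library takes the bare name ("EGL" -> libEGL.so.*) and returns
    # None when the dynamic loader cannot locate the library.
    return {name: ctypes.util.find_library(name) for name in ("EGL", "GL")}


libs = find_gl_libraries()
for name, found in libs.items():
    print(f"lib{name}: {found or 'NOT FOUND'}")
```

If both libraries are found but errors persist, a quick end-to-end test is to call moderngl.create_standalone_context(backend="egl") in the activated environment. This assumes a ModernGL version that supports the EGL backend.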

Missing parquet files

  • Ensure that the input directory contains the three required files: uuid.obstacle.parquet, uuid.calibration_estimate.parquet, and uuid.egomotion_estimate.parquet.

Memory issues

  • Process fewer cameras at once by passing a shorter list to --cameras, down to a single camera per run.
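The following sketch runs the script once per camera. It is a hypothetical wrapper, not part of the tool. It assumes that each run writes only the video for the camera it was given, so earlier outputs under {save_root} are left in place. Whether this lowers peak memory depends on how the script allocates rendering resources.

```python
import shlex
import subprocess
import sys

# The seven camera names listed under Available Cameras.
CAMERAS = [
    "camera:front:wide:120fov",
    "camera:front:tele:sat:30fov",
    "camera:cross:right:120fov",
    "camera:cross:left:120fov",
    "camera:rear:left:70fov",
    "camera:rear:right:70fov",
    "camera:rear:tele:30fov",
]


def per_camera_commands(input_root: str, save_root: str, cameras: list[str]) -> list[list[str]]:
    """Build one generate_control_videos.py invocation per camera."""
    return [
        [sys.executable, "scripts/generate_control_videos.py",
         input_root, save_root, "--cameras", camera]
        for camera in cameras
    ]


def run_one_camera_at_a_time(input_root: str, save_root: str) -> None:
    """Run the commands sequentially; stop at the first failing camera."""
    for cmd in per_camera_commands(input_root, save_root, CAMERAS):
        subprocess.run(cmd, check=True)


# Dry run: print the commands instead of executing them.
for cmd in per_camera_commands("{input_root}/", "{save_root}/", CAMERAS):
    print(shlex.join(cmd))
```

Running the commands sequentially trades wall-clock time for a smaller working set. Call run_one_camera_at_a_time from the cosmos-transfer2 package directory once the dry run looks right.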

Invalid camera names

  • Run the script with --help to see the valid options, and check each name against the Available Cameras list.

Next Steps

The generated control videos serve as conditioning inputs for Cosmos-Transfer2.5 multiview inference. The rendered bounding boxes and HD map elements provide spatial context for video generation.