World Scenario Video Generation#
This tool generates control videos from 3D scene annotations for Cosmos-Transfer2.5. It renders world models into videos by projecting 3D elements (polylines, polygons, and cuboids) onto camera views.
Supported input formats:
Parquet format: Structured scene annotations stored in Parquet files
RDS-HQ format: NVIDIA’s internal format from the Cosmos-Drive-Dreams dataset
Additional Requirements#
In addition to the standard Transfer2.5 Prerequisites, you will need the following:
Python 3.10+
uv (for dependency management)
A GPU with EGL support (for headless OpenGL rendering)
3D scene annotation data in Parquet or RDS-HQ format
Install Dependencies#
Use the following command to install dependencies:
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Generate Control Videos#
The script automatically detects whether your input is in Parquet or RDS-HQ format.
The following command generates control videos (all seven cameras by default):
python scripts/generate_control_videos.py -i /path/to/{input_root} -o ./{save_root}
The following command generates control videos for specific cameras only:
python scripts/generate_control_videos.py -i {input_root}/ -o {save_root}/ \
--cameras "camera:front:wide:120fov,camera:cross:right:120fov"
Command Options#
| Option | Default | Description |
|---|---|---|
| --cameras | all | A comma-separated list of camera names, or “all” for all seven cameras |
Available Cameras#
camera:front:wide:120fov
camera:front:tele:sat:30fov
camera:cross:right:120fov
camera:cross:left:120fov
camera:rear:left:70fov
camera:rear:right:70fov
camera:rear:tele:30fov
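A misspelled camera name is easy to catch before launching a render. The sketch below is illustrative only: the camera names are copied from the list above, while build_cameras_arg is a hypothetical helper and not part of the tool.

```python
# Camera names copied from the "Available Cameras" list.
AVAILABLE_CAMERAS = (
    "camera:front:wide:120fov",
    "camera:front:tele:sat:30fov",
    "camera:cross:right:120fov",
    "camera:cross:left:120fov",
    "camera:rear:left:70fov",
    "camera:rear:right:70fov",
    "camera:rear:tele:30fov",
)


def build_cameras_arg(cameras):
    """Return a comma-separated value for --cameras (hypothetical helper).

    Raises ValueError if any name is not in AVAILABLE_CAMERAS.
    """
    unknown = [name for name in cameras if name not in AVAILABLE_CAMERAS]
    if unknown:
        raise ValueError(f"Unknown camera names: {unknown}")
    return ",".join(cameras)


if __name__ == "__main__":
    # Prints the value to pass after --cameras on the command line.
    print(build_cameras_arg(["camera:front:wide:120fov", "camera:cross:right:120fov"]))
```

The returned string can be passed directly as the value of --cameras.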
Complete Example#
The following end-to-end example uses Parquet input data:
# Download example data
wget -P assets https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/multiview_example1.zip && unzip -oq assets/multiview_example1.zip -d assets
# Generate control videos for the example scene
python scripts/generate_control_videos.py -i assets/multiview_example1/scene_annotations -o outputs/multiview_example1_world_scenario_videos
Additional example datasets:
wget https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/multiview_example2.zip
wget https://github.com/nvidia-cosmos/cosmos-dependencies/releases/download/assets/multiview_example3.zip
RDS-HQ Example#
To use data from the Cosmos-Drive-Dreams dataset in RDS-HQ format:
wget -P scripts https://raw.githubusercontent.com/nv-tlabs/Cosmos-Drive-Dreams/main/scripts/download.py
python scripts/download.py --odir ./assets/rdshq-data --limit 1
python scripts/generate_control_videos.py -i assets/rdshq-data -o outputs/rdshq-generated
Data Format#
Input Structure#
Parquet format:
scene_annotations_directory/
├── uuid.obstacle.parquet (required)
├── uuid.calibration_estimate.parquet (required)
├── uuid.egomotion_estimate.parquet (required)
├── uuid.lane.parquet (optional)
├── uuid.lane_line.parquet (optional)
└── ... (other optional parquet files)
RDS-HQ format: NVIDIA’s recording format containing sensor data and annotations. The script automatically extracts the required scene information.
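For Parquet input, a quick pre-flight check can confirm that the three required files are present before rendering. This is a minimal sketch that assumes files follow the uuid.kind.parquet naming shown in the layout above; missing_required_files is a hypothetical helper, not a function provided by the tool.

```python
from pathlib import Path

# Required annotation kinds, taken from the Parquet layout above.
REQUIRED_KINDS = ("obstacle", "calibration_estimate", "egomotion_estimate")


def missing_required_files(scene_dir, uuid):
    """Return the names of required <uuid>.<kind>.parquet files that are absent."""
    scene_dir = Path(scene_dir)
    expected = [f"{uuid}.{kind}.parquet" for kind in REQUIRED_KINDS]
    return [name for name in expected if not (scene_dir / name).is_file()]
```

An empty list means all required files were found for that scene.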
Output Structure#
Both input formats produce the same output structure:
save_root/
└── uuid/
    ├── uuid.camera_front_wide_120fov.mp4
    ├── uuid.camera_front_tele_sat_30fov.mp4
    ├── uuid.camera_cross_right_120fov.mp4
    ├── uuid.camera_cross_left_120fov.mp4
    ├── uuid.camera_rear_left_70fov.mp4
    ├── uuid.camera_rear_right_70fov.mp4
    └── uuid.camera_rear_tele_30fov.mp4
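The output file names follow the camera names with colons replaced by underscores. The sketch below derives the expected paths from that pattern, which is inferred from the layout above rather than from the tool's source; expected_video_paths is a hypothetical helper.

```python
from pathlib import Path


def expected_video_paths(save_root, uuid, cameras):
    """Map each camera name to the MP4 path implied by the output layout."""
    out_dir = Path(save_root) / uuid
    return {
        # "camera:front:wide:120fov" becomes "camera_front_wide_120fov"
        name: out_dir / f"{uuid}.{name.replace(':', '_')}.mp4"
        for name in cameras
    }
```

This can be used after a run to check that every requested camera produced a video.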
Rendered Elements#
The following elements are always rendered:
3D bounding boxes for vehicles, pedestrians, and other dynamic objects (from the required obstacle.parquet file)
The following elements are optionally rendered if the corresponding Parquet file is provided:
Lane lines, lanes, and road boundaries
Crosswalks, road markings, and wait lines
Poles, traffic lights, and traffic signs
Troubleshooting#
| Issue | Solution |
|---|---|
| ModernGL/EGL errors | Install GPU drivers and EGL libraries ( |
| Missing parquet files | Ensure the required files exist: uuid.obstacle.parquet, uuid.calibration_estimate.parquet, and uuid.egomotion_estimate.parquet |
| Memory issues | Reduce the number of cameras processed simultaneously using --cameras |
| Invalid camera names | Run with |
Rendering Specifications#
Dynamic Objects#
Dynamic objects are rendered as solid 3D cuboids with light gray edges and front-to-back color gradients.
Object label mapping covers five categories:
Car: automobile, other_vehicle, vehicle
Truck: heavy_truck, bus, train_or_tram_car, trailer
Pedestrian: person
Cyclist: rider
Others: protruding_object, animal, stroller
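The label mapping above can be written as a lookup table. The following sketch only restates the listed categories; the names CATEGORY_LABELS and categorize are illustrative, and returning None for unlisted labels is an assumption, since the handling of unmapped labels is not described here.

```python
# Category -> source labels, as listed above.
CATEGORY_LABELS = {
    "Car": ("automobile", "other_vehicle", "vehicle"),
    "Truck": ("heavy_truck", "bus", "train_or_tram_car", "trailer"),
    "Pedestrian": ("person",),
    "Cyclist": ("rider",),
    "Others": ("protruding_object", "animal", "stroller"),
}

# Inverted index for direct label lookup.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in CATEGORY_LABELS.items()
    for label in labels
}


def categorize(label):
    """Return the rendering category for a label, or None if it is not listed."""
    return LABEL_TO_CATEGORY.get(label)
```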
Lane Lines#
Lane lines are categorized into 15 types based on color (yellow, white, other) and style (solid, dashed, dotted, solid-dashed combinations). For example, yellow solid dashed denotes a yellow solid line on the right and a yellow dashed line on the left, relative to the polyline direction.
Traffic Lights#
Traffic lights are rendered as cuboids with four states: Red, Yellow, Green, Unknown.
Map Elements#
Map elements use three geometry types:
Polylines: poles, road boundaries, wait lines
Polygons: crosswalks, road markings
Cuboids: traffic signs
Pipeline Overview#
Frame Rate Configuration#
The pipeline uses two configurable frame rates:
INPUT_POSE_FPS (default: 30 fps): Processing frame rate for interpolation; determines how many frames are generated
TARGET_RENDER_FPS (default: 30 fps): Output video playback frame rate; determines playback speed
Source data is typically recorded at 10 Hz and is interpolated up to the processing frame rate.
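To make the resampling concrete, the sketch below upsamples a 10 Hz scalar signal (for example, one position coordinate) to the processing frame rate using linear interpolation. It is an illustration under stated assumptions: the helper names are hypothetical, the tool's actual interpolation method is not specified here, and orientations would need a rotation-aware scheme rather than per-component interpolation.

```python
INPUT_POSE_FPS = 30     # processing frame rate (default stated above)
TARGET_RENDER_FPS = 30  # playback frame rate (default stated above)


def resample_times(start_s, end_s, fps):
    """Evenly spaced timestamps from start_s to end_s (inclusive) at fps."""
    count = int(round((end_s - start_s) * fps)) + 1
    return [start_s + i / fps for i in range(count)]


def lerp_series(src_t, src_v, query_t):
    """Linearly interpolate samples (src_t, src_v) at each time in query_t.

    src_t must be strictly increasing and query_t non-decreasing.
    Queries outside the source range are clamped to its ends.
    """
    out = []
    j = 0  # index of the source segment [src_t[j], src_t[j + 1]]
    for t in query_t:
        t = min(max(t, src_t[0]), src_t[-1])
        while j < len(src_t) - 2 and src_t[j + 1] < t:
            j += 1
        t0, t1 = src_t[j], src_t[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(src_v[j] + w * (src_v[j + 1] - src_v[j]))
    return out


if __name__ == "__main__":
    src_t = [0.0, 0.1, 0.2]  # 10 Hz source timestamps in seconds
    src_x = [0.0, 1.0, 2.0]  # forward position in meters
    frame_t = resample_times(src_t[0], src_t[-1], INPUT_POSE_FPS)
    print(len(frame_t), lerp_series(src_t, src_x, frame_t))
```

Under this reading, the number of rendered frames depends only on INPUT_POSE_FPS, and encoding those frames at a different TARGET_RENDER_FPS changes how fast they play back.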
Processing Steps#
Load camera calibration
Parse and interpolate egomotion trajectory to processing frame rate
Interpolate obstacle tracks to match egomotion timestamps
Transform all geometries from world to camera coordinates
Render each frame using OpenGL
Encode output as MP4 video
Coordinate Systems#
World coordinates: Right-handed system (x=forward, y=left, z=up)
Camera coordinates: Camera looks along the positive z-axis; x-axis is right, y-axis is down
FLU convention: Forward-Left-Up used for vehicle-to-camera transforms
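The axis conventions above imply a fixed re-labeling between FLU axes and camera axes for a camera that faces straight ahead. The sketch below shows that re-labeling and a simple projection. It assumes a forward-facing camera located at the FLU origin and an ideal pinhole model with made-up intrinsics; real cameras need the extrinsics and lens model from the calibration data, and wide field-of-view lenses generally do not follow a pinhole model. Both function names are hypothetical.

```python
def flu_to_camera_axes(point_flu):
    """Re-express a point from FLU axes (x forward, y left, z up) in camera
    axes (x right, y down, z forward). Rotation only, no translation."""
    x_fwd, y_left, z_up = point_flu
    return (-y_left, -z_up, x_fwd)


def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a camera-frame point to pixels with an ideal pinhole model.

    Returns None for points at or behind the image plane (z <= 0).
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # A point 10 m ahead, 1 m to the left, and 0.5 m above the origin.
    p_cam = flu_to_camera_axes((10.0, 1.0, 0.5))
    # Intrinsics below are placeholder values for illustration.
    print(p_cam, project_pinhole(p_cam, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0))
```

A point to the left of and above the optical axis lands left of and above the principal point, since image x grows to the right and image y grows downward.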
Next Steps#
The generated control videos serve as conditioning inputs for Cosmos-Transfer2.5 multiview inference. The HD map visualizations provide spatial context for video generation tasks.