Video2World Post-training with DreamGen Bench#

This guide provides instructions on running post-training using robotic training datasets from the DreamGen paper.

Preparing Data#

Download DreamGen Bench Training Dataset#

To train on the robotic datasets from the DreamGen paper, use the following command to download the GR1 training dataset from https://huggingface.co/datasets/nvidia/GR1-100.

# This command downloads the GR1 training videos and metadata for physical AI

hf download nvidia/GR1-100 --repo-type dataset --local-dir datasets/benchmark_train/hf_gr1/ && \
mkdir -p datasets/benchmark_train/gr1/videos && \
mv datasets/benchmark_train/hf_gr1/gr1/*mp4 datasets/benchmark_train/gr1/videos && \
mv datasets/benchmark_train/hf_gr1/metadata.csv datasets/benchmark_train/gr1/
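
After the download completes, a quick sanity check (a minimal sketch using standard shell tools) confirms the videos and metadata landed in the expected locations:

# Both commands should succeed; the second prints the number of downloaded clips
ls datasets/benchmark_train/gr1/metadata.csv
ls datasets/benchmark_train/gr1/videos/*.mp4 | wc -l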

Preprocess Data and Verify the Dataset Folder Format#

Run the following command to create a text prompt (.txt) file for each video:

python -m scripts.create_prompts_for_gr1_dataset --dataset_path datasets/benchmark_train/gr1

The dataset folder format should be as follows:

datasets/benchmark_train/gr1/
├── metas/
│   ├── *.txt
├── videos/
│   ├── *.mp4
├── metadata.csv
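
To verify that the preprocessing step produced one prompt per clip, a simple count comparison works (a sketch assuming standard shell tools):

# The two counts below should match
ls datasets/benchmark_train/gr1/videos/*.mp4 | wc -l
ls datasets/benchmark_train/gr1/metas/*.txt | wc -l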

Post-training#

Run the following command to execute an example post-training job with GR1 data.

torchrun --nproc_per_node=1 --master_port=12341 -m scripts.train --config=cosmos_predict2/_src/predict2/configs/video2world/config.py -- experiment=predict2_video2world_training_2b_groot_gr1_480
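
The command above launches training on a single GPU. If more GPUs are available, the same experiment can typically be run data-parallel by raising --nproc_per_node (a sketch; adjust the GPU count to your hardware, the config and experiment name are unchanged):

torchrun --nproc_per_node=8 --master_port=12341 -m scripts.train --config=cosmos_predict2/_src/predict2/configs/video2world/config.py -- experiment=predict2_video2world_training_2b_groot_gr1_480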

The model is post-trained on the GR1 dataset. Refer to the predict2_video2world_training_2b_groot_gr1_480 config in cosmos_predict2/experiments/base/groot.py to understand how the dataloader is defined.

Checkpoints are saved to ${IMAGINAIRE_OUTPUT_ROOT}/PROJECT/GROUP/NAME/checkpoints. By default, IMAGINAIRE_OUTPUT_ROOT is /tmp/imaginaire4-output. We strongly recommend setting IMAGINAIRE_OUTPUT_ROOT to a location with sufficient storage space for your checkpoints.

In the example above, PROJECT is cosmos_predict_v2p5, GROUP is video2world, and NAME is 2b_groot_gr1_480.
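
For example, setting the variable before launching training (the path below is a placeholder) places the checkpoints for this job under the resolved directory:

export IMAGINAIRE_OUTPUT_ROOT=/mnt/large_disk/imaginaire-output   # placeholder path with sufficient storage
# Checkpoints for this example are then written to:
# /mnt/large_disk/imaginaire-output/cosmos_predict_v2p5/video2world/2b_groot_gr1_480/checkpoints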

Refer to the job config to see how these values are defined:

predict2_video2world_training_2b_groot_gr1_480 = dict(
    ...
    job=dict(
        project="cosmos_predict_v2p5",
        group="video2world",
        name="2b_groot_gr1_480",
    ),
    ...
)

Inference with the Post-trained Checkpoint#

Converting DCP Checkpoint to Consolidated PyTorch Format#

Since the checkpoints are saved in DCP format during training, you need to convert them to consolidated PyTorch format (.pt) for inference. Use the convert_distcp_to_pt.py script:

# Get path to the latest checkpoint
CHECKPOINTS_DIR=${IMAGINAIRE_OUTPUT_ROOT:-/tmp/imaginaire4-output}/cosmos_predict_v2p5/video2world/2b_groot_gr1_480/checkpoints
CHECKPOINT_ITER=$(cat $CHECKPOINTS_DIR/latest_checkpoint.txt)
CHECKPOINT_DIR=$CHECKPOINTS_DIR/$CHECKPOINT_ITER

# Convert DCP checkpoint to PyTorch format
python scripts/convert_distcp_to_pt.py $CHECKPOINT_DIR/model $CHECKPOINT_DIR

This conversion will create three files:

  • model.pt: Full checkpoint containing both regular and EMA weights

  • model_ema_fp32.pt: EMA weights only in float32 precision

  • model_ema_bf16.pt: EMA weights only in bfloat16 precision (recommended for inference)
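
A quick listing confirms the three converted files are present (a minimal sketch):

ls -lh $CHECKPOINT_DIR/model.pt $CHECKPOINT_DIR/model_ema_fp32.pt $CHECKPOINT_DIR/model_ema_bf16.pt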

Running Inference#

After converting the checkpoint, you can run inference with your post-trained model using a JSON configuration file that specifies the inference parameters (refer to assets/sample_gr00t_dreams_gr1/gr00t_image2world.json for an example). Note that we override the inference resolution in the JSON file to match the 480p training resolution.

torchrun --nproc_per_node=8 examples/inference.py \
  -i assets/sample_gr00t_dreams_gr1/gr00t_image2world.json \
  -o outputs/gr00t_gr1_sample \
  --checkpoint-path $CHECKPOINT_DIR/model_ema_bf16.pt \
  --experiment predict2_video2world_training_2b_groot_gr1_480

Generated videos will be saved to the output directory specified with -o (outputs/gr00t_gr1_sample/ in this example).
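
To inspect the results, list the output directory (assuming the generated clips are written there as video files):

ls outputs/gr00t_gr1_sample/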