Physical AI Workflow

This section summarizes the end-to-end Physical AI workflow at the reference-architecture level. The runnable OSMO workflows, YAML definitions, setup commands, and implementation details are maintained in the public OSMO cookbook: NVIDIA/OSMO.

Overall Workflow

The public cookbook demonstrates a six-step pipeline:

  • MimicGen HDF5 generation from teleoperation data.

  • HDF5 to MP4 conversion for camera observations.

  • Cosmos Transfer 2.5 visual augmentation.

  • MP4 to HDF5 conversion to merge augmented observations.

  • LeRobot format conversion.

  • GR00T-N1.5 fine-tuning.
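The ordering of the six steps follows from the data formats: each step consumes what the previous one produces. The sketch below models that chain and checks it for gaps. The `Stage` class, the `broken_links` helper, and the format labels are illustrative names chosen for this example, not identifiers from the cookbook. Step 4 is simplified to a single MP4 input even though it merges augmented video back into HDF5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str
    produces: str


# Format labels are informal shorthand for this sketch (assumption),
# not names used by the cookbook.
PIPELINE = [
    Stage("MimicGen generation", "teleoperation", "hdf5"),
    Stage("HDF5 to MP4", "hdf5", "mp4"),
    Stage("Cosmos Transfer 2.5 augmentation", "mp4", "mp4"),
    Stage("MP4 to HDF5", "mp4", "hdf5"),
    Stage("LeRobot conversion", "hdf5", "lerobot"),
    Stage("GR00T-N1.5 fine-tuning", "lerobot", "checkpoint"),
]


def broken_links(stages):
    """Return (upstream, downstream) name pairs whose formats do not line up."""
    return [
        (a.name, b.name)
        for a, b in zip(stages, stages[1:])
        if a.produces != b.consumes
    ]


if __name__ == "__main__":
    print(broken_links(PIPELINE))
```

An empty result means every step can read the output of the step before it. Reordering or dropping a step makes the mismatch visible.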

Source of Truth

Use the public cookbook (NVIDIA/OSMO) as the source of truth for commands, YAML, image names, parameter defaults, and code snippets.

Setup

At a high level, users need:

  • An authenticated OSMO CLI.

  • Access to an OSMO cluster with suitable GPU resources.

  • Required credentials for NGC and Hugging Face where applicable.

  • A user-provided input dataset compatible with the workflow.
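A local preflight check can catch some missing prerequisites before anything is submitted. The sketch below is a minimal example under stated assumptions: the executable name `osmo` and the environment variable names `NGC_API_KEY` and `HF_TOKEN` are guesses about a typical setup, not values taken from the cookbook. It only confirms that the CLI is on `PATH` and that the variables are set; it does not verify authentication, cluster access, or dataset compatibility.

```python
import os
import shutil


def preflight(
    cli_name="osmo",                              # assumed executable name
    required_env=("NGC_API_KEY", "HF_TOKEN"),     # assumed variable names
    env=None,
    which=shutil.which,
):
    """Return a list of human-readable problems; an empty list means none found."""
    env = os.environ if env is None else env
    problems = []
    if which(cli_name) is None:
        problems.append(f"CLI not found on PATH: {cli_name}")
    for var in required_env:
        if not env.get(var):
            problems.append(f"missing environment variable: {var}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

The `env` and `which` parameters exist so the check can be exercised without touching the real environment.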

Execution

Follow the public cookbook README to download the cookbook files, configure dataset and credential values, and submit the workflows sequentially. Do not copy the YAML blocks into this reference architecture; keep runnable implementation updates in the public cookbook to avoid stale instructions.
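Sequential submission means a step starts only after the previous one has succeeded. The sketch below shows that control flow without encoding any OSMO command. The `submit` callable is a placeholder the user would implement around whatever the cookbook README prescribes; it is assumed to block until the workflow finishes and to return `True` on success.

```python
def run_sequentially(steps, submit):
    """Call submit(step) for each step in order and stop at the first failure.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        if not submit(step):
            break  # later steps depend on this one's output, so do not continue
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Stand-in for a real submission wrapper (hypothetical): always succeeds.
    print(run_sequentially(["step-a", "step-b"], lambda step: True))
```

Returning the completed steps rather than raising makes it straightforward to resume from the first step that did not finish.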

Pipeline Summary

  • Step 1: 01_mimic_generation.yaml generates synthetic demonstrations from teleoperation data.

  • Step 2: 02_hdf5_to_mp4.yaml extracts camera observations to MP4.

  • Step 3: 03_cosmos_augmentation.yaml applies Cosmos Transfer 2.5 visual augmentation.

  • Step 4: 04_mp4_to_hdf5.yaml merges augmented videos back to HDF5.

  • Step 5: 05_lerobot_conversion.yaml converts the dataset to LeRobot format.

  • Step 6: 06_groot_finetune.yaml fine-tunes the GR00T-N1.5 model.
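The numeric prefixes on the workflow files encode the execution order. As a small sketch, the helper below sorts a set of file names by prefix and rejects a set with a missing or duplicated step number. The function name `order_by_prefix` is illustrative, and the assumption that every workflow file carries an `NN_` prefix comes only from the six names listed above.

```python
import re


def order_by_prefix(filenames):
    """Sort workflow files by numeric prefix and require step numbers 1..N."""
    numbered = []
    for name in filenames:
        match = re.match(r"(\d+)_", name)
        if match is None:
            raise ValueError(f"no numeric prefix: {name}")
        numbered.append((int(match.group(1)), name))
    numbered.sort()
    indices = [index for index, _ in numbered]
    if indices != list(range(1, len(numbered) + 1)):
        raise ValueError(f"step numbers are not contiguous from 1: {indices}")
    return [name for _, name in numbered]


if __name__ == "__main__":
    files = [
        "04_mp4_to_hdf5.yaml",
        "01_mimic_generation.yaml",
        "06_groot_finetune.yaml",
        "02_hdf5_to_mp4.yaml",
        "05_lerobot_conversion.yaml",
        "03_cosmos_augmentation.yaml",
    ]
    for name in order_by_prefix(files):
        print(name)
```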

Monitoring and Data Locations

Use the OSMO CLI and the public cookbook guidance to monitor workflow status and logs. Dataset names and storage locations in this reference architecture are examples only; end users must replace them with their own storage locations and access controls.
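Monitoring a long-running workflow usually reduces to polling its status until it reaches a terminal state. The sketch below shows one way to structure that loop. The `get_status` callable is a placeholder the user would write around the OSMO CLI as described in the cookbook, and the status labels `COMPLETED` and `FAILED` are hypothetical; substitute whatever values the CLI actually reports.

```python
import time

# Hypothetical status labels (assumption); replace with the real ones.
TERMINAL = frozenset({"COMPLETED", "FAILED"})


def wait_for(get_status, terminal=TERMINAL, interval_s=30.0, max_polls=120,
             sleep=time.sleep):
    """Poll get_status() until it returns a terminal value, then return it.

    Raises TimeoutError if no terminal value is seen within max_polls polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in terminal:
            return status
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in status source for demonstration only.
    fake = iter(["RUNNING", "COMPLETED"])
    print(wait_for(lambda: next(fake), interval_s=0.0))
```

Bounding the number of polls keeps a stuck workflow from blocking a script indefinitely.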

References