Physical AI Workflow

This section summarizes the end-to-end Physical AI workflow at the reference-architecture level. The runnable OSMO workflows, YAML definitions, setup commands, and implementation details are maintained in the public OSMO cookbook: NVIDIA/OSMO.

Overall Workflow

The public cookbook demonstrates a six-step pipeline:

1. MimicGen HDF5 generation from teleoperation data.
2. HDF5 to MP4 conversion for camera observations.
3. Cosmos Transfer 2.5 visual augmentation.
4. MP4 to HDF5 conversion to merge augmented observations.
5. LeRobot format conversion.
6. GR00T-N1.5 fine-tuning.
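
The six steps form a chain in which each stage consumes what the previous stage produced. The following is a minimal sketch of that chain, not code from the cookbook: the `Stage` class, the `broken_links` helper, and the artifact labels (`teleop_data`, `augmented_mp4`, and so on) are hypothetical names chosen for illustration, and the model simplifies each stage to a single primary input and output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str  # primary input artifact (illustrative label only)
    produces: str  # primary output artifact (illustrative label only)


# Stage names follow the six steps listed above; the artifact labels are
# shorthand invented for this sketch, not identifiers from the cookbook.
PIPELINE = [
    Stage("MimicGen HDF5 generation", "teleop_data", "hdf5"),
    Stage("HDF5 to MP4 conversion", "hdf5", "mp4"),
    Stage("Cosmos Transfer 2.5 augmentation", "mp4", "augmented_mp4"),
    Stage("MP4 to HDF5 conversion", "augmented_mp4", "augmented_hdf5"),
    Stage("LeRobot format conversion", "augmented_hdf5", "lerobot_dataset"),
    Stage("GR00T-N1.5 fine-tuning", "lerobot_dataset", "finetuned_model"),
]


def broken_links(stages):
    """Return (upstream, downstream) name pairs whose artifacts do not line up."""
    return [
        (a.name, b.name)
        for a, b in zip(stages, stages[1:])
        if a.produces != b.consumes
    ]


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.consumes} -> [{stage.name}] -> {stage.produces}")
    print("broken links:", broken_links(PIPELINE))
```

Writing the chain down this way makes it easy to see why the steps cannot be reordered or skipped: dropping any stage leaves a downstream stage without the artifact it expects.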

Source of Truth

Use the public cookbook (NVIDIA/OSMO) as the source of truth for commands, YAML, image names, parameter defaults, and code snippets.

Setup

At a high level, users need:

An authenticated OSMO CLI.
Access to an OSMO cluster with suitable GPU resources.
Required credentials for NGC and Hugging Face where applicable.
A user-provided input dataset compatible with the workflow.
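
Some of these prerequisites can be checked locally before any workflow is submitted. The sketch below is one possible pre-flight check under stated assumptions: that the CLI executable is named `osmo`, and that credentials are exposed through environment variables named `NGC_API_KEY` and `HF_TOKEN`. None of these names come from this document, so treat them as placeholders and follow the cookbook README for the actual credential setup. The check does not verify authentication or cluster GPU access.

```python
import os
import shutil

# Assumed names for illustration; replace with what the cookbook README specifies.
CLI_NAME = "osmo"
CREDENTIAL_VARS = ("NGC_API_KEY", "HF_TOKEN")


def preflight(dataset_location, env=None, which=shutil.which):
    """Return a list of problems found; an empty list means the local checks passed.

    `env` and `which` are injectable so the function can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if which(CLI_NAME) is None:
        problems.append(f"CLI '{CLI_NAME}' not found on PATH")
    for var in CREDENTIAL_VARS:
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")
    if not dataset_location:
        problems.append("no input dataset location provided")
    return problems


if __name__ == "__main__":
    # "my-input-dataset" is a placeholder, not a real dataset name.
    for problem in preflight("my-input-dataset"):
        print("missing:", problem)
```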

Execution

Follow the public cookbook README to download the cookbook files, configure dataset and credential values, and submit the workflows sequentially. Do not copy the YAML blocks into this reference architecture; keep runnable implementation updates in the public cookbook to avoid stale instructions.

Pipeline Summary

Step 1: 01_mimic_generation.yaml generates synthetic demonstrations from teleoperation data.
Step 2: 02_hdf5_to_mp4.yaml extracts camera observations to MP4.
Step 3: 03_cosmos_augmentation.yaml applies Cosmos Transfer 2.5 visual augmentation.
Step 4: 04_mp4_to_hdf5.yaml merges augmented videos back to HDF5.
Step 5: 05_lerobot_conversion.yaml converts the dataset to LeRobot format.
Step 6: 06_groot_finetune.yaml fine-tunes the GR00T-N1.5 model.
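
Because each step depends on the output of the one before it, the workflows are submitted in order and a failed step should halt the run. A minimal sketch of that control flow follows. It deliberately contains no OSMO commands: `submit_and_wait` is a hypothetical callable that the user would implement from the cookbook README, and `run_in_order` is an illustrative helper, not part of OSMO.

```python
from typing import Callable, List

# Workflow file names as listed in the pipeline summary above.
WORKFLOWS = [
    "01_mimic_generation.yaml",
    "02_hdf5_to_mp4.yaml",
    "03_cosmos_augmentation.yaml",
    "04_mp4_to_hdf5.yaml",
    "05_lerobot_conversion.yaml",
    "06_groot_finetune.yaml",
]


def run_in_order(
    workflows: List[str], submit_and_wait: Callable[[str], bool]
) -> List[str]:
    """Run workflows one at a time and stop at the first failure.

    `submit_and_wait` is a user-supplied (hypothetical) callable that submits
    one workflow, blocks until it finishes, and returns True on success.
    Returns the workflows that completed successfully.
    """
    completed = []
    for workflow in workflows:
        if not submit_and_wait(workflow):
            break
        completed.append(workflow)
    return completed


if __name__ == "__main__":
    # Dry run: the stand-in callable only prints and issues no real command.
    def dry_run(workflow: str) -> bool:
        print(f"would submit {workflow}")
        return True

    run_in_order(WORKFLOWS, dry_run)
```

Keeping the submission logic behind a callable means the ordering and stop-on-failure behavior can be tested without cluster access, while the actual commands stay in the cookbook.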

Monitoring and Data Locations

Use the OSMO CLI and the public cookbook guidance to monitor workflow status and logs. Dataset names and storage locations in this reference architecture are examples only; end users must replace them with their own storage locations and access controls.