Physical AI Workflow#
This section summarizes the end-to-end Physical AI workflow at the reference-architecture level. The runnable OSMO workflows, YAML definitions, setup commands, and implementation details are maintained in the public OSMO cookbook: NVIDIA/OSMO.
Overall Workflow#
The public cookbook demonstrates a six-step pipeline:
MimicGen HDF5 generation from teleoperation data.
HDF5 to MP4 conversion for camera observations.
Cosmos Transfer 2.5 visual augmentation.
MP4 to HDF5 conversion to merge augmented observations.
LeRobot format conversion.
GR00T-N1.5 fine-tuning.
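The hand-offs between the six steps above can be sketched as a simple chain in which each stage's output format should match the next stage's input. This is an illustrative model only: the `Stage` class, the `broken_links` helper, and the format labels are assumptions made for this sketch and do not come from the cookbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str  # illustrative label for the format the stage reads
    produces: str  # illustrative label for the format the stage writes


# Format labels are descriptive shorthand, not identifiers from the cookbook.
PIPELINE = [
    Stage("MimicGen generation", "teleop-hdf5", "hdf5"),
    Stage("HDF5 to MP4", "hdf5", "mp4"),
    Stage("Cosmos Transfer 2.5 augmentation", "mp4", "augmented-mp4"),
    Stage("MP4 to HDF5", "augmented-mp4", "augmented-hdf5"),
    Stage("LeRobot conversion", "augmented-hdf5", "lerobot"),
    Stage("GR00T-N1.5 fine-tuning", "lerobot", "checkpoint"),
]


def broken_links(stages):
    """Return (producer, consumer) name pairs whose formats do not match."""
    return [
        (a.name, b.name)
        for a, b in zip(stages, stages[1:])
        if a.produces != b.consumes
    ]


if __name__ == "__main__":
    for number, stage in enumerate(PIPELINE, start=1):
        print(f"{number}. {stage.name}: {stage.consumes} -> {stage.produces}")
    print("broken links:", broken_links(PIPELINE))
```

Modeling the pipeline this way makes it easy to see why the steps run strictly in order: skipping or reordering a stage leaves a consumer without the format it expects.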
Source of Truth#
Use the public cookbook (NVIDIA/OSMO) as the source of truth for commands, YAML, image names, parameter defaults, and code snippets.
Setup#
At a high level, users need:
An authenticated OSMO CLI.
Access to an OSMO cluster with suitable GPU resources.
Required credentials for NGC and Hugging Face where applicable.
A user-provided input dataset compatible with the workflow.
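Some of the prerequisites above can be checked locally before submitting anything. The following is a minimal sketch, not part of the cookbook: the `preflight` helper is hypothetical, and the executable name `osmo`, the environment variable names `NGC_API_KEY` and `HF_TOKEN`, and the dataset path are assumptions to be replaced with whatever your environment and the cookbook README specify. It does not verify that the CLI is authenticated or that the cluster has suitable GPU resources; those still need to be confirmed through OSMO itself.

```python
import os
import shutil


def preflight(cli_name, credential_env_vars, dataset_path):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Is the CLI executable discoverable on PATH?
    if shutil.which(cli_name) is None:
        problems.append(f"CLI '{cli_name}' not found on PATH")
    # Are the credential variables present and non-empty?
    for var in credential_env_vars:
        if not os.environ.get(var):
            problems.append(f"environment variable {var} is not set")
    # Does the user-provided input dataset exist locally?
    if not os.path.exists(dataset_path):
        problems.append(f"input dataset not found: {dataset_path}")
    return problems


if __name__ == "__main__":
    # All three arguments are placeholders (assumptions), not cookbook values.
    issues = preflight("osmo", ["NGC_API_KEY", "HF_TOKEN"], "./my_dataset.hdf5")
    for issue in issues:
        print("NOT READY:", issue)
    if not issues:
        print("Local prerequisites look satisfied.")
```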
Execution#
Follow the public cookbook README to download the cookbook files, configure dataset and credential values, and submit the workflows sequentially. Do not copy the YAML blocks into this reference architecture; keep runnable implementation updates in the public cookbook to avoid stale instructions.
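Sequential submission means each workflow starts only after the previous one has succeeded. The control flow can be sketched as follows, with the actual submission left to a caller-supplied function so that no commands are duplicated here. The `run_in_order` helper and the `submit_and_wait` callback are hypothetical names introduced for this sketch; the real submission command is the one documented in the cookbook README.

```python
from typing import Callable, Iterable, List


def run_in_order(workflow_files: Iterable[str],
                 submit_and_wait: Callable[[str], bool]) -> List[str]:
    """Submit each workflow in turn and stop at the first failure.

    `submit_and_wait` is a hypothetical callback supplied by the caller. It
    should submit one workflow file using the command from the cookbook
    README, block until that workflow finishes, and return True on success.
    Returns the files that completed successfully, in order.
    """
    completed = []
    for path in workflow_files:
        if not submit_and_wait(path):
            break  # later steps depend on this one, so do not continue
        completed.append(path)
    return completed


if __name__ == "__main__":
    # Stand-in callback for demonstration only: it submits nothing and
    # pretends that every workflow succeeds.
    def fake_submit(path: str) -> bool:
        print("would submit:", path)
        return True

    print(run_in_order(["01_mimic_generation.yaml", "02_hdf5_to_mp4.yaml"], fake_submit))
```

Stopping at the first failure avoids spending GPU time on downstream steps whose inputs were never produced.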
Pipeline Summary#
Step 1: 01_mimic_generation.yaml generates synthetic demonstrations from teleoperation data.
Step 2: 02_hdf5_to_mp4.yaml extracts camera observations to MP4.
Step 3: 03_cosmos_augmentation.yaml applies Cosmos Transfer 2.5 visual augmentation.
Step 4: 04_mp4_to_hdf5.yaml merges augmented videos back to HDF5.
Step 5: 05_lerobot_conversion.yaml converts the dataset to LeRobot format.
Step 6: 06_groot_finetune.yaml fine-tunes the GR00T-N1.5 model.
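After downloading the cookbook files, it can be useful to confirm that all six workflow definitions named above are present before starting. The sketch below assumes the files sit directly inside a single local directory; that layout, the `missing_workflows` helper, and the example directory name are assumptions, so adjust them to match how the cookbook is actually organized.

```python
from pathlib import Path

# File names as listed in the pipeline summary, in execution order.
EXPECTED_WORKFLOWS = [
    "01_mimic_generation.yaml",
    "02_hdf5_to_mp4.yaml",
    "03_cosmos_augmentation.yaml",
    "04_mp4_to_hdf5.yaml",
    "05_lerobot_conversion.yaml",
    "06_groot_finetune.yaml",
]


def missing_workflows(cookbook_dir):
    """Return the expected workflow files not found in `cookbook_dir`."""
    root = Path(cookbook_dir)
    return [name for name in EXPECTED_WORKFLOWS if not (root / name).is_file()]


if __name__ == "__main__":
    # "./cookbook" is a placeholder for wherever the files were downloaded.
    missing = missing_workflows("./cookbook")
    if missing:
        print("Missing workflow files:", ", ".join(missing))
    else:
        print("All six workflow files are present.")
```

Because the file names carry a two-digit numeric prefix, sorting them alphabetically also yields the execution order.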
Monitoring and Data Locations#
Use the OSMO CLI and the public cookbook guidance to monitor workflow status and logs. Dataset names and storage locations in this reference architecture are examples only; end users must replace them with their own storage locations and access controls.