Stage 0: Supervised Fine-Tuning (SFT)
Omni starts from the GA nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 checkpoint (30B-A3B hybrid Mamba-transformer MoE) and fine-tunes it with the Valor32k multimodal recipe family using Megatron-Bridge. SFT teaches the perception-sub-agent surface — instruction-following over multimodal inputs — that downstream RL alignment and agentic systems consume. The released configs target the open-data subset; see architecture.md §Progressive context scaling for how the open configs relate to the upstream 16K → 49K → 262K training schedule.
Container-first stage: Omni does not ship with a pre-baked image. This stage owns the Dockerfile that the nemotron omni3 build sft dispatcher turns into omni3-sft.sqsh, which all later SFT/eval commands reuse via the per-cluster build_cache_dir.
Defaults: the shipped default.yaml uses CORD-v2 from HuggingFace via Megatron-Bridge’s vlm-hf loader, so nemotron omni3 sft --run <profile> works out of the box with no internal data access. Passing -c valor32k switches to the full audio-visual-language Energon flow but requires the internal Valor32k-AVQA dataset (see Config Variants).
Current limitations (also summarized in the family README):
- Open-dataset default trains projector only. CORD-v2 plus freeze_language_model: true fits on a single 8-GPU node (per QA guide §5.2.2). For full-model SFT, switch to -c image_text_peft (LoRA on CORD-v2) or prepare your own Energon dataset and point dataset.path at it.
- nemotron omni3 data prep sft with -c valor32k validates a prepared Energon dataset; the raw-shard builder is internal-only. With the default (HF) flow the command is a no-op manifest writer; the training container pulls from the Hub on demand.
- The omni3-sft Dockerfile clones NVIDIA-NeMo/Megatron-Bridge @ nemotron_3_omni (with NVIDIA/Megatron-LM @ nemotron_3_omni as a recursive submodule fetch). These are the active release branches for Nemotron 3 Omni; bump to a versioned tag (or main) once these changes merge upstream.
Stage Overview
The stage directory is src/nemotron/recipes/omni3/stage0_sft/ and contains:
| File | Purpose |
|---|---|
|  | Builds the Megatron-Bridge |
| data_prep.py | Validates or stages a prepared Valor32k Energon dataset |
| train.py | Runs scripts/training/run_recipe.py with the selected recipe |
| config/*.yaml | Full SFT, PEFT, audio-text, and tiny variants |
Container Build
Build the SFT container on-cluster:
uv run nemotron omni3 build sft --run YOUR-CLUSTER
The canonical archive path is:
${build_cache_dir}/containers/omni3-sft.sqsh
build_cache_dir is set per profile in env.toml and is mounted into
the build container at /nemotron-cache. The dispatcher also pulls
your nvcr.io credentials out of ~/.config/enroot/.credentials and
exposes them to the build container as a docker-format auth.json,
so FROM nvcr.io/nvidian/nemo:<tag> resolves without a separate
podman login. See How container builds authenticate
for the full mechanism and how to extend the registry allowlist.
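The credential hand-off described above can be pictured with a short sketch. It is not the dispatcher's actual code: it assumes the enroot credentials file uses netrc-style `machine … login … password …` lines and that the target is the usual Docker/Podman `auths` layout. The function name and the `registries` allowlist parameter are hypothetical.

```python
import base64
import json


def enroot_to_docker_auth(credentials_text: str, registries: set[str]) -> dict:
    """Convert netrc-style enroot credentials into a Docker-format auth dict.

    Only hosts in `registries` (a hypothetical allowlist) are kept.
    """
    auths = {}
    for line in credentials_text.splitlines():
        tokens = line.split()
        # Expected shape: machine <host> login <user> password <secret>
        if len(tokens) != 6 or tokens[0] != "machine":
            continue
        fields = dict(zip(tokens[0::2], tokens[1::2]))
        host = fields.get("machine")
        user = fields.get("login")
        secret = fields.get("password")
        if host not in registries or user is None or secret is None:
            continue
        # Docker-format auth entries store base64("user:secret")
        token = base64.b64encode(f"{user}:{secret}".encode()).decode()
        auths[host] = {"auth": token}
    return {"auths": auths}


# Placeholder key for illustration only
sample = "machine nvcr.io login $oauthtoken password EXAMPLE_KEY\n"
auth = enroot_to_docker_auth(sample, {"nvcr.io"})
print(json.dumps(auth, indent=2))
```

Filtering by an explicit registry set keeps unrelated credentials out of the build container, which is the role the allowlist plays in the description above.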
For local iteration, you can build the same stage directly from the Dockerfile:
cd src/nemotron/recipes/omni3/stage0_sft
docker build -t nemotron/omni3-sft:latest -f Dockerfile .
# or
podman build -t nemotron/omni3-sft:latest -f Dockerfile .
Valor32k and SDG Data Flow
%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
flowchart LR
sdg["data/sdg/long-document"] --> valor["Prepared Valor32k / Energon data"]
valor --> prep["omni3 data prep sft"]
prep --> manifest["Manifest + staged metadata"]
ga["GA HF checkpoint"] --> import["omni3 model import pretrain"]
manifest --> train["omni3 sft"]
import --> train
train --> out["Omni SFT checkpoint"]
style sdg fill:#e3f2fd,stroke:#2196f3
style prep fill:#f3e5f5,stroke:#9c27b0
style train fill:#f3e5f5,stroke:#9c27b0
style out fill:#e8f5e9,stroke:#4caf50
The public CLI does not build Valor32k shards from scratch yet. Instead, data_prep.py gives the recipe a concrete staging hook by:
- validating dataset_path
- optionally running a site-local builder_command
- writing manifest.json under metadata_dir
- optionally refreshing a convenience symlink with link_path
Run it with:
uv run nemotron omni3 data prep sft --run YOUR-CLUSTER
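As a rough illustration of the four staging steps listed above, the sketch below walks through them in the same order. It is not the shipped data_prep.py: the function name, the manifest fields, and the choice to validate before running the builder are all assumptions made for the example.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def stage_dataset(dataset_path, metadata_dir, builder_command=None, link_path=None):
    """Validate a prepared dataset and record a manifest (illustrative only)."""
    dataset = Path(dataset_path)

    # 1. Validate dataset_path
    if not dataset.is_dir():
        raise FileNotFoundError(f"dataset_path does not exist: {dataset}")

    # 2. Optionally run a site-local builder command
    if builder_command:
        subprocess.run(builder_command, shell=True, check=True)

    # 3. Write manifest.json under metadata_dir (field names are hypothetical)
    meta = Path(metadata_dir)
    meta.mkdir(parents=True, exist_ok=True)
    manifest_file = meta / "manifest.json"
    manifest = {
        "dataset_path": str(dataset.resolve()),
        "builder_command": builder_command,
    }
    manifest_file.write_text(json.dumps(manifest, indent=2))

    # 4. Optionally refresh a convenience symlink at link_path
    if link_path:
        link = Path(link_path)
        if link.is_symlink():
            link.unlink()
        link.symlink_to(dataset.resolve(), target_is_directory=True)

    return manifest_file


# Dry run against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "energon"
    data.mkdir()
    out = stage_dataset(data, Path(tmp) / "meta", link_path=Path(tmp) / "current")
    print(out.read_text())
```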
Quick Start
// 1. Build the container
$ uv run nemotron omni3 build sft --run YOUR-CLUSTER
// 2. Stage or validate the Valor32k Energon dataset
$ uv run nemotron omni3 data prep sft --run YOUR-CLUSTER
// 3. Convert the GA Hugging Face checkpoint to Megatron format
$ uv run nemotron omni3 model import pretrain --run YOUR-CLUSTER \
--hf-model nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 \
--megatron-path /checkpoints/nemotron_omni
// 4. Launch SFT
$ uv run nemotron omni3 sft --run YOUR-CLUSTER
Config Variants
The stage ports the QA-guide variants into explicit YAML files:
| Config | Purpose |
|---|---|
|  | Full Valor32k SFT |
|  | Image-text projector SFT |
|  | Image-text LoRA / PEFT |
|  | Audio-text SFT |
|  | Valor32k LoRA / PEFT |
|  | Small smoke-test config |
Select a variant with -c:
uv run nemotron omni3 sft -c image_text_peft --run YOUR-CLUSTER
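One way to picture how a -c name could map onto the stage's config/*.yaml files is sketched below. The real CLI's resolution logic is not shown in this document, so the `resolve_config` helper and its validation rule are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Stage directory as given in the Stage Overview
STAGE_DIR = PurePosixPath("src/nemotron/recipes/omni3/stage0_sft")


def resolve_config(name: str = "default", stage_dir: PurePosixPath = STAGE_DIR) -> PurePosixPath:
    """Map a variant name (hypothetically, the value passed to -c) to a YAML path."""
    # Reject separators so the name can only select a file inside config/
    if not name or "/" in name or name.startswith("."):
        raise ValueError(f"invalid config name: {name!r}")
    stem = name[:-5] if name.endswith(".yaml") else name
    return stage_dir / "config" / f"{stem}.yaml"


print(resolve_config())                   # default variant
print(resolve_config("image_text_peft"))  # LoRA on CORD-v2
```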
LoRA and PEFT Variants
The two PEFT-oriented configs are:
- image_text_peft.yaml
- peft_valor32k.yaml
They keep the same stage-local execution path as full SFT but swap in LoRA-oriented training settings. After training, the family also exposes the related model lifecycle commands:
uv run nemotron omni3 model lora-merge --run YOUR-CLUSTER ...
uv run nemotron omni3 model adapter-export --run YOUR-CLUSTER ...
uv run nemotron omni3 model export pretrain --run YOUR-CLUSTER ...
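For readers new to LoRA, the idea behind a merge step can be shown on a toy matrix: the low-rank update B·A, scaled by alpha/rank, is folded into the base weight. This is the standard LoRA formulation, not a description of what the lora-merge command does internally on Megatron checkpoints; every name and value below is illustrative.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, 2 x 2
A = [[1.0, 2.0]]              # LoRA "A" factor, rank 1 x in 2
B = [[0.5], [0.25]]           # LoRA "B" factor, out 2 x rank 1
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

After merging, inference no longer needs the separate adapter factors, which is generally why a merge command is offered alongside adapter export.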
Training Configuration Notes
The default Omni SFT config currently uses:
| Setting | Value |
|---|---|
|  | 8 |
|  | 4 |
|  | 4 |
|  | 4096 |
|  | 128 |
|  | 1 |
The model checkpoint and staged dataset are passed through the artifact system or environment overrides:
checkpoint:
  pretrained_checkpoint: ${oc.env:OMNI3_MEGATRON_CHECKPOINT,/checkpoints/nemotron_omni}
dataset:
  path: ${oc.env:OMNI3_VALOR32K_ENERGON_PATH,/datasets/valor32k/energon}
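The `${oc.env:VAR,default}` syntax is OmegaConf's environment interpolation: the variable's value is used when it is set, and the text after the comma otherwise. The standard-library sketch below only approximates that behavior to show which path wins; the `resolve_env` helper is hypothetical, and the recipe itself would rely on OmegaConf rather than this code.

```python
import os
import re

# Matches ${oc.env:NAME} and ${oc.env:NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str, environ=os.environ) -> str:
    """Substitute oc.env-style placeholders using `environ` (approximation)."""

    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in environ:
            return environ[name]
        if default is None:
            raise KeyError(f"environment variable {name} is not set and has no default")
        return default.strip()

    return _ENV_PATTERN.sub(_sub, value)


template = "${oc.env:OMNI3_MEGATRON_CHECKPOINT,/checkpoints/nemotron_omni}"
print(resolve_env(template, environ={}))
# /checkpoints/nemotron_omni
print(resolve_env(template, environ={"OMNI3_MEGATRON_CHECKPOINT": "/scratch/ckpt"}))
# /scratch/ckpt
```

In practice this means exporting OMNI3_MEGATRON_CHECKPOINT or OMNI3_VALOR32K_ENERGON_PATH before launching overrides the defaults without editing the YAML.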
Infrastructure
This stage uses:
| Component | Role | Documentation |
|---|---|---|
|  | Distributed TP/EP training primitives |  |
|  | Recipe execution and checkpoint conversion |  |
Next Steps
After SFT completes, proceed to Stage 1: RL.
Upstream
This stage is the cookbook view of the upstream Megatron-Bridge omni
SFT flow. For the canonical recipe (hyperparameters, config tables,
model-level training notes), see the Megatron-Bridge nemotron_3_omni
README.
The Dockerfile in this stage pins NVIDIA-NeMo/Megatron-Bridge @ nemotron_3_omni (and NVIDIA/Megatron-LM @ nemotron_3_omni as a recursive submodule fetch); bump those pins to a versioned tag once the branches merge upstream.