Stage 0: Supervised Fine-Tuning (SFT)#

Omni starts from the GA nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 checkpoint (30B-A3B hybrid Mamba-transformer MoE) and fine-tunes it with the Valor32k multimodal recipe family using Megatron-Bridge. SFT teaches the perception-sub-agent surface — instruction-following over multimodal inputs — that downstream RL alignment and agentic systems consume. The released configs target the open-data subset; see architecture.md §Progressive context scaling for how the open configs relate to the upstream 16K → 49K → 262K training schedule.

Container-first stage: Omni does not ship with a pre-baked image. This stage owns the Dockerfile that the nemotron omni3 build sft dispatcher turns into omni3-sft.sqsh, which all later SFT/eval commands reuse via the per-cluster build_cache_dir.

Defaults — the shipped default.yaml uses CORD-v2 from HuggingFace via Megatron-Bridge’s vlm-hf loader, so nemotron omni3 sft --run <profile> works out of the box with no internal data access. -c valor32k switches to the full audio-visual-language Energon flow but requires the internal Valor32k-AVQA dataset (see Config Variants).

Current limitations (also summarized in the family README):

  • Open-dataset default trains projector only. CORD-v2 plus freeze_language_model: true fits on a single 8-GPU node (per QA guide §5.2.2). For full-model SFT, switch to -c image_text_peft (LoRA on CORD-v2) or prepare your own Energon dataset and point dataset.path at it.

  • nemotron omni3 data prep sft with -c valor32k validates a prepared Energon dataset; the raw-shard builder is internal-only. With the default (HF) flow the command is a no-op manifest writer — the training container pulls from the Hub on demand.

  • The omni3-sft Dockerfile clones NVIDIA-NeMo/Megatron-Bridge @ nemotron_3_omni (with NVIDIA/Megatron-LM @ nemotron_3_omni as a recursive submodule fetch). These are the active release branches for Nemotron 3 Omni; bump to a versioned tag (or main) once these changes merge upstream.


Stage Overview#

The stage directory is src/nemotron/recipes/omni3/stage0_sft/ and contains:

File

Purpose

Dockerfile

Builds the Megatron-Bridge nemotron_3_omni environment

| data_prep.py | Validates or stages a prepared Valor32k Energon dataset | | train.py | Runs scripts/training/run_recipe.py with the selected recipe | | config/*.yaml | Full SFT, PEFT, audio-text, and tiny variants |

Container Build#

Build the SFT container on-cluster:

uv run nemotron omni3 build sft --run YOUR-CLUSTER

The canonical archive path is:

${build_cache_dir}/containers/omni3-sft.sqsh

build_cache_dir is set per profile in env.toml and is mounted into the build container at /nemotron-cache. The dispatcher also pulls your nvcr.io credentials out of ~/.config/enroot/.credentials and exposes them to the build container as a docker-format auth.json, so FROM nvcr.io/nvidian/nemo:<tag> resolves without a separate podman login. See How container builds authenticate for the full mechanism + how to extend the registry allowlist.

For local iteration, you can build the same stage directly from the Dockerfile:

cd src/nemotron/recipes/omni3/stage0_sft
docker build -t nemotron/omni3-sft:latest -f Dockerfile .
# or
podman build -t nemotron/omni3-sft:latest -f Dockerfile .

Valor32k and SDG Data Flow#

        %%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
flowchart LR
    sdg["data/sdg/long-document"] --> valor["Prepared Valor32k / Energon data"]
    valor --> prep["omni3 data prep sft"]
    prep --> manifest["Manifest + staged metadata"]
    ga["GA HF checkpoint"] --> import["omni3 model import pretrain"]
    manifest --> train["omni3 sft"]
    import --> train
    train --> out["Omni SFT checkpoint"]

    style sdg fill:#e3f2fd,stroke:#2196f3
    style prep fill:#f3e5f5,stroke:#9c27b0
    style train fill:#f3e5f5,stroke:#9c27b0
    style out fill:#e8f5e9,stroke:#4caf50
    

The public CLI does not build Valor32k shards from scratch yet. Instead, data_prep.py gives the recipe a concrete staging hook by:

  • validating dataset_path

  • optionally running a site-local builder_command

  • writing manifest.json under metadata_dir

  • optionally refreshing a convenience symlink with link_path

Run it with:

uv run nemotron omni3 data prep sft --run YOUR-CLUSTER

Quick Start#

// 1. Build the container
$ uv run nemotron omni3 build sft --run YOUR-CLUSTER

// 2. Stage or validate the Valor32k Energon dataset
$ uv run nemotron omni3 data prep sft --run YOUR-CLUSTER

// 3. Convert the GA Hugging Face checkpoint to Megatron format
$ uv run nemotron omni3 model import pretrain --run YOUR-CLUSTER \
    --hf-model nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 \
    --megatron-path /checkpoints/nemotron_omni

// 4. Launch SFT
$ uv run nemotron omni3 sft --run YOUR-CLUSTER

Config Variants#

The stage ports the QA-guide variants into explicit YAML files:

Config

Purpose

default.yaml

Full Valor32k SFT

image_text_sft.yaml

Image-text projector SFT

image_text_peft.yaml

Image-text LoRA / PEFT

audio_text.yaml

Audio-text SFT

peft_valor32k.yaml

Valor32k LoRA / PEFT

tiny.yaml

Small smoke-test config

Select a variant with -c:

uv run nemotron omni3 sft -c image_text_peft --run YOUR-CLUSTER

LoRA and PEFT Variants#

The two PEFT-oriented configs are:

  • image_text_peft.yaml

  • peft_valor32k.yaml

They keep the same stage-local execution path as full SFT but swap in LoRA-oriented training settings. After training, the family also exposes the related model lifecycle commands:

  • uv run nemotron omni3 model lora-merge --run YOUR-CLUSTER ...

  • uv run nemotron omni3 model adapter-export --run YOUR-CLUSTER ...

  • uv run nemotron omni3 model export pretrain --run YOUR-CLUSTER ...

Training Configuration Notes#

The default Omni SFT config currently uses:

Setting

Value

nproc_per_node

8

tensor_model_parallel_size

4

expert_model_parallel_size

4

seq_length

4096

global_batch_size

128

micro_batch_size

1

The model checkpoint and staged dataset are passed through the artifact system or environment overrides:

checkpoint:
  pretrained_checkpoint: ${oc.env:OMNI3_MEGATRON_CHECKPOINT,/checkpoints/nemotron_omni}

dataset:
  path: ${oc.env:OMNI3_VALOR32K_ENERGON_PATH,/datasets/valor32k/energon}

Infrastructure#

This stage uses:

Component

Role

Documentation

Megatron-Core

Distributed TP/EP training primitives

GitHub

Megatron-Bridge

Recipe execution and checkpoint conversion

Docs

Next Steps#

After SFT completes, proceed to Stage 1: RL.

Upstream#

This stage is the cookbook view of the upstream Megatron-Bridge omni SFT flow. For the canonical recipe (hyperparameters, config tables, model-level training notes), see the Megatron-Bridge nemotron_3_omni README. The Dockerfile in this stage pins NVIDIA-NeMo/Megatron-Bridge @ nemotron_3_omni (and NVIDIA/Megatron-LM @ nemotron_3_omni as a recursive submodule fetch); bump those branches once they merge to a versioned tag.

Reference#