bridge.recipes.nemotron_omni.nemotron_omni

Nemotron Omni SFT/PEFT recipes (CORD v2 VL, Valor32k-AVQA audio-visual, temporal video).

All recipes use nemotron_omni_step (pass --step_func nemotron_omni_step).

Module Contents

Functions

nemotron_omni_cord_v2_sft_config

Return a VL SFT config for Nemotron Omni on CORD v2.

nemotron_omni_cord_v2_peft_config

Return a LoRA PEFT config for Nemotron Omni on CORD v2.

_nemotron_omni_base_config

Shared model/training config for all Nemotron Omni recipes.

nemotron_omni_valor32k_sft_config

Return an Energon SFT config with temporal video embedder enabled.

nemotron_omni_valor32k_peft_config

LoRA PEFT recipe on temporal-video Energon path (temporal_patch_dim=2).

Data

_DEFAULT_HF_PATH

API

bridge.recipes.nemotron_omni.nemotron_omni._DEFAULT_HF_PATH

'nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16'

bridge.recipes.nemotron_omni.nemotron_omni.nemotron_omni_cord_v2_sft_config(
hf_path: str = _DEFAULT_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

Return a VL SFT config for Nemotron Omni on CORD v2.

Vision-language finetuning on the CORD v2 receipt parsing dataset. Default configuration: 1 node, 8 GPUs (TP=4). Uses nemotron_omni_step (pass --step_func nemotron_omni_step).

Parameters:

hf_path – HuggingFace model ID or local path to the Nemotron Omni model.
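
A minimal sketch of looking up this recipe from Python. The full import path `megatron.bridge.recipes.nemotron_omni.nemotron_omni` is an assumption (this page shows only the `bridge.recipes...` suffix), and `load_recipe` is a hypothetical helper, not part of the library. The import is guarded so the snippet still runs where the package is not installed.

```python
import importlib

# Assumed full module path; this page only shows the "bridge.recipes..." suffix.
RECIPE_MODULE = "megatron.bridge.recipes.nemotron_omni.nemotron_omni"


def load_recipe(name, module_path=RECIPE_MODULE):
    """Return the recipe function called `name`, or None if it cannot be found.

    Hypothetical helper for illustration; not part of the library.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Package (or one of its dependencies) is not installed here.
        return None
    return getattr(module, name, None)


recipe = load_recipe("nemotron_omni_cord_v2_sft_config")
if recipe is None:
    print("recipe module not available in this environment")
else:
    print("found recipe:", recipe.__name__)
```

When the lookup succeeds, calling `recipe()` would use the default `hf_path`, and `recipe(hf_path="<local path>")` would point it at a local checkpoint instead.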

bridge.recipes.nemotron_omni.nemotron_omni.nemotron_omni_cord_v2_peft_config(
hf_path: str = _DEFAULT_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

Return a LoRA PEFT config for Nemotron Omni on CORD v2.

LoRA adapters are applied to the language-model attention and Mamba projections. The vision encoder/projection and the sound encoder/projection are frozen. Default configuration: 1 node, 8 GPUs (TP=4). Uses nemotron_omni_step (pass --step_func nemotron_omni_step).

Parameters:

hf_path – HuggingFace model ID or local path to the Nemotron Omni model.
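
The adapter placement described above can be pictured as a name-based filter over model parameters. This is only an illustrative sketch: the prefixes and substrings below (`vision_model.`, `language_model.`, `mamba`, and so on) are hypothetical, since the real module names are not given on this page, and `classify` is not a library function.

```python
# Hypothetical name prefixes for the frozen towers; real names may differ.
FROZEN_PREFIXES = (
    "vision_model.",
    "vision_projection.",
    "sound_model.",
    "sound_projection.",
)
# Hypothetical substrings marking language-model attention and Mamba projections.
LORA_SUBSTRINGS = ("attention", "mamba")


def classify(param_name):
    """Return "lora" if the layer would receive an adapter, else "frozen"."""
    if param_name.startswith(FROZEN_PREFIXES):
        return "frozen"
    if param_name.startswith("language_model.") and any(
        s in param_name for s in LORA_SUBSTRINGS
    ):
        return "lora"
    # Everything else keeps its base weights frozen and gets no adapter.
    return "frozen"


for name in (
    "vision_model.encoder.layers.0.attention.qkv",
    "language_model.decoder.layers.3.self_attention.linear_qkv",
    "language_model.decoder.layers.4.mamba_mixer.in_proj",
    "language_model.decoder.layers.5.mlp.linear_fc1",
):
    print(f"{name}: {classify(name)}")
```

Note that the vision attention layer is classified as frozen even though its name contains "attention", because the tower-level freeze is checked first.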

bridge.recipes.nemotron_omni.nemotron_omni._nemotron_omni_base_config(
hf_path: str = _DEFAULT_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

Shared model/training config for all Nemotron Omni recipes.

bridge.recipes.nemotron_omni.nemotron_omni.nemotron_omni_valor32k_sft_config(
hf_path: str = _DEFAULT_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

Return an Energon SFT config with temporal video embedder enabled.

Uses RADIO’s separate_video_embedder to fuse temporal frame pairs (2 consecutive frames → 1 vision embedding) instead of discarding every other frame. Requires dynamic_resolution=True. The shard path must be set via CLI override: dataset.path=<path>.

Uses nemotron_omni_step (pass --step_func nemotron_omni_step).

Parameters:

hf_path – HuggingFace model ID or local path to the Nemotron Omni model.
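
The difference between fusing frame pairs and discarding every other frame can be sketched in plain Python. This only illustrates the grouping, not the embedder itself; `subsample_frames` and `pair_frames` are hypothetical helpers, and rejecting clips whose length is not a multiple of the group size is an assumption made here, not documented behavior.

```python
def subsample_frames(frames):
    """Baseline: keep every other frame and discard the rest."""
    return frames[::2]


def pair_frames(frames, temporal_patch_dim=2):
    """Group consecutive frames so each group feeds one vision embedding."""
    if len(frames) % temporal_patch_dim != 0:
        # Assumption for this sketch: incomplete trailing groups are rejected.
        raise ValueError("frame count must be a multiple of temporal_patch_dim")
    return [
        tuple(frames[i : i + temporal_patch_dim])
        for i in range(0, len(frames), temporal_patch_dim)
    ]


frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
print(subsample_frames(frames))  # ['f0', 'f2', 'f4']
print(pair_frames(frames))  # [('f0', 'f1'), ('f2', 'f3'), ('f4', 'f5')]
```

Both approaches produce three units for a six-frame clip, but the paired version keeps the information from all six frames.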

bridge.recipes.nemotron_omni.nemotron_omni.nemotron_omni_valor32k_peft_config(
hf_path: str = _DEFAULT_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

LoRA PEFT recipe on temporal-video Energon path (temporal_patch_dim=2).
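
A small sketch of assembling the command-line arguments that the Valor32k SFT description calls for, assuming the PEFT recipe on the same Energon path takes the same `dataset.path` override. The launcher script is not named on this page, so only the arguments are built; `build_cli_args` is a hypothetical helper and the shard path is a placeholder.

```python
import shlex


def build_cli_args(shard_path, step_func="nemotron_omni_step"):
    """Build the step-function flag and dataset override as an argument list."""
    if not shard_path:
        raise ValueError("dataset.path must point at the Energon shards")
    return ["--step_func", step_func, f"dataset.path={shard_path}"]


# Placeholder shard location; substitute the real Energon dataset path.
args = build_cli_args("/data/valor32k/energon")
print(shlex.join(args))
# --step_func nemotron_omni_step dataset.path=/data/valor32k/energon
```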