`bridge.models.nemotron_omni.nemotron_omni_provider`#

Module Contents#

Classes#

`NemotronVLModelProvider`	Abstract base provider for Nemotron VL model variants.
`NemotronOmniModelProvider`	Provider for Nemotron Omni (VL + sound) models.

API#

class bridge.models.nemotron_omni.nemotron_omni_provider.NemotronVLModelProvider#

Bases: megatron.bridge.models.hybrid.hybrid_provider.HybridModelProvider, abc.ABC

Abstract base provider for Nemotron VL model variants.

Provides the common VL fields, plus the methods that build the RADIO ViT-H vision config and the vision projection config, all shared by the dense and MoE variants. Concrete subclasses set LLM-specific defaults (hidden_size, hybrid pattern, etc.) and may override provide() for variant-specific assembly.
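The split between a shared base and concrete variants described above can be sketched with plain dataclasses. This is only an illustration: `VLProviderSketch`, `DenseVariantSketch`, and `MoEVariantSketch` are hypothetical stand-ins, the `hidden_size` values are made up, and a dict stands in for the assembled model. None of it is the real provider code.

```python
from abc import ABC
from dataclasses import dataclass


@dataclass
class VLProviderSketch(ABC):
    """Hypothetical stand-in for NemotronVLModelProvider, not the real class."""

    # Shared VL defaults, copied from the field list in this reference.
    vision_model_type: str = "radio"
    vision_proj_ffn_hidden_size: int = 20480
    # LLM-specific value that a concrete subclass is expected to set.
    hidden_size: int = 0

    def provide(self) -> dict:
        # Default assembly; a plain dict stands in for the assembled model.
        return {"vision": self.vision_model_type, "hidden_size": self.hidden_size}


@dataclass
class DenseVariantSketch(VLProviderSketch):
    """Hypothetical dense variant: only overrides an LLM-specific default."""

    hidden_size: int = 4096  # illustrative value, not taken from the source


@dataclass
class MoEVariantSketch(VLProviderSketch):
    """Hypothetical MoE variant: also overrides provide()."""

    hidden_size: int = 2048  # illustrative value, not taken from the source

    def provide(self) -> dict:
        model = super().provide()
        model["moe"] = True  # made-up variant-specific assembly step
        return model
```

Calling `DenseVariantSketch().provide()` reuses the base assembly unchanged, while `MoEVariantSketch().provide()` extends it, which mirrors the "may override provide()" wording above.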

mamba_num_groups: int#: 8

num_query_groups: int#: 8

make_vocab_size_divisible_by: int#: 128

activation_func: Callable#: None

masked_softmax_fusion: bool#: True

apply_query_key_layer_scaling: bool#: False

persist_layer_norm: bool#: True

first_last_layers_bf16: bool#: True

is_hybrid_model: bool#: True

moe_aux_loss_coeff: float#: 0.0001

moe_router_score_function: str#: 'sigmoid'

moe_router_enable_expert_bias: bool#: True

moe_router_load_balancing_type: str#: 'seq_aux_loss'

moe_router_dtype: str#: 'fp32'

moe_grouped_gemm: bool#: True

moe_token_dispatcher_type: str#: 'alltoall'

moe_permute_fusion: bool#: True

moe_shared_expert_overlap: bool#: True

scatter_embedding_sequence_parallel: bool#: False

attention_softmax_in_fp32: bool#: True

vision_model_type: str#: 'radio'

language_model_type: str = <Multiline-String>#

image_token_index: int#: 0

img_start_token_id: int#: 0

img_end_token_id: int#: 0

tokenizer_type: str = <Multiline-String>#

dynamic_resolution: bool#: False

use_vision_backbone_fp8_arch: bool#: True

radio_force_eval_mode: bool#: True

radio_force_cpe_eval_mode: bool#: True

radio_interpolate_only_cpe: bool#: True

radio_cpe_aspect_ratio_select: bool#: False

radio_disable_cpe: bool#: False

vision_proj_ffn_hidden_size: int#: 20480

vision_class_token_len: Optional[int]#: None

freeze_language_model: bool#: False

freeze_vision_model: bool#: False

freeze_vision_projection: bool#: False

_build_vision_config(language_cfg)#: Build RADIO ViT-H vision encoder config from a language config copy.

_build_vision_projection_config(language_cfg)#: Build vision projection MLP config from a language config copy.

provide_language_model(pre_process=None, post_process=None, vp_stage=None)#
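As a side note on the `make_vocab_size_divisible_by` field listed above: in Megatron-style code bases a value like this is typically used to round the vocabulary size up to a convenient multiple, often scaled by the tensor-parallel size. The helper below is a minimal sketch of that arithmetic under this assumption. `pad_vocab_size` and its `tensor_parallel_size` parameter are hypothetical names, and this page does not confirm how the provider actually applies the field.

```python
def pad_vocab_size(vocab_size: int, divisible_by: int = 128,
                   tensor_parallel_size: int = 1) -> int:
    """Round vocab_size up to a multiple of divisible_by * tensor_parallel_size.

    Hypothetical helper illustrating the usual padding rule; not the
    provider's own implementation.
    """
    multiple = divisible_by * tensor_parallel_size
    # Ceiling division, then scale back up to the padded size.
    return -(-vocab_size // multiple) * multiple
```

With the default of 128, a vocabulary that is already a multiple of 128 is left unchanged, and any other size is padded up to the next multiple.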

class bridge.models.nemotron_omni.nemotron_omni_provider.NemotronOmniModelProvider#

Bases: bridge.models.nemotron_omni.nemotron_omni_provider.NemotronVLModelProvider

Provider for Nemotron Omni (VL + sound) models.

Extends NemotronVLModelProvider with sound-specific fields. When has_sound is False, it behaves identically to the VL provider (backward compatible).
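The backward-compatible behavior of `has_sound` can be pictured as a simple gate on the sound components. The sketch below is illustrative only: `OmniFlagsSketch`, `planned_components`, and the component names are hypothetical, and only the field names and defaults are taken from this reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OmniFlagsSketch:
    """Hypothetical subset of the Omni provider's sound-related fields."""

    has_sound: bool = False
    sound_model_type: str = "parakeet"
    sound_hidden_size: int = 1024
    sound_projection_hidden_size: int = 4096
    sound_config: Optional[dict] = None


def planned_components(cfg: OmniFlagsSketch) -> List[str]:
    """List which parts would be built; the names are made up for illustration."""
    parts = ["language_model", "vision_model", "vision_projection"]
    if cfg.has_sound:
        # Sound parts are only added when the flag is set.
        parts += ["sound_encoder", "sound_projection"]
    return parts
```

With the default `has_sound=False`, the list contains only the language and vision parts, which is the sense in which the Omni provider matches the VL provider.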

has_sound: bool#: False

sound_model_type: str#: 'parakeet'

sound_hidden_size: int#: 1024

sound_projection_hidden_size: int#: 4096

sound_context_token_id: int#: 0

sound_config: Optional[dict]#: None

freeze_sound_encoder: bool#: False

freeze_sound_projection: bool#: False

temporal_patch_dim: int#: 1

separate_video_embedder: bool#: False

temporal_ckpt_compat: bool#: False

_build_vision_config(language_cfg)#

Pin vision encoder to PP=1 (Omni training uses PP>1 on the LLM).

The dense Nemotron-VL recipe runs with PP=1 everywhere, so the base VL provider doesn’t pin this; for Omni we always co-locate the vision encoder with the first PP stage.
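A minimal sketch of the pinning step, assuming the vision config is derived from a copy of the language config and that pipeline parallelism is controlled by a `pipeline_model_parallel_size` attribute. `LanguageConfigSketch` and `build_vision_config_sketch` are hypothetical stand-ins, not the provider's real types or method body.

```python
import copy
from dataclasses import dataclass


@dataclass
class LanguageConfigSketch:
    """Hypothetical stand-in for the language config passed as language_cfg."""

    hidden_size: int = 4096  # illustrative value
    pipeline_model_parallel_size: int = 4  # the LLM runs with PP > 1


def build_vision_config_sketch(language_cfg: LanguageConfigSketch) -> LanguageConfigSketch:
    # Work on a copy so the language config keeps its own PP setting.
    vision_cfg = copy.deepcopy(language_cfg)
    # Pin the vision encoder to a single pipeline stage.
    vision_cfg.pipeline_model_parallel_size = 1
    return vision_cfg
```

Copying first matters here: the LLM keeps its PP>1 setting while only the vision encoder's config is pinned.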

_build_vision_projection_config(language_cfg)#

Build vision projection MLP config, overriding activation to ReLU.

The HF Nemotron-Omni model uses plain ReLU in its vision projection MLP (mlp1), not the squared_relu used by the language model. Also pin to PP=1 (see _build_vision_config).
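For reference, the two activations differ only in the final squaring. The scalar functions below are a plain-Python illustration of the math, not the tensor implementations used by the model.

```python
def relu(x: float) -> float:
    """Plain ReLU: max(0, x)."""
    return max(0.0, x)


def squared_relu(x: float) -> float:
    """Squared ReLU: ReLU followed by squaring."""
    return relu(x) ** 2
```

Both are zero for negative inputs. For positive inputs ReLU is linear while squared ReLU grows quadratically, so using the wrong one in the projection would change its outputs.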

_build_sound_projection_config(language_cfg)#: Build sound projection config (mirrors _build_vision_projection_config).

_build_sound_encoder()#: Build BridgeSoundEncoder from sound_config dict.

provide(pre_process=None, post_process=None, vp_stage=None)#

Assemble NemotronOmniModel wrapping a LLaVAModel with optional sound support.

Duplicates the VL provide() logic because LLaVAModel requires the sound kwargs at construction time; they cannot be added afterward. The duplication is intentional: it avoids any changes to nemotron_vl/.
