bridge.models.nemotron_omni.nemotron_omni_provider#
Module Contents#
Classes#
NemotronVLModelProvider | Abstract base provider for Nemotron VL model variants.
NemotronOmniModelProvider | Provider for Nemotron Omni (VL + sound) models.
API#
- class bridge.models.nemotron_omni.nemotron_omni_provider.NemotronVLModelProvider#
Bases:
megatron.bridge.models.mamba.mamba_provider.MambaModelProvider, abc.ABC
Abstract base provider for Nemotron VL model variants.
Provides common VL fields, RADIO ViT-H vision config building methods, and vision projection config building methods shared by dense and MoE variants. Concrete subclasses set LLM-specific defaults (hidden_size, hybrid pattern, etc.) and may override provide() for variant-specific assembly.
- mamba_num_groups: int#
8
- num_query_groups: int#
8
- make_vocab_size_divisible_by: int#
128
- activation_func: Callable#
None
- masked_softmax_fusion: bool#
True
- apply_query_key_layer_scaling: bool#
False
- persist_layer_norm: bool#
True
- first_last_layers_bf16: bool#
True
- is_hybrid_model: bool#
True
- moe_aux_loss_coeff: float#
0.0001
- moe_router_score_function: str#
'sigmoid'
- moe_router_enable_expert_bias: bool#
True
- moe_router_load_balancing_type: str#
'seq_aux_loss'
- moe_router_dtype: str#
'fp32'
- moe_grouped_gemm: bool#
True
- moe_token_dispatcher_type: str#
'alltoall'
- moe_permute_fusion: bool#
True
True
- scatter_embedding_sequence_parallel: bool#
False
- attention_softmax_in_fp32: bool#
True
- vision_model_type: str#
'radio'
- language_model_type: str = <Multiline-String>#
- image_token_index: int#
0
- img_start_token_id: int#
0
- img_end_token_id: int#
0
- tokenizer_type: str = <Multiline-String>#
- dynamic_resolution: bool#
False
- use_vision_backbone_fp8_arch: bool#
True
- radio_force_eval_mode: bool#
True
- radio_force_cpe_eval_mode: bool#
True
- radio_interpolate_only_cpe: bool#
True
- radio_cpe_aspect_ratio_select: bool#
False
- radio_disable_cpe: bool#
False
20480
- vision_class_token_len: Optional[int]#
None
- freeze_language_model: bool#
False
- freeze_vision_model: bool#
False
- freeze_vision_projection: bool#
False
- _build_vision_config(language_cfg)#
Build RADIO ViT-H vision encoder config from a language config copy.
- _build_vision_projection_config(language_cfg)#
Build vision projection MLP config from a language config copy.
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)#
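The base-provider pattern described for NemotronVLModelProvider (shared fields and config builders on an abstract class, variant defaults and provide() on subclasses) can be illustrated with a minimal, standard-library-only sketch. Every name below (ToyLanguageConfig, ToyVLProvider, ToyDenseProvider, build_vision_config) is hypothetical and chosen for illustration; none of them is part of Megatron Bridge, and the numeric values are placeholders rather than real model settings.

```python
import copy
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToyLanguageConfig:
    """Hypothetical stand-in for a language-model config."""
    hidden_size: int = 1024
    num_layers: int = 4


@dataclass
class ToyVLProvider(ABC):
    """Hypothetical analogue of an abstract VL provider."""
    # Shared VL fields; the defaults mirror the listing above.
    vision_model_type: str = "radio"
    freeze_vision_model: bool = False
    hidden_size: int = 1024

    def build_vision_config(self, language_cfg):
        # Derive the vision config from a *copy* of the language config,
        # so the language config itself is left untouched.
        cfg = copy.deepcopy(language_cfg)
        cfg.num_layers = 32  # illustrative value only
        return cfg

    @abstractmethod
    def provide(self):
        """Variant-specific assembly, implemented by concrete subclasses."""


@dataclass
class ToyDenseProvider(ToyVLProvider):
    """Hypothetical concrete variant that sets its own LLM defaults."""
    hidden_size: int = 2048

    def provide(self):
        language_cfg = ToyLanguageConfig(hidden_size=self.hidden_size)
        vision_cfg = self.build_vision_config(language_cfg)
        return {"language": language_cfg, "vision": vision_cfg}
```

In this sketch, instantiating ToyVLProvider directly raises TypeError because provide() is abstract, while ToyDenseProvider().provide() returns both configs with the language config unmodified.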
- class bridge.models.nemotron_omni.nemotron_omni_provider.NemotronOmniModelProvider#
Bases:
bridge.models.nemotron_omni.nemotron_omni_provider.NemotronVLModelProvider
Provider for Nemotron Omni (VL + sound) models.
Extends NemotronVLModelProvider with sound-specific fields. When has_sound is False, behaves identically to the VL provider (backward compatible).
- has_sound: bool#
False
- sound_model_type: str#
'parakeet'
1024
4096
- sound_context_token_id: int#
0
- sound_config: Optional[dict]#
None
- freeze_sound_encoder: bool#
False
- freeze_sound_projection: bool#
False
- temporal_patch_dim: int#
1
- separate_video_embedder: bool#
False
- temporal_ckpt_compat: bool#
False
- _build_vision_config(language_cfg)#
Pin vision encoder to PP=1 (Omni training uses PP>1 on the LLM).
The dense Nemotron-VL recipe runs with PP=1 everywhere, so the base VL provider doesn’t pin this; for Omni we always co-locate the vision encoder with the first PP stage.
- _build_vision_projection_config(language_cfg)#
Build vision projection MLP config, overriding activation to ReLU.
The HF Nemotron-Omni model uses plain ReLU in its vision projection MLP (mlp1), not the squared_relu used by the language model. Also pin to PP=1 (see _build_vision_config).
- _build_sound_projection_config(language_cfg)#
Build sound projection config (mirrors _build_vision_projection_config).
- _build_sound_encoder()#
Build BridgeSoundEncoder from sound_config dict.
- provide(pre_process=None, post_process=None, vp_stage=None)#
Assemble NemotronOmniModel wrapping a LLaVAModel with optional sound support.
Duplicates the VL provide() logic because LLaVAModel requires sound kwargs at construction time – they can’t be added after. This is intentional to maintain zero changes to nemotron_vl/.
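The Omni-specific behaviour described above (a projection config copied from the language config with the activation switched to ReLU and pipeline parallelism pinned to 1, plus sound components that are only attached when has_sound is True) can be sketched as follows. This is a rough illustration under stated assumptions, not the provider's actual code: ToyConfig, build_projection_config, and assemble are hypothetical names, the field pipeline_model_parallel_size is assumed for the sketch, and raising ValueError for a missing sound_config is a choice made here, not documented behaviour.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a transformer config."""
    hidden_size: int = 1024
    activation: str = "squared_relu"
    pipeline_model_parallel_size: int = 4


def relu(x: float) -> float:
    return max(x, 0.0)


def squared_relu(x: float) -> float:
    return max(x, 0.0) ** 2


# Scalar versions of the two activations, for comparison only.
ACTIVATIONS = {"relu": relu, "squared_relu": squared_relu}


def build_projection_config(language_cfg: ToyConfig) -> ToyConfig:
    # Start from a copy of the language config, then override the
    # activation and pin the projection to a single pipeline stage.
    cfg = copy.deepcopy(language_cfg)
    cfg.activation = "relu"
    cfg.pipeline_model_parallel_size = 1
    return cfg


def assemble(language_cfg: ToyConfig, has_sound: bool = False,
             sound_config: Optional[dict] = None) -> dict:
    # All components are collected up front and handed over together,
    # mirroring the "sound kwargs at construction time" constraint.
    parts = {
        "language": language_cfg,
        "vision_projection": build_projection_config(language_cfg),
    }
    if has_sound:
        if sound_config is None:
            # Assumption of this sketch: sound needs an explicit config.
            raise ValueError("has_sound=True requires sound_config")
        parts["sound_encoder"] = dict(sound_config)
    return parts
```

With has_sound left at its default of False, assemble() returns only the language and vision projection parts, which corresponds to the backward-compatible VL-only path.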