bridge.models.nemotron_vl.nemotron_vl_provider#
Module Contents#
Classes#
NemotronNano12Bv2VLModelProvider: Configuration provider for Nemotron-VL models.
API#
- class bridge.models.nemotron_vl.nemotron_vl_provider.NemotronNano12Bv2VLModelProvider#
Bases:
megatron.bridge.models.mamba.mamba_provider.MambaModelProvider

Configuration provider for Nemotron-VL models.
Inlines NemotronH + NemotronNano12Bv2 defaults directly.
- mamba_num_groups: int#
8
- mamba_head_dim: int#
80
- num_query_groups: int#
8
- make_vocab_size_divisible_by: int#
128
- activation_func: Callable#
None
- masked_softmax_fusion: bool#
True
- apply_query_key_layer_scaling: bool#
False
- persist_layer_norm: bool#
True
- first_last_layers_bf16: bool#
True
- is_hybrid_model: bool#
True
- moe_aux_loss_coeff: float#
0.0001
- moe_router_score_function: str#
‘sigmoid’
- moe_router_enable_expert_bias: bool#
True
- moe_router_load_balancing_type: str#
‘seq_aux_loss’
- moe_router_dtype: str#
‘fp32’
- moe_grouped_gemm: bool#
True
- moe_token_dispatcher_type: str#
‘alltoall’
- moe_permute_fusion: bool#
True
- hybrid_layer_pattern: str#
‘M-M-M-M*-M-M-M-M*-M-M-M-M*-M-M-M-M*-M-M-M-M*-M-M-M-M*-M-M-M-M-’
- hidden_size: int#
5120
- mamba_num_heads: int#
128
- kv_channels: int#
128
- mamba_state_dim: int#
128
- ffn_hidden_size: int#
20480
- num_attention_heads: int#
40
- seq_length: int#
131072
- scatter_embedding_sequence_parallel: bool#
False
- attention_softmax_in_fp32: bool#
True
- vision_model_type: str#
‘radio’
- language_model_type: str#
‘nemotron5-hybrid-12b’
- freeze_language_model: bool#
False
- freeze_vision_model: bool#
False
- freeze_vision_projection: bool#
False
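The attributes above follow the dataclass-style provider pattern used throughout megatron.bridge: each field is a default that can be overridden at construction time. A minimal self-contained sketch of that pattern (the class name `NemotronVLProviderSketch` is hypothetical; only the field names and default values are taken from the listing above):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real provider dataclass. It mirrors a
# subset of the documented fields and defaults; it is NOT part of
# megatron.bridge and is for illustration only.
@dataclass
class NemotronVLProviderSketch:
    mamba_num_groups: int = 8
    mamba_head_dim: int = 80
    num_query_groups: int = 8
    make_vocab_size_divisible_by: int = 128
    moe_router_score_function: str = "sigmoid"
    moe_router_dtype: str = "fp32"
    seq_length: int = 131072
    vision_model_type: str = "radio"
    language_model_type: str = "nemotron5-hybrid-12b"
    freeze_language_model: bool = False
    freeze_vision_model: bool = False
    freeze_vision_projection: bool = False

# Defaults apply unless overridden at construction time, e.g. to freeze
# the vision tower while fine-tuning the language model:
provider = NemotronVLProviderSketch(freeze_vision_model=True)
```

Because the provider is a dataclass, any documented default can be overridden per-field this way without subclassing.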
- provide(pre_process=None, post_process=None, vp_stage=None)#
Assemble a full megatron.core.models.multimodal.llava_model.LLaVAModel.
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)#
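In Megatron-style pipeline parallelism, `pre_process` conventionally controls whether a stage builds the input embedding and `post_process` whether it builds the output projection, which is why both `provide` methods accept these flags. A self-contained sketch of that convention (the function name and component labels are illustrative, not bridge APIs):

```python
# Illustrative sketch of the pre_process/post_process convention used by
# Megatron-style pipeline stages; names here are hypothetical.
def stage_components(pre_process: bool, post_process: bool) -> list[str]:
    """Return which model pieces a given pipeline stage would own."""
    parts = []
    if pre_process:
        parts.append("embedding")      # first stage embeds input tokens
    parts.append("decoder_layers")     # every stage owns a slice of layers
    if post_process:
        parts.append("output_layer")   # last stage projects to vocab logits
    return parts

first_stage = stage_components(True, False)   # ["embedding", "decoder_layers"]
last_stage = stage_components(False, True)    # ["decoder_layers", "output_layer"]
```

With a single pipeline stage (no pipeline parallelism), both flags are true and the stage owns the whole model.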