`bridge.models.mimo_v2_flash.mimo_v2_flash_provider`#

MiMo-V2-Flash Model Provider with dual-base RoPE.

The hybrid attention pattern (full attention vs. sliding-window attention (SWA), chosen per layer) and the per-layer switching of KV head counts are handled by storing the relevant configuration on the provider.

Module Contents#

Classes#

MiMoV2FlashModelProvider

Configuration and provider for MiMo-V2-Flash models.

API#

class bridge.models.mimo_v2_flash.mimo_v2_flash_provider.MiMoV2FlashModelProvider#

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Configuration and provider for MiMo-V2-Flash models.

Extends GPTModelProvider with MiMo-V2-Flash-specific fields that need to persist in run_config.yaml and be accessible to custom modules.

The hybrid attention pattern, per-layer KV heads, and dual RoPE bases are stored here. The provide() override replaces the standard RoPE with a dual-base version (same pattern as Gemma3ModelProvider).
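To make the dual-base idea concrete, here is a minimal sketch of how two sets of RoPE inverse frequencies could be derived from a two-element `rotary_base`. It is not the provider's actual implementation. The helper names `rope_inv_freq` and `dual_base_inv_freq`, the rotary dimension of 64, and the assumption that the tuple is ordered (sliding-window base, full-attention base) are illustrative only and are not confirmed by this page.

```python
from typing import List, Tuple


def rope_inv_freq(base: float, rotary_dim: int) -> List[float]:
    """Standard RoPE inverse frequencies: base ** (-2i / rotary_dim)."""
    return [base ** (-(2 * i) / rotary_dim) for i in range(rotary_dim // 2)]


def dual_base_inv_freq(
    rotary_base: Tuple[float, float], rotary_dim: int
) -> Tuple[List[float], List[float]]:
    """Build one frequency table per base (hypothetical helper).

    ASSUMPTION: the tuple is ordered (sliding-window base, full-attention base).
    """
    swa_base, full_base = rotary_base
    return rope_inv_freq(swa_base, rotary_dim), rope_inv_freq(full_base, rotary_dim)


# Default rotary_base from the provider; rotary_dim=64 is an illustrative value.
swa_freqs, full_freqs = dual_base_inv_freq((10000, 5000000), rotary_dim=64)
```

Under this sketch, each layer would pick one of the two tables depending on whether it uses sliding-window or full attention. The larger base yields lower frequencies in the higher dimensions, which is the usual motivation for giving long-range layers a larger base.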

transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.mimo_v2_flash.mimo_v2_flash_provider.MiMoV2FlashModelProvider], megatron.core.transformer.ModuleSpec]]#: ‘field(…)’

hybrid_attention_pattern: Optional[List[int]]#: None

window_size: Union[int, tuple, None]#: 128

rotary_base: Union[int, float, tuple]#: (10000, 5000000)

full_attn_num_query_groups: int#: 4

swa_num_query_groups: int#: 8

v_head_dim: int#: 128

attention_value_scale: Optional[float]#: None

normalization: str#: ‘RMSNorm’

gated_linear_unit: bool#: True

add_bias_linear: bool#: False

position_embedding_type: str#: ‘rope’

share_embeddings_and_output_weights: bool#: False

provide(pre_process=None, post_process=None, vp_stage=None) → megatron.core.models.gpt.GPTModel#

Configure and instantiate a Megatron Core GPT model for MiMo-V2-Flash.
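The fields above can be read as a per-layer lookup table. The following sketch shows one way a custom module might resolve per-layer attention settings from them. It rests on assumptions not stated on this page: that a `1` in `hybrid_attention_pattern` marks a full-attention layer and a `0` marks an SWA layer, and that `window_size` is a plain integer. `LayerAttentionSettings` and `resolve_layer_settings` are hypothetical names, not part of the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerAttentionSettings:
    """Hypothetical container for the settings of a single layer."""

    is_full_attention: bool
    num_query_groups: int
    window_size: Optional[int]  # None means no sliding window


def resolve_layer_settings(
    hybrid_attention_pattern: List[int],
    full_attn_num_query_groups: int = 4,  # provider default
    swa_num_query_groups: int = 8,  # provider default
    window_size: int = 128,  # provider default
) -> List[LayerAttentionSettings]:
    """Map each pattern entry to its attention settings.

    ASSUMPTION: 1 = full attention, 0 = sliding-window attention.
    """
    settings = []
    for flag in hybrid_attention_pattern:
        if flag == 1:
            settings.append(
                LayerAttentionSettings(True, full_attn_num_query_groups, None)
            )
        else:
            settings.append(
                LayerAttentionSettings(False, swa_num_query_groups, window_size)
            )
    return settings


# Illustrative four-layer pattern; real models define their own.
layers = resolve_layer_settings([1, 0, 0, 1])
```

With the defaults shown, full-attention layers would use 4 query groups and no window, while SWA layers would use 8 query groups and a window of 128.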
