nemo_automodel.components.models.mimo_v2_flash.config#
Module Contents#
Classes#
MiMoV2FlashConfig | Configuration for XiaomiMiMo/MiMo-V2-Flash.
API#
- class nemo_automodel.components.models.mimo_v2_flash.config.MiMoV2FlashConfig(
- vocab_size: int = 152576,
- hidden_size: int = 4096,
- intermediate_size: int = 16384,
- moe_intermediate_size: int = 2048,
- num_hidden_layers: int = 48,
- num_attention_heads: int = 64,
- num_key_value_heads: int = 4,
- head_dim: int = 192,
- v_head_dim: int = 128,
- swa_num_attention_heads: int = 64,
- swa_num_key_value_heads: int = 8,
- swa_head_dim: int = 192,
- swa_v_head_dim: int = 128,
- hidden_act: str = 'silu',
- max_position_embeddings: int = 262144,
- initializer_range: float = 0.02,
- layernorm_epsilon: float = 1e-05,
- rms_norm_eps: float | None = None,
- use_cache: bool = True,
- tie_word_embeddings: bool = False,
- rope_theta: float = 5000000.0,
- swa_rope_theta: float = 10000.0,
- rope_scaling: dict | None = None,
- attention_bias: bool = False,
- attention_dropout: float = 0.0,
- attention_value_scale: float | None = 0.707,
- add_full_attention_sink_bias: bool = False,
- add_swa_attention_sink_bias: bool = True,
- hybrid_block_size: int | None = None,
- hybrid_layer_pattern: list[int] | None = None,
- partial_rotary_factor: float = 0.334,
- sliding_window: int | None = 128,
- sliding_window_size: int | None = 128,
- attention_chunk_size: int | None = 128,
- n_routed_experts: int | None = 256,
- n_shared_experts: int | None = None,
- num_experts_per_tok: int = 8,
- scoring_func: str = 'sigmoid',
- topk_method: str = 'noaux_tc',
- n_group: int = 1,
- topk_group: int = 1,
- norm_topk_prob: bool = True,
- routed_scaling_factor: float | None = 1.0,
- moe_layer_freq: list[int] | None = None,
- torch_dtype: str = 'bfloat16',
- **kwargs,
- )
Bases: transformers.PretrainedConfig
Configuration for XiaomiMiMo/MiMo-V2-Flash.
The Hugging Face remote config class currently leaves model_type empty. Automodel registers this local config with the hub’s JSON model_type so configs can resolve without executing remote code.
Initialization
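The registration idea described above can be illustrated with a minimal, standard-library-only sketch. It is not Automodel's actual registration code: `CONFIG_REGISTRY`, `register_config`, `resolve_config`, and `LocalMiMoConfig` are hypothetical names introduced here, and the JSON string is a made-up stand-in for a hub `config.json`. The sketch only shows how a `model_type` string read from JSON can select a locally defined config class, so that no remote Python code has to run.

```python
import json

# Hypothetical registry mapping a model_type string to a local config class.
CONFIG_REGISTRY = {}


def register_config(model_type, config_cls):
    """Associate a model_type string with a locally defined config class."""
    CONFIG_REGISTRY[model_type] = config_cls


class LocalMiMoConfig:
    """Hypothetical stand-in for MiMoV2FlashConfig (only two fields shown)."""

    model_type = "mimo_v2_flash"

    def __init__(self, **kwargs):
        # Defaults mirror the documented signature; unknown keys are kept aside.
        self.hidden_size = kwargs.pop("hidden_size", 4096)
        self.num_hidden_layers = kwargs.pop("num_hidden_layers", 48)
        self.extra = kwargs


def resolve_config(config_json):
    """Pick the config class from the JSON model_type and instantiate it."""
    raw = json.loads(config_json)
    model_type = raw.pop("model_type")
    config_cls = CONFIG_REGISTRY[model_type]  # KeyError if nothing is registered
    return config_cls(**raw)


# Register the local class under the model_type found in the JSON.
register_config(LocalMiMoConfig.model_type, LocalMiMoConfig)

# Made-up JSON standing in for a hub config.json.
hub_json = '{"model_type": "mimo_v2_flash", "hidden_size": 4096, "num_hidden_layers": 48}'
config = resolve_config(hub_json)
```

Transformers provides `AutoConfig.register` for this kind of `model_type`-to-class mapping; this page does not state whether Automodel uses that call or its own mechanism.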
- model_type#
'mimo_v2_flash'
- keys_to_ignore_at_inference#
['past_key_values']
- attribute_map#
None
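As a rough aid for reading the default values in the constructor signature, the following sketch works out a few sizes they would imply for the full-attention and sliding-window (SWA) settings. It rests on assumptions not confirmed by this page: that projection widths equal heads times head dimension, and that the rotary dimension is `int(head_dim * partial_rotary_factor)`, as in many Hugging Face implementations. `AttentionDefaults` and `derived_sizes` are hypothetical helpers, not part of `nemo_automodel`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionDefaults:
    """Hypothetical container for the attention defaults in the signature."""

    num_heads: int
    num_kv_heads: int
    head_dim: int
    v_head_dim: int


# Defaults copied from the documented signature.
FULL = AttentionDefaults(num_heads=64, num_kv_heads=4, head_dim=192, v_head_dim=128)
SWA = AttentionDefaults(num_heads=64, num_kv_heads=8, head_dim=192, v_head_dim=128)


def derived_sizes(attn, partial_rotary_factor=0.334):
    """Compute sizes under the assumed conventions described in the lead-in."""
    return {
        # Assumed: projection width = number of heads * per-head dimension.
        "q_width": attn.num_heads * attn.head_dim,
        "k_width": attn.num_kv_heads * attn.head_dim,
        "v_width": attn.num_kv_heads * attn.v_head_dim,
        # Grouped-query ratio: query heads sharing each key/value head.
        "queries_per_kv_head": attn.num_heads // attn.num_kv_heads,
        # Assumed: rotary dims = int(head_dim * partial_rotary_factor).
        "rotary_dims": int(attn.head_dim * partial_rotary_factor),
    }


full_sizes = derived_sizes(FULL)
swa_sizes = derived_sizes(SWA)
```

Under these assumptions, the two settings share the same query width and differ only in the key/value widths and in how many query heads share each key/value head.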