nemo_automodel.components.models.mistral4.configuration#
Mistral4 configuration for AutoConfig registration.
This is needed so that AutoConfig.from_pretrained can resolve model_type: "mistral4"
in the text_config of Mistral3-wrapped checkpoints, even when the installed HF
transformers version does not ship a native Mistral4Config (it is added only in the custom fork).
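For context, a minimal sketch of how such a config is made resolvable through transformers' registry. The explicit AutoConfig.register call and the checkpoint path are illustrative assumptions; this page does not show whether the module self-registers on import:

```python
from transformers import AutoConfig

from nemo_automodel.components.models.mistral4.configuration import Mistral4Config

# Make AutoConfig aware of the custom model_type so that checkpoints whose
# text_config declares model_type: "mistral4" resolve even on a stock
# transformers install (which has no native Mistral4Config).
AutoConfig.register("mistral4", Mistral4Config)

# Hypothetical checkpoint path, for illustration only.
config = AutoConfig.from_pretrained("path/to/mistral3-wrapped-checkpoint")
```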
Module Contents#
Classes#
API#
- class nemo_automodel.components.models.mistral4.configuration.Mistral4Config(
- vocab_size=131072,
- hidden_size=4096,
- intermediate_size=12288,
- moe_intermediate_size=2048,
- num_hidden_layers=36,
- num_attention_heads=32,
- num_key_value_heads=32,
- n_shared_experts=1,
- n_routed_experts=128,
- routed_scaling_factor=1.0,
- kv_lora_rank=256,
- q_lora_rank=1024,
- qk_rope_head_dim=64,
- v_head_dim=128,
- qk_nope_head_dim=64,
- n_group=8,
- topk_group=1,
- num_experts_per_tok=4,
- first_k_dense_replace=0,
- norm_topk_prob=True,
- hidden_act='silu',
- max_position_embeddings=1048576,
- initializer_range=0.02,
- rms_norm_eps=1e-06,
- use_cache=True,
- pad_token_id=11,
- bos_token_id=1,
- eos_token_id=2,
- pretraining_tp=1,
- tie_word_embeddings=False,
- rope_parameters=None,
- rope_interleave=True,
- attention_bias=False,
- attention_dropout=0.0,
- mlp_bias=False,
- **kwargs,
)

Bases: transformers.PretrainedConfig
- model_type#
‘mistral4’
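A short usage sketch, assuming the defaults listed in the signature above; the override values below are illustrative only, not recommended settings:

```python
from nemo_automodel.components.models.mistral4.configuration import Mistral4Config

# Construct with the documented defaults.
config = Mistral4Config()
assert config.model_type == "mistral4"
assert config.num_hidden_layers == 36

# Illustrative overrides for a small debug-sized config; values are hypothetical.
debug_config = Mistral4Config(
    num_hidden_layers=4,
    hidden_size=512,
    intermediate_size=1024,
)
```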