nemo_automodel.components.models.mistral4.configuration#

Mistral4 configuration for AutoConfig registration.

This registration is needed so that AutoConfig.from_pretrained can resolve model_type: "mistral4" in the text_config of Mistral3-wrapped checkpoints, even when the installed HF transformers version does not ship a native Mistral4Config (the class was added only in the custom fork).
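
The sketch below illustrates how such a registration is typically consumed. Whether this module calls AutoConfig.register at import time is not shown here, and the checkpoint path is a placeholder:

```python
from transformers import AutoConfig

from nemo_automodel.components.models.mistral4.configuration import Mistral4Config

# Make the custom model_type resolvable even on a stock transformers
# install that has no native Mistral4Config.
AutoConfig.register("mistral4", Mistral4Config, exist_ok=True)

# Placeholder path: a Mistral3-wrapped checkpoint whose text_config
# declares model_type: "mistral4".
config = AutoConfig.from_pretrained("path/to/mistral3-wrapped-checkpoint")
```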

Module Contents#

Classes#

Mistral4Config

API#

class nemo_automodel.components.models.mistral4.configuration.Mistral4Config(
vocab_size=131072,
hidden_size=4096,
intermediate_size=12288,
moe_intermediate_size=2048,
num_hidden_layers=36,
num_attention_heads=32,
num_key_value_heads=32,
n_shared_experts=1,
n_routed_experts=128,
routed_scaling_factor=1.0,
kv_lora_rank=256,
q_lora_rank=1024,
qk_rope_head_dim=64,
v_head_dim=128,
qk_nope_head_dim=64,
n_group=8,
topk_group=1,
num_experts_per_tok=4,
first_k_dense_replace=0,
norm_topk_prob=True,
hidden_act='silu',
max_position_embeddings=1048576,
initializer_range=0.02,
rms_norm_eps=1e-06,
use_cache=True,
pad_token_id=11,
bos_token_id=1,
eos_token_id=2,
pretraining_tp=1,
tie_word_embeddings=False,
rope_parameters=None,
rope_interleave=True,
attention_bias=False,
attention_dropout=0.0,
mlp_bias=False,
**kwargs,
)#

Bases: transformers.PretrainedConfig

Initialization

model_type#

'mistral4'
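
As a quick sanity check, the config can also be constructed directly; the overrides below are arbitrary illustrative values, not recommended settings:

```python
from nemo_automodel.components.models.mistral4.configuration import Mistral4Config

# Defaults mirror the signature above; any keyword can be overridden.
config = Mistral4Config(num_hidden_layers=4, hidden_size=512)

assert config.model_type == "mistral4"
assert config.num_key_value_heads == 32  # untouched default from the signature
```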