nemo_automodel.components.models.mistral4.configuration

Mistral4 configuration for AutoConfig registration.

This module is needed so that AutoConfig.from_pretrained can resolve model_type: "mistral4" in the text_config of Mistral3-wrapped checkpoints, even when the installed HF transformers version does not ship a native Mistral4Config (which exists only in the custom fork).
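The lookup described above can be pictured with a toy registry. This is only a sketch of the idea in plain Python, not the transformers implementation: CONFIG_REGISTRY, register, resolve_config, and the Toy* classes are all hypothetical names introduced for illustration. In transformers itself, the corresponding hook is AutoConfig.register.

```python
# Toy stand-in for a model_type -> config-class registry.
# All names here are hypothetical; this is not the transformers implementation.
CONFIG_REGISTRY = {}


class ToyConfig:
    model_type = ""

    def __init__(self, **kwargs):
        # Store every provided field as an attribute.
        for key, value in kwargs.items():
            setattr(self, key, value)


class ToyMistral4Config(ToyConfig):
    model_type = "mistral4"


class ToyMistral3Config(ToyConfig):
    model_type = "mistral3"

    def __init__(self, text_config, **kwargs):
        super().__init__(**kwargs)
        # The wrapper resolves its nested text_config through the registry.
        self.text_config = resolve_config(text_config)


def register(model_type, config_class):
    """Map a model_type string to a config class."""
    if config_class.model_type != model_type:
        raise ValueError("model_type does not match the config class")
    CONFIG_REGISTRY[model_type] = config_class


def resolve_config(config_dict):
    """Build a config object from a dict by looking up its model_type."""
    fields = dict(config_dict)
    model_type = fields.pop("model_type", None)
    if model_type not in CONFIG_REGISTRY:
        raise KeyError(f"unrecognized model_type: {model_type!r}")
    return CONFIG_REGISTRY[model_type](**fields)


# A Mistral3-style wrapper checkpoint whose text_config is a mistral4 model.
checkpoint = {
    "model_type": "mistral3",
    "text_config": {"model_type": "mistral4", "hidden_size": 4096},
}

register("mistral3", ToyMistral3Config)

# Without a "mistral4" entry, the nested text_config cannot be resolved.
try:
    resolve_config(checkpoint)
    resolved_before = True
except KeyError:
    resolved_before = False

# After registration, the same checkpoint resolves.
register("mistral4", ToyMistral4Config)
config = resolve_config(checkpoint)
```

The point of the sketch is the ordering: the wrapper config can only be built once the inner model_type has an entry, which is why the registration has to be in place before the checkpoint is loaded.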

Module Contents

Classes

| Name | Description |
| --- | --- |
| Mistral4Config | - |

API

class nemo_automodel.components.models.mistral4.configuration.Mistral4Config(
vocab_size = 131072,
hidden_size = 4096,
intermediate_size = 12288,
moe_intermediate_size = 2048,
num_hidden_layers = 36,
num_attention_heads = 32,
num_key_value_heads = 32,
n_shared_experts = 1,
n_routed_experts = 128,
routed_scaling_factor = 1.0,
kv_lora_rank = 256,
q_lora_rank = 1024,
qk_rope_head_dim = 64,
v_head_dim = 128,
qk_nope_head_dim = 64,
n_group = 8,
topk_group = 1,
num_experts_per_tok = 4,
first_k_dense_replace = 0,
norm_topk_prob = True,
hidden_act = 'silu',
max_position_embeddings = 1048576,
initializer_range = 0.02,
rms_norm_eps = 1e-06,
use_cache = True,
pad_token_id = 11,
bos_token_id = 1,
eos_token_id = 2,
pretraining_tp = 1,
tie_word_embeddings = False,
rope_parameters = None,
rope_interleave = True,
attention_bias = False,
attention_dropout = 0.0,
mlp_bias = False,
kwargs = {}
)

Bases: PretrainedConfig

model_type = 'mistral4'

qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
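As a worked example of the derived attribute, the sketch below recomputes qk_head_dim from the default values in the signature above. AttentionDims is a hypothetical stand-in written for illustration, not the real Mistral4Config. Reading the names, qk_nope_head_dim and qk_rope_head_dim appear to be the non-rotary and rotary parts of each query/key head, but that interpretation is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AttentionDims:
    """Hypothetical stand-in holding only the head-dimension fields."""

    # Defaults copied from the Mistral4Config signature.
    qk_nope_head_dim: int = 64
    qk_rope_head_dim: int = 64
    v_head_dim: int = 128

    @property
    def qk_head_dim(self) -> int:
        # Same formula as the documented attribute.
        return self.qk_nope_head_dim + self.qk_rope_head_dim


dims = AttentionDims()
print(dims.qk_head_dim)  # 128 (64 + 64)

# Overriding one component changes the derived value accordingly.
custom = AttentionDims(qk_nope_head_dim=128)
print(custom.qk_head_dim)  # 192 (128 + 64)
```

With the defaults, the query/key head width (128) happens to equal v_head_dim (128), but the two are configured independently.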