nemo_automodel.components.models.minimax_m3_vl.mtp
MiniMax M3 multi-token prediction (MTP), DeepSeek-V3 style.
The checkpoint carries a single MTP module under model.mtp.layers.0:
enorm/hnorm (Gemma RMSNorm of the next-token embedding and the previous
hidden state), eh_proj (Linear 2*hidden -> hidden over their
concatenation), transformer_layer (a full MoE + sparse-attention decoder
block), and final_layernorm. There is no separate output projection: the
prediction head is the shared main lm_head.
sglang skips the MTP weights at load time (it is inference-only), so the reference for this module is the DeepSeek-V3 MTP algorithm.
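As a rough illustration of the checkpoint layout described above, the sketch below partitions a list of checkpoint key names by the model.mtp.layers.0 prefix and lists the submodules found under it. Only the prefix and the submodule names come from the docstring; the helper names (split_mtp_keys, submodules) and the full key names in example_keys, including the .weight suffixes and the attention sub-path, are hypothetical and chosen purely for illustration.

```python
MTP_PREFIX = "model.mtp.layers.0."  # prefix stated in the module docstring


def split_mtp_keys(keys):
    """Partition checkpoint key names into (mtp_keys, other_keys)."""
    mtp, other = [], []
    for key in keys:
        (mtp if key.startswith(MTP_PREFIX) else other).append(key)
    return mtp, other


def submodules(mtp_keys):
    """Return the sorted first-level submodule names under the MTP prefix."""
    return sorted({key[len(MTP_PREFIX):].split(".", 1)[0] for key in mtp_keys})


# Hypothetical key names: only the prefix and submodule names are from the docstring.
example_keys = [
    "model.embed_tokens.weight",
    "model.mtp.layers.0.enorm.weight",
    "model.mtp.layers.0.hnorm.weight",
    "model.mtp.layers.0.eh_proj.weight",
    "model.mtp.layers.0.transformer_layer.self_attn.q_proj.weight",
    "model.mtp.layers.0.final_layernorm.weight",
    "lm_head.weight",
]

mtp_keys, other_keys = split_mtp_keys(example_keys)
print(submodules(mtp_keys))
```

A loader that ignores MTP, as the docstring says sglang does, would keep only other_keys. Note that no lm_head key appears under the MTP prefix, which matches the statement that the prediction head is shared with the main model.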
Module Contents
Classes
API
Bases: Module
Stack of MTP depths (M3 ships a single depth).
Return, for each depth k, the logits for the token k positions beyond the next token, computed with the shared lm_head.
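The position bookkeeping behind these per-depth logits can be sketched in plain Python. The example follows the alignment used by the DeepSeek-V3 MTP algorithm: at depth k, the hidden state at position i is paired with the embedding of token i+k and trained to predict token i+k+1. Whether this module slices its inputs in exactly this way is an assumption, and mtp_alignment is a hypothetical helper, not part of the library.

```python
def mtp_alignment(tokens, depth):
    """Return (hidden_pos, embed_token, target_token) triples for one MTP depth.

    depth is 1-based. The main model predicts token i+1 from position i;
    MTP depth k predicts token i+k+1 from position i.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    triples = []
    # Position i needs token i+depth as the input embedding
    # and token i+depth+1 as the prediction target.
    for i in range(len(tokens) - depth - 1):
        triples.append((i, tokens[i + depth], tokens[i + depth + 1]))
    return triples


tokens = ["a", "b", "c", "d", "e"]
pairs = mtp_alignment(tokens, depth=1)
for hidden_pos, embed_token, target_token in pairs:
    print(hidden_pos, embed_token, target_token)
```

With a single depth, as M3 ships, the hidden state at each position comes from the main model. With more depths, each depth would consume the hidden state produced by the previous one.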
Bases: Module
One MTP depth: eh_proj(cat[enorm(emb), hnorm(h)]) -> Block -> final_layernorm.
transformer_layer is a full M3 decoder block. It is constructed at the
last decoder index (always MoE + sparse-attention in M3), so the shared
...layers.Block class builds the routed MoE and the sparse-attention
indexer automatically.
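The data flow of one depth, eh_proj(cat[enorm(emb), hnorm(h)]) -> Block -> final_layernorm, can be sketched for a single token vector without any dependencies. This is a minimal illustration, not the library's implementation: it assumes eh_proj has no bias, that final_layernorm uses the same Gemma-style RMSNorm as enorm and hnorm, and an eps of 1e-6, and it uses an identity function in place of the real decoder block. All function and parameter names here are hypothetical.

```python
import math


def gemma_rms_norm(x, weight, eps=1e-6):
    """Gemma-style RMSNorm: normalize by the RMS, then scale by (1 + weight)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * (1.0 + w) for v, w in zip(x, weight)]


def linear(x, weight):
    """Bias-free linear map; weight is laid out as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mtp_depth(emb, h, params, block):
    """eh_proj(cat[enorm(emb), hnorm(h)]) -> block -> final_layernorm."""
    e = gemma_rms_norm(emb, params["enorm"])
    s = gemma_rms_norm(h, params["hnorm"])
    fused = linear(e + s, params["eh_proj"])  # list concat: 2*hidden -> hidden
    out = block(fused)  # stand-in for the full decoder block
    return gemma_rms_norm(out, params["final_layernorm"])


hidden = 4
params = {
    "enorm": [0.0] * hidden,
    "hnorm": [0.0] * hidden,
    "final_layernorm": [0.0] * hidden,
    # Toy [hidden][2*hidden] weight that averages the embedding half
    # and the hidden-state half element by element.
    "eh_proj": [
        [0.5 if j % hidden == i else 0.0 for j in range(2 * hidden)]
        for i in range(hidden)
    ],
}

out = mtp_depth(
    emb=[1.0, 2.0, 3.0, 4.0],
    h=[4.0, 3.0, 2.0, 1.0],
    params=params,
    block=lambda x: x,  # identity placeholder, not the real M3 block
)
print(out)
```

The output of this function is what would be passed to the shared lm_head to obtain logits; the sketch stops before that step because the head belongs to the main model rather than to the MTP module.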