MiniMax-M2

MiniMax-M2 is a large sparse Mixture-of-Experts (MoE) language model from MiniMaxAI. Megatron Bridge supports MiniMax-M2 and the architecture-compatible M2.5 and M2.7 checkpoints through `MiniMaxM2Bridge`.

Supported Variants

Variant	Hugging Face ID	Notes
MiniMax-M2	`MiniMaxAI/MiniMax-M2`	230B total, 10B active
MiniMax-M2.5	`MiniMaxAI/MiniMax-M2.5`	Same `MiniMaxM2ForCausalLM` architecture
MiniMax-M2.7	`MiniMaxAI/MiniMax-M2.7`	Same `MiniMaxM2ForCausalLM` architecture

Architecture Notes

Sparse MoE with 256 experts and top-8 routing.
Sigmoid router with expert-bias correction.
FP8 block-wise checkpoint weights are dequantized during import.
Uses full-dimension QK RMSNorm; the bridge provides custom Q/K norm mappings for tensor-parallel shards.
Multi-token prediction (MTP) modules are not currently mapped.
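
The routing described in the notes above can be illustrated with a small, self-contained sketch. It is not the bridge's or the model's actual code. It assumes the common formulation in which the per-expert bias only influences which experts are selected, while the mixing weights are taken from the unbiased sigmoid scores and renormalized over the selected experts. The function name `route_token` and its parameters are hypothetical.

```python
import math


def route_token(logits, bias, top_k=8):
    """Select top_k experts for a single token (illustrative sketch only).

    logits: raw router outputs, one float per expert.
    bias:   per-expert correction term; assumed here to affect
            expert selection but not the final mixing weights.
    """
    # Sigmoid scores each expert independently in (0, 1),
    # unlike softmax, which makes experts compete for probability mass.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Selection uses the bias-corrected scores.
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Mixing weights use the unbiased scores, renormalized to sum to 1.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Example with the expert count and top-k listed above (256 experts, top-8).
# The logits are arbitrary placeholder values, not real router outputs.
example_logits = [math.sin(i) for i in range(256)]
example_bias = [0.0] * 256
experts, mix = route_token(example_logits, example_bias, top_k=8)
```

With this formulation, raising the bias of an under-used expert makes it more likely to be selected without inflating its contribution to the output, which is the usual motivation for a correction term of this kind.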

Examples

For Slurm conversion, inference, hardware requirements, and validated parallelism settings, see the MiniMax-M2 examples README.