MiniMax-M2

MiniMax-M2 is a large sparse Mixture-of-Experts (MoE) language model from MiniMaxAI. Megatron Bridge supports MiniMax-M2, as well as the architecture-compatible MiniMax-M2.5 and MiniMax-M2.7 checkpoints, through MiniMaxM2Bridge.

Supported Variants

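| Variant      | Hugging Face ID        | Notes                                                  |
|--------------|------------------------|--------------------------------------------------------|
| MiniMax-M2   | MiniMaxAI/MiniMax-M2   | 230B total parameters, 10B active                      |
| MiniMax-M2.5 | MiniMaxAI/MiniMax-M2.5 | Same MiniMaxM2ForCausalLM architecture as MiniMax-M2   |
| MiniMax-M2.7 | MiniMaxAI/MiniMax-M2.7 | Same MiniMaxM2ForCausalLM architecture as MiniMax-M2   |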

Architecture Notes

  • Sparse MoE with 256 experts and top-8 routing per token.

  • Sigmoid router with expert-bias correction.

  • Checkpoint weights are stored in block-wise FP8 and are dequantized during import.

  • Applies QK RMSNorm over the full query/key projection dimension rather than per head; the bridge provides custom Q/K norm mappings so these norm weights are split correctly across tensor-parallel shards.

  • Multi-Token Prediction (MTP) modules are not currently mapped.
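
The routing bullets (256 experts, top-8, sigmoid router with expert-bias correction) can be illustrated for a single token with the minimal pure-Python sketch below. It assumes the common bias-corrected sigmoid scheme popularized by DeepSeek-V3: the per-expert bias is added to the sigmoid scores only to decide which experts win, and the unbiased sigmoid scores of the winners, renormalized to sum to 1, become the mixing weights. The function route_token is hypothetical and is not Megatron Bridge's router implementation.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def route_token(router_logits, expert_bias, top_k=8):
    """Hypothetical sketch: pick top_k experts for one token.

    router_logits: one raw router logit per expert (e.g. 256 values).
    expert_bias:   one correction bias per expert, used only for selection.
    Returns (expert_ids, weights) with weights summing to 1.
    """
    if len(router_logits) != len(expert_bias):
        raise ValueError("one bias per expert is required")
    if not 0 < top_k <= len(router_logits):
        raise ValueError("top_k must be between 1 and the number of experts")

    scores = [_sigmoid(x) for x in router_logits]
    # The bias shifts which experts are selected...
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(biased)), key=lambda e: biased[e], reverse=True)[:top_k]
    # ...but the mixing weights come from the unbiased scores, renormalized.
    total = sum(scores[e] for e in chosen)
    weights = [scores[e] / total for e in chosen]
    return chosen, weights


if __name__ == "__main__":
    ids, w = route_token([2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 0.0, 5.0], top_k=2)
    print(ids, [round(x, 4) for x in w])
```

In the usage example, the large bias on expert 3 forces it into the selection even though its raw logit is the lowest, yet its mixing weight stays small because weights use the unbiased scores. Under this assumed scheme, that separation is what lets the bias steer load balancing without rescaling expert outputs.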

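Block-wise FP8 dequantization multiplies every stored value by the scale of the tile that contains it. The sketch below assumes 128 x 128 tiles with one scale per tile (the layout commonly saved as weight_scale_inv next to FP8 weights in Hugging Face checkpoints). Because standard Python has no FP8 type, it takes already-decoded FP8 values as floats. dequantize_blockwise is a hypothetical helper, not the bridge's actual import code.

```python
import math


def dequantize_blockwise(q_weight, scale_inv, block_size=(128, 128)):
    """Hypothetical sketch of block-wise dequantization.

    q_weight:  2-D list (rows x cols) of FP8 values already decoded to float.
    scale_inv: 2-D list of per-tile scales with shape
               ceil(rows / bm) x ceil(cols / bn); edge tiles may be partial.
    Returns a new 2-D list where w[i][j] = q[i][j] * scale_inv[i // bm][j // bn].
    """
    bm, bn = block_size
    rows = len(q_weight)
    cols = len(q_weight[0]) if rows else 0
    expected = (math.ceil(rows / bm), math.ceil(cols / bn))
    actual = (len(scale_inv), len(scale_inv[0]) if scale_inv else 0)
    if actual != expected:
        raise ValueError(f"scale grid must be {expected}, got {actual}")
    # Each element looks up the scale of the tile it falls into.
    return [
        [q_weight[i][j] * scale_inv[i // bm][j // bn] for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # Tiny 4 x 4 example with 2 x 2 tiles instead of 128 x 128.
    q = [[1.0] * 4 for _ in range(4)]
    scales = [[0.5, 2.0], [1.0, 4.0]]
    for row in dequantize_blockwise(q, scales, block_size=(2, 2)):
        print(row)
```

The ceil-based scale grid covers weights whose dimensions are not multiples of the tile size; the last row or column of tiles is simply partial.
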
Examples

For Slurm-based checkpoint conversion and inference, hardware requirements, and validated parallelism settings, see the MiniMax-M2 examples README.