MiniMax-M2
MiniMax-M2 is a large sparse MoE language model from MiniMaxAI. Megatron Bridge supports MiniMax-M2 and architecture-compatible M2.5/M2.7 checkpoints through the MiniMaxM2Bridge.
Supported Variants
| Variant | Hugging Face ID | Notes |
|---|---|---|
| MiniMax-M2 | | 230B total, 10B active |
| MiniMax-M2.5 | | Same as MiniMax-M2 |
| MiniMax-M2.7 | | Same as MiniMax-M2 |
Architecture Notes
Sparse MoE with 256 experts and top-8 routing.
Sigmoid router with expert-bias correction.
FP8 block-wise checkpoint weights are dequantized during import.
Uses full-dimension QK RMSNorm; the bridge provides custom Q/K norm mappings for tensor-parallel shards.
Multi-token prediction (MTP) modules are not currently mapped.
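The routing described above can be illustrated with a minimal pure-Python sketch for a single token. It assumes the expert bias is added only when choosing experts, and that the final mixing weights come from the unbiased sigmoid scores renormalized over the chosen experts, as is common for bias-corrected sigmoid routers. The function name `route_token` and its arguments are hypothetical and are not part of Megatron Bridge.

```python
import math


def route_token(logits, expert_bias, top_k=8):
    """Select top_k experts for one token and return (indices, weights).

    Assumption: the bias influences only which experts are selected;
    the returned weights use the unbiased sigmoid scores.
    """
    if len(logits) != len(expert_bias):
        raise ValueError("logits and expert_bias must have the same length")

    # Sigmoid score per expert (independent per expert, unlike softmax).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # Bias-corrected scores are used for selection only.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Renormalize the unbiased scores of the chosen experts to sum to 1.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Usage: 256 experts, default top-8 selection.
logits = [0.0] * 256
logits[3] = 2.0
bias = [0.0] * 256
bias[10] = 1.0
indices, weights = route_token(logits, bias)
```

In this sketch a large bias can pull an expert into the selected set without inflating its contribution to the output, which is the usual motivation for keeping the bias out of the mixing weights.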
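To make the FP8 note concrete, here is a minimal sketch of block-wise dequantization. It assumes each 2D weight is stored alongside a grid of per-block scale factors, and that the full-precision value is the stored value multiplied by the scale of the block it falls in. The 128x128 default block size, the name `dequantize_blockwise`, and the plain-list representation are illustrative assumptions, not the bridge's actual implementation.

```python
def dequantize_blockwise(q, scale, block=(128, 128)):
    """Return q with each element multiplied by its block's scale.

    q      : 2D list of already-decoded quantized values (rows x cols)
    scale  : 2D list with one scale per block, ceil(rows/br) x ceil(cols/bc)
    block  : (br, bc) block shape; 128x128 is an assumed default
    """
    br, bc = block
    rows = len(q)
    cols = len(q[0]) if rows else 0

    # Ceil-divide so partial blocks at the edges still get a scale entry.
    need_r = -(-rows // br)
    need_c = -(-cols // bc)
    if len(scale) != need_r or any(len(r) != need_c for r in scale):
        raise ValueError("scale grid does not match weight and block shape")

    return [
        [q[i][j] * scale[i // br][j // bc] for j in range(cols)]
        for i in range(rows)
    ]


# Usage with a tiny 4x4 weight and 2x2 blocks.
q = [[1.0] * 4 for _ in range(4)]
scale = [[1.0, 2.0], [3.0, 4.0]]
w = dequantize_blockwise(q, scale, block=(2, 2))
```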
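The following sketch shows why a full-dimension Q/K RMSNorm needs care under tensor parallelism. Because the norm is taken over the whole projected vector rather than per head, each tensor-parallel rank holds only a slice of both the activations and the norm weight, and the mean square must be accumulated across all slices. The helper names and the contiguous, even sharding scheme are assumptions made for illustration; they do not describe the bridge's actual mapping classes.

```python
import math


def shard_vector(vec, tp_size):
    """Split a 1D list into tp_size contiguous, equal-sized shards."""
    if len(vec) % tp_size != 0:
        raise ValueError("vector length must be divisible by tp_size")
    n = len(vec) // tp_size
    return [vec[r * n:(r + 1) * n] for r in range(tp_size)]


def full_dim_rmsnorm_sharded(x_shards, w_shards, eps=1e-6):
    """RMSNorm over the full dimension, computed from per-rank shards.

    The global sum of squares below stands in for the all-reduce that a
    real tensor-parallel implementation would perform across ranks.
    """
    total_sq = sum(v * v for shard in x_shards for v in shard)
    dim = sum(len(shard) for shard in x_shards)
    inv_rms = 1.0 / math.sqrt(total_sq / dim + eps)
    return [
        [v * inv_rms * w for v, w in zip(xs, ws)]
        for xs, ws in zip(x_shards, w_shards)
    ]


# Usage: a 4-element vector split across 2 hypothetical ranks.
x = [1.0, 2.0, 3.0, 4.0]
weight = [1.0, 0.5, 2.0, 1.0]
out_shards = full_dim_rmsnorm_sharded(shard_vector(x, 2), shard_vector(weight, 2))
```

Normalizing each shard with only its local statistics would give a different result from the unsharded norm, which is why the weight slices and the reduction have to be handled together.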
Examples
For Slurm conversion, inference, hardware requirements, and validated parallelism settings, see the MiniMax-M2 examples README.