Kimi K2

Kimi K2 is a large sparse Mixture-of-Experts (MoE) language model from Moonshot AI. Megatron Bridge supports Kimi K2 through the KimiK2Bridge, which reuses the MLA/MoE mapping pattern shared with DeepSeek-style architectures.

Supported Variants

Megatron Bridge supports Kimi K2 checkpoints with the KimiK2ForCausalLM architecture and kimi_k2 model type, including:

Kimi-K2-Instruct: https://huggingface.co/moonshotai/Kimi-K2-Instruct
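
As a rough illustration of the check implied above, the following sketch inspects a Hugging Face config.json payload for the architecture and model type named in this section. The helper name is_kimi_k2_config and the sample dictionaries are hypothetical: they are not part of Megatron Bridge and are not copied from a real checkpoint, and the way the bridge itself dispatches on these fields may differ.

```python
import json

# Identifiers taken from this page.
KIMI_K2_ARCHITECTURE = "KimiK2ForCausalLM"
KIMI_K2_MODEL_TYPE = "kimi_k2"


def is_kimi_k2_config(config: dict) -> bool:
    """Hypothetical helper: True if a config.json dict names the Kimi K2
    architecture and model type described in this section."""
    # "architectures" is a list in Hugging Face configs; tolerate a missing key.
    architectures = config.get("architectures") or []
    return (
        KIMI_K2_ARCHITECTURE in architectures
        and config.get("model_type") == KIMI_K2_MODEL_TYPE
    )


# Made-up config fragments, for illustration only.
kimi_like = json.loads(
    '{"architectures": ["KimiK2ForCausalLM"], "model_type": "kimi_k2"}'
)
other = {"architectures": ["LlamaForCausalLM"], "model_type": "llama"}

print(is_kimi_k2_config(kimi_like))
print(is_kimi_k2_config(other))
```

In practice the dictionary would come from reading the checkpoint's config.json with json.load, so the check can run before any weights are downloaded.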

Architecture Notes

Multi-Latent Attention (MLA) with QK layernorm.
Sparse MoE layers with grouped GEMM, all-to-all token dispatch, expert bias, and shared expert overlap.
RMSNorm, gated MLPs, RoPE, and unshared input/output embeddings.
Vocab size is padded to a multiple of 1280 for Megatron execution.
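
The vocabulary padding in the last note amounts to rounding up to the next multiple of 1280. A minimal sketch of that arithmetic follows; the function name pad_vocab_size and the example sizes are illustrative assumptions, and the actual Megatron code path may apply further factors (for example, tensor-parallel size) that are not modeled here.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 1280) -> int:
    """Illustrative helper: round vocab_size up to the nearest multiple.

    The default of 1280 comes from this page; everything else is an
    assumption made for the sake of the example.
    """
    if vocab_size <= 0 or multiple <= 0:
        raise ValueError("vocab_size and multiple must be positive")
    # Ceiling division without floating point, then scale back up.
    return -(-vocab_size // multiple) * multiple


# Hypothetical vocabulary sizes, not taken from any real tokenizer.
print(pad_vocab_size(100_000))  # 79 * 1280 = 101120
print(pad_vocab_size(1_280))    # already a multiple, unchanged: 1280
print(pad_vocab_size(1_281))    # one over, rounds up to 2560
```

A vocabulary size that is already a multiple of 1280 passes through unchanged, so the padding only adds rows to the embedding and output matrices when it is needed.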

Recipes

Kimi K2 recipes live in src/megatron/bridge/recipes/kimi/kimi_k2.py.