Kimi K2
Kimi K2 is a large sparse Mixture-of-Experts (MoE) language model from Moonshot AI. Megatron Bridge supports Kimi K2 through KimiK2Bridge, which reuses the MLA/MoE mapping pattern shared by DeepSeek-style architectures.
Supported Variants
Megatron Bridge supports Kimi K2 checkpoints with the KimiK2ForCausalLM architecture and kimi_k2 model type, including:
Kimi-K2-Instruct: https://huggingface.co/moonshotai/Kimi-K2-Instruct
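As a rough illustration of the check described above, the sketch below tests whether a Hugging Face style config dictionary carries the architecture and model type named in this section. The helper name is_kimi_k2_config and the sample dictionaries are hypothetical; this is not Megatron Bridge's own detection logic, and the values are not read from a real checkpoint.

```python
# Identifiers taken from the text above.
EXPECTED_ARCHITECTURE = "KimiK2ForCausalLM"
EXPECTED_MODEL_TYPE = "kimi_k2"


def is_kimi_k2_config(config: dict) -> bool:
    """Return True if the config names the Kimi K2 architecture and model type.

    Hypothetical helper for illustration only.
    """
    architectures = config.get("architectures") or []
    return (
        config.get("model_type") == EXPECTED_MODEL_TYPE
        and EXPECTED_ARCHITECTURE in architectures
    )


# Hand-written sample configs (assumptions, not loaded from the Hub).
matching = {"architectures": ["KimiK2ForCausalLM"], "model_type": "kimi_k2"}
other = {"architectures": ["LlamaForCausalLM"], "model_type": "llama"}

print(is_kimi_k2_config(matching))  # True
print(is_kimi_k2_config(other))     # False
```

In practice the dictionary would come from the checkpoint's config.json; inspect that file directly to confirm what a given checkpoint declares.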
Architecture Notes
Multi-Latent Attention (MLA) with QK layernorm.
Sparse MoE layers with grouped GEMM, all-to-all token dispatch, expert bias, and shared expert overlap.
RMSNorm, gated MLPs, RoPE, and unshared input/output embeddings.
Vocab size is padded to a multiple of 1280 for Megatron execution.
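The vocabulary padding noted above amounts to rounding up to the next multiple of 1280. A minimal sketch of that arithmetic follows; the function name pad_vocab_size and the sample sizes are illustrative assumptions, not the Megatron Bridge implementation or the model's actual vocabulary size.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 1280) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`.

    Hypothetical helper; 1280 is the multiple stated in the notes above.
    """
    if vocab_size <= 0 or multiple <= 0:
        raise ValueError("vocab_size and multiple must be positive")
    # Ceiling division, then scale back up to a full multiple.
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Illustrative sizes only.
print(pad_vocab_size(100000))  # 101120 (79 * 1280)
print(pad_vocab_size(1280))    # 1280 (already aligned, unchanged)
```

A vocabulary that is already a multiple of 1280 is left unchanged; otherwise the extra rows are padding that does not correspond to real tokens.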
Recipes
Kimi K2 recipes live in src/megatron/bridge/recipes/kimi/kimi_k2.py.