Qwen3-Next#
Qwen3-Next is a Qwen3 Mixture-of-Experts (MoE) variant with Gated-Delta Networks and Multi-Token Prediction (MTP) support. Megatron Bridge supports it through Qwen3NextBridge.
Supported Variants#
Qwen3-Next-80B-A3B-Instruct: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct
Qwen3-Next-80B-A3B-Thinking: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking
Architecture Notes#
MoE model with 80B total parameters, of which about 3B are active per token.
Includes Gated-Delta Network layers, QK layernorm, Zero-Centered RMSNorm, and MTP.
Recipes currently cover pretraining and full SFT; PEFT is not available for Qwen3-Next.
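To make the "Zero-Centered RMSNorm" note above concrete, here is a minimal pure-Python sketch of the idea as the term is commonly used: the learnable scale is stored as an offset from 1, so a weight of zero leaves the normalized activations unscaled. The function name, the eps default, and the list-based interface are illustrative assumptions; this is not the Megatron Bridge or model implementation, which may differ in detail.

```python
import math


def zero_centered_rms_norm(x, weight, eps=1e-6):
    """Illustrative zero-centered RMSNorm over one vector.

    Assumption: the effective scale is (1 + weight), so the stored
    weight is centered at zero rather than at one.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


# With an all-zero weight the output is the plain RMS-normalized input.
normalized = zero_centered_rms_norm([3.0, 4.0], [0.0, 0.0], eps=0.0)
```

One consequence of this parameterization is that weight decay pulls the stored weight toward zero, which corresponds to an effective scale of one rather than zero.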
Examples#
The Qwen3-Next example entrypoint supports YAML and CLI overrides for full finetuning.
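The actual entrypoint command is not reproduced here. As a generic illustration of how YAML values and CLI overrides are typically layered, the sketch below applies dotted key=value overrides on top of a dictionary standing in for a parsed YAML file. Everything in it is an assumption made for illustration: the helper names apply_overrides and parse_scalar, the config keys, and the rule that CLI values take precedence over YAML values. It is not the Megatron Bridge override mechanism.

```python
from copy import deepcopy


def parse_scalar(text):
    """Best-effort conversion of a CLI string to int, float, bool, or str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def apply_overrides(base, overrides):
    """Return a copy of `base` with dotted `key=value` overrides applied."""
    merged = deepcopy(base)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Create intermediate sections that the YAML did not define.
            node = node.setdefault(name, {})
        node[leaf] = parse_scalar(raw_value)
    return merged


# Hypothetical values, standing in for a parsed YAML file.
yaml_config = {"train": {"iters": 100, "batch_size": 8}, "optimizer": {"lr": 1e-5}}

# Hypothetical CLI overrides; in this sketch they win over the YAML values.
final_config = apply_overrides(yaml_config, ["train.iters=20", "optimizer.lr=5e-6"])
```

Keeping the merge as a pure function that returns a new dictionary leaves the YAML-derived values untouched, which makes it easy to log both the base and the final configuration.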
Recipes#
qwen3_next_80b_a3b_pretrain_config
qwen3_next_80b_a3b_sft_config