Qwen3-MoE

Qwen3-MoE is the sparse Mixture-of-Experts (MoE) branch of the Qwen3 model family. Megatron Bridge supports Qwen3-MoE through the `Qwen3MoeBridge` class and through dedicated recipes for pretraining, supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT).

Supported Variants

Variant	Hugging Face ID	Parameters
Qwen3-30B-A3B	`Qwen/Qwen3-30B-A3B`	30B total, 3B active per token
Qwen3-235B-A22B	`Qwen/Qwen3-235B-A22B`	235B total, 22B active per token
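The table's total and active parameter counts imply that each token passes through only a small fraction of the model's weights. A quick sketch of that arithmetic, using the rounded figures from the table (`active_fraction` is an illustrative helper, not part of Megatron Bridge):

```python
# Rounded parameter counts (in billions) from the Supported Variants table.
variants = {
    "Qwen3-30B-A3B": (30, 3),
    "Qwen3-235B-A22B": (235, 22),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters that participate in one token's forward pass."""
    return active_b / total_b


for name, (total, active) in variants.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

Both variants activate roughly a tenth of their weights per token, which is why their per-token compute is closer to that of a 3B or 22B dense model than to a 30B or 235B one.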

Architecture Notes

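Uses the Qwen3 transformer backbone with MoE feed-forward layers: a learned router sends each token to a small subset of experts instead of through one dense MLP.
Supports grouped expert execution, with experts sharded across GPUs through Megatron expert parallelism.
Like the dense Qwen3 models, applies QK layernorm: each attention head's query and key projections are RMS-normalized before attention scores are computed.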
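To make the architecture notes concrete, here is a minimal pure-Python sketch of the two pieces that distinguish this model: a top-k routed MoE feed-forward step and RMS-normalized query/key vectors in the attention score. It is an illustration under simplifying assumptions (toy dimensions, toy "experts", and top-k weights renormalized to sum to 1, which is equivalent to a softmax over only the selected logits), not Megatron's or Hugging Face's implementation; `route_top_k`, `moe_forward`, `rms_norm`, and `attention_score` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; their weights sum to 1."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: Vector, router: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int) -> Vector:
    """Sparse feed-forward: only the k routed experts run for this token."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]  # one logit per expert
    out = [0.0] * len(x)
    for idx, weight in route_top_k(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]  # weighted sum of expert outputs
    return out


def rms_norm(v: Vector, gain: Sequence[float], eps: float = 1e-6) -> Vector:
    """RMSNorm: rescale v to unit root-mean-square, then apply a learned per-dim gain."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [g * x / rms for g, x in zip(gain, v)]


def attention_score(q: Vector, k: Vector, q_gain: Sequence[float], k_gain: Sequence[float]) -> float:
    """QK layernorm: normalize one head's query and key before the scaled dot product."""
    qn, kn = rms_norm(q, q_gain), rms_norm(k, k_gain)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


# Usage: four toy experts that scale their input; each token is routed to 2 of them.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
y = moe_forward([0.5, 0.5], router, experts, k=2)
```

Because the query and key are normalized before the dot product, the score no longer grows with their raw magnitudes, which is the stabilizing effect QK layernorm is meant to provide.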

Recipes

Available recipe helpers include:

`qwen3_30b_a3b_pretrain_config`, `qwen3_30b_a3b_sft_config`, `qwen3_30b_a3b_peft_config`
`qwen3_235b_a22b_pretrain_config`, `qwen3_235b_a22b_sft_config`, `qwen3_235b_a22b_peft_config`

The recipes are defined in `src/megatron/bridge/recipes/qwen/qwen3_moe.py`.
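The recipe helper names follow a `qwen3_<variant>_<stage>_config` pattern. The sketch below selects a helper by variant and training stage and imports it lazily. `resolve_recipe` and `load_recipe` are hypothetical helpers, and the module path `megatron.bridge.recipes.qwen.qwen3_moe` is an assumption inferred from the source file location. The helpers' call signatures are not documented on this page, so `load_recipe` returns the function object without calling it.

```python
import importlib
from typing import Callable

# Assumed import path, inferred from src/megatron/bridge/recipes/qwen/qwen3_moe.py.
RECIPE_MODULE = "megatron.bridge.recipes.qwen.qwen3_moe"

VARIANTS = {"30b_a3b", "235b_a22b"}
STAGES = {"pretrain", "sft", "peft"}


def resolve_recipe(variant: str, stage: str) -> str:
    """Return the recipe helper name, e.g. 'qwen3_30b_a3b_sft_config'."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGES)}")
    return f"qwen3_{variant}_{stage}_config"


def load_recipe(variant: str, stage: str) -> Callable:
    """Import the recipe module on demand (requires Megatron Bridge) and return the helper."""
    module = importlib.import_module(RECIPE_MODULE)
    return getattr(module, resolve_recipe(variant, stage))
```

For example, `resolve_recipe("235b_a22b", "peft")` yields `"qwen3_235b_a22b_peft_config"`. Keeping name resolution separate from the import lets a launcher validate its arguments before pulling in the training stack.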