Qwen3-MoE#
Qwen3-MoE is the sparse Mixture-of-Experts branch of the Qwen3 model family. Megatron Bridge supports Qwen3-MoE through the Qwen3MoeBridge and dedicated pretraining, SFT, and PEFT recipes.
Supported Variants#
| Variant | Hugging Face ID | Notes |
|---|---|---|
| Qwen3-30B-A3B | | 30B total, 3B active |
| Qwen3-235B-A22B | | 235B total, 22B active |
Architecture Notes#
Qwen3 transformer backbone with MoE feed-forward layers.
Supports grouped expert execution through Megatron expert parallelism.
Uses Qwen3-style QK layer normalization: queries and keys are normalized per attention head before attention scores are computed.
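To make the "total vs. active" distinction concrete, here is a minimal pure-Python sketch of top-k expert routing, the general mechanism behind sparse MoE feed-forward layers. It is a toy illustration, not the Megatron Bridge or Megatron-Core implementation: the expert count, `k`, the scalar "experts", and the names `route_top_k` and `moe_forward` are all assumptions chosen for readability, and renormalizing the selected weights is one common convention rather than a confirmed detail of this model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1
    (an assumed convention for this sketch).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    selected_mass = sum(probs[i] for i in top)
    return [(i, probs[i] / selected_mass) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Evaluate only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy setup (hypothetical): four scalar "experts", two active per token.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]

selected = route_top_k(logits, k=2)
y = moe_forward(1.0, experts, logits, k=2)
print(selected, y)
```

Only two of the four experts run for this token, which is why a model's active parameter count can be far smaller than its total parameter count.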
Recipes#
Available recipe helpers include:
qwen3_30b_a3b_pretrain_config, qwen3_30b_a3b_sft_config, qwen3_30b_a3b_peft_config
qwen3_235b_a22b_pretrain_config, qwen3_235b_a22b_sft_config, qwen3_235b_a22b_peft_config
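The helper names follow a regular `qwen3_<variant>_<task>_config` pattern. The sketch below only builds and validates those name strings; `recipe_name` is a hypothetical convenience function, not part of Megatron Bridge, and it neither imports nor calls the real recipe helpers, whose module path and arguments are not given in this section.

```python
# Variant and task suffixes taken from the recipe names listed above.
VARIANTS = ("30b_a3b", "235b_a22b")
TASKS = ("pretrain", "sft", "peft")


def recipe_name(variant: str, task: str) -> str:
    """Build the recipe helper name for a variant/task pair (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"qwen3_{variant}_{task}_config"


# Enumerate all six helper names in the order they are listed above.
all_recipes = [recipe_name(v, t) for v in VARIANTS for t in TASKS]
for name in all_recipes:
    print(name)
```

In a real script, the resulting name would be looked up in whichever module exposes the recipe helpers, which is an installation-specific detail to confirm against the Megatron Bridge package.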