Qwen3-MoE

Qwen3-MoE is the sparse Mixture-of-Experts (MoE) branch of the Qwen3 model family. Megatron Bridge supports Qwen3-MoE through the Qwen3MoeBridge and through dedicated recipes for pretraining, supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT).

Supported Variants

Variant            Hugging Face ID         Notes
Qwen3-30B-A3B      Qwen/Qwen3-30B-A3B      30B total, 3B active
Qwen3-235B-A22B    Qwen/Qwen3-235B-A22B    235B total, 22B active
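
To make the "total" versus "active" distinction concrete, the short sketch below computes the share of parameters that is active per token for each variant. It uses only the nominal, rounded counts from the variant table; the dictionary and function names are illustrative and not part of Megatron Bridge.

```python
# Nominal parameter counts in billions, copied from the variant table.
# These are rounded figures, so the resulting fractions are approximate.
VARIANTS = {
    "Qwen3-30B-A3B": (30, 3),
    "Qwen3-235B-A22B": (235, 22),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that is active for a single token."""
    return active_b / total_b


for name, (total_b, active_b) in VARIANTS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
# Qwen3-30B-A3B: 10.0% of parameters active per token
# Qwen3-235B-A22B: 9.4% of parameters active per token
```

In both variants roughly a tenth of the weights take part in any one forward pass, which is why per-token compute is much lower than the total parameter count suggests.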

Architecture Notes

  • Qwen3 transformer backbone with MoE feed-forward layers.

  • Supports grouped expert execution through Megatron expert parallelism.

  • Uses Qwen3-specific QK layernorm behavior, in which the query and key projections are normalized before attention scores are computed.
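
As a rough illustration of what an MoE feed-forward layer does for one token, the sketch below implements generic top-k softmax routing in plain Python. It is not the Megatron implementation: the expert count, the value of k, the renormalization of the selected weights, and all function names are assumptions chosen for the example, and the "experts" are toy scaling functions rather than MLPs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1
    (an assumption of this sketch).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, k):
    """Weighted sum of the selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy setup: four experts that scale the input by 1, 2, 3, and 4.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
token = [1.0, -1.0]
logits = [0.1, 2.0, 0.3, 2.0]

print(route_top_k(logits, k=2))
print(moe_forward(token, logits, experts, k=2))
```

Only the k selected experts run for a given token. Expert parallelism builds on this sparsity by placing different experts on different devices and grouping the tokens routed to each one.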

Recipes

Available recipe helpers include:

  • qwen3_30b_a3b_pretrain_config, qwen3_30b_a3b_sft_config, qwen3_30b_a3b_peft_config

  • qwen3_235b_a22b_pretrain_config, qwen3_235b_a22b_sft_config, qwen3_235b_a22b_peft_config

See src/megatron/bridge/recipes/qwen/qwen3_moe.py.
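
The helper names above follow a regular qwen3_<variant>_<mode>_config pattern, so a script can build the right name from a variant and a training mode. The sketch below does that and, optionally, looks the helper up at runtime. The module path megatron.bridge.recipes.qwen.qwen3_moe is inferred from the file path above and is an assumption, as are the function names recipe_helper_name and load_recipe_helper. The helpers' call signatures are not documented here, so the sketch returns the function without calling it.

```python
import importlib

# Variants and modes taken from the recipe helper list above.
VARIANTS = ("30b_a3b", "235b_a22b")
MODES = ("pretrain", "sft", "peft")

# Assumed import path, derived from src/megatron/bridge/recipes/qwen/qwen3_moe.py.
ASSUMED_MODULE = "megatron.bridge.recipes.qwen.qwen3_moe"


def recipe_helper_name(variant: str, mode: str) -> str:
    """Build the recipe helper name for a variant and training mode."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return f"qwen3_{variant}_{mode}_config"


def load_recipe_helper(variant: str, mode: str, module: str = ASSUMED_MODULE):
    """Return the helper function itself (requires Megatron Bridge to be installed)."""
    return getattr(importlib.import_module(module), recipe_helper_name(variant, mode))


print(recipe_helper_name("30b_a3b", "sft"))  # qwen3_30b_a3b_sft_config
```

Validating the variant and mode up front gives a clear error for a mistyped name instead of an AttributeError from the import step.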