bridge.recipes.qwen.qwen3_moe#

Module Contents#

Functions#

qwen3_30b_a3b_pretrain_config

Return a pre-training config for Qwen3-30B-A3B MoE.

qwen3_235b_a22b_pretrain_config

Return a pre-training config for Qwen3-235B-A22B MoE.

qwen3_30b_a3b_sft_config

Return a full SFT config for Qwen3-30B-A3B MoE.

qwen3_235b_a22b_sft_config

Return a full SFT config for Qwen3-235B-A22B MoE.

qwen3_30b_a3b_peft_config

Return a PEFT config for Qwen3-30B-A3B MoE.

qwen3_235b_a22b_peft_config

Return a PEFT config for Qwen3-235B-A22B MoE.

API#

bridge.recipes.qwen.qwen3_moe.qwen3_30b_a3b_pretrain_config() → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-30B-A3B MoE.

Recommended parallelism: TP=4, PP=2, EP=4.
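A minimal usage sketch: the recipe function belongs to this module, but the import path (megatron.bridge.recipes.qwen.qwen3_moe) and the config attribute names below (cfg.model.*, cfg.train.*) are assumptions following Megatron-Core conventions, not a verified API:

    from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_30b_a3b_pretrain_config

    cfg = qwen3_30b_a3b_pretrain_config()

    # The recipe targets TP=4, PP=2, EP=4; adjust only if your cluster topology differs
    # (attribute names follow Megatron-Core conventions and are assumptions here).
    cfg.model.tensor_model_parallel_size = 4
    cfg.model.pipeline_model_parallel_size = 2
    cfg.model.expert_model_parallel_size = 4

    # Shorten the schedule for a smoke test (assumed TrainingConfig fields).
    cfg.train.train_iters = 100
    cfg.train.global_batch_size = 32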

bridge.recipes.qwen.qwen3_moe.qwen3_235b_a22b_pretrain_config() → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-235B-A22B MoE.

Recommended parallelism: TP=4, PP=16, CP=2, EP=8. Note: Uses account_for_embedding_in_pipeline_split and account_for_loss_in_pipeline_split for proper layer distribution in pipeline parallelism.
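A hedged sketch for checking the pipeline-split accounting flags called out above; it assumes the returned config exposes the model provider at cfg.model with Megatron-Core TransformerConfig-style attribute names:

    from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_235b_a22b_pretrain_config

    cfg = qwen3_235b_a22b_pretrain_config()

    # With PP=16, counting the embedding and loss stages keeps the
    # transformer layers evenly distributed across pipeline ranks.
    assert cfg.model.account_for_embedding_in_pipeline_split
    assert cfg.model.account_for_loss_in_pipeline_split
    print(
        cfg.model.tensor_model_parallel_size,    # expected 4
        cfg.model.pipeline_model_parallel_size,  # expected 16
        cfg.model.context_parallel_size,         # expected 2
        cfg.model.expert_model_parallel_size,    # expected 8
    )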

bridge.recipes.qwen.qwen3_moe.qwen3_30b_a3b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-30B-A3B MoE.

Recommended parallelism: TP=4, PP=2, EP=4 (1 node, 8 GPUs with SP=True).
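A sketch of adapting the SFT recipe to a specific run; the checkpoint and training field names below are illustrative assumptions, not a verified API:

    from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_30b_a3b_sft_config

    cfg = qwen3_30b_a3b_sft_config()

    # Start full SFT from a converted pretrained checkpoint (assumed field name).
    cfg.checkpoint.pretrained_checkpoint = "/path/to/qwen3_30b_a3b_megatron_ckpt"

    # Typical SFT-scale schedule tweaks (assumed TrainingConfig fields).
    cfg.train.train_iters = 1000
    cfg.train.global_batch_size = 128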

bridge.recipes.qwen.qwen3_moe.qwen3_235b_a22b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-235B-A22B MoE.

Recommended parallelism: TP=4, PP=16, EP=4 (8 nodes, 64 GPUs with SP=True). Uses account_for_embedding_in_pipeline_split and account_for_loss_in_pipeline_split.

bridge.recipes.qwen.qwen3_moe.qwen3_30b_a3b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-30B-A3B MoE.

Parameters:

peft_scheme – PEFT scheme: 'lora', 'dora', or a PEFT instance. Default: 'lora'.

Recommended parallelism: TP=4, PP=1, EP=4 (1 node, 8 GPUs with SP=True). LoRA/DoRA uses dim=8, alpha=16, target_modules=['linear_qkv', 'linear_proj'].
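Selecting the scheme by name is the simplest path; 'lora' and 'dora' are the documented string options, while reading the adapter settings back from cfg.peft is an assumption about where the recipe stores them:

    from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_30b_a3b_peft_config

    lora_cfg = qwen3_30b_a3b_peft_config()                    # defaults to 'lora'
    dora_cfg = qwen3_30b_a3b_peft_config(peft_scheme="dora")

    # Expect dim=8, alpha=16, target_modules=['linear_qkv', 'linear_proj']
    # (assumed attribute location).
    print(lora_cfg.peft)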

bridge.recipes.qwen.qwen3_moe.qwen3_235b_a22b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-235B-A22B MoE.

Parameters:

peft_scheme – PEFT scheme: 'lora', 'dora', or a PEFT instance. Default: 'lora'.

Recommended parallelism: TP=4, PP=4, EP=4 (8 nodes, 64 GPUs with SP=True). LoRA/DoRA uses dim=8, alpha=16, target_modules=['linear_qkv', 'linear_proj']. Uses account_for_embedding_in_pipeline_split and account_for_loss_in_pipeline_split.
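Per the signature, a megatron.bridge.peft.base.PEFT instance can be passed instead of a scheme string; the LoRA import path and constructor arguments below are assumptions for illustration:

    from megatron.bridge.peft.lora import LoRA  # assumed import path
    from megatron.bridge.recipes.qwen.qwen3_moe import qwen3_235b_a22b_peft_config

    # Override the default adapter (dim=8, alpha=16) with a custom one.
    custom_lora = LoRA(dim=16, alpha=32, target_modules=["linear_qkv", "linear_proj"])
    cfg = qwen3_235b_a22b_peft_config(peft_scheme=custom_lora)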