bridge.recipes.qwen.qwen3_next#

Module Contents#

Functions#

qwen3_next_80b_a3b_pretrain_config

Return a pre-training config for Qwen3-Next 80B-A3B.

qwen3_next_80b_a3b_sft_config

Return a full SFT config for Qwen3-Next 80B-A3B.

qwen3_next_80b_a3b_peft_config

Return a PEFT config for Qwen3-Next 80B-A3B.

API#

bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_pretrain_config() → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-Next 80B-A3B.

Recommended parallelism: TP=1, PP=4, EP=8. Note: Qwen3-Next supports Multi-Token Prediction (MTP) via the mtp_num_layers and mtp_loss_scaling_factor settings.
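A minimal usage sketch for this recipe. The import path and function name come from this reference; the override attribute names on the returned ConfigContainer (model.pipeline_model_parallel_size, etc.) follow common Megatron conventions and are assumptions that may differ in your installed version.

```python
# Sketch: obtain the pre-training config and apply the recommended
# parallelism. Assumes megatron-bridge is installed.
from megatron.bridge.recipes.qwen.qwen3_next import (
    qwen3_next_80b_a3b_pretrain_config,
)

cfg = qwen3_next_80b_a3b_pretrain_config()

# Recommended parallelism for this recipe: TP=1, PP=4, EP=8.
# (Attribute names below are illustrative, not verified.)
cfg.model.tensor_model_parallel_size = 1
cfg.model.pipeline_model_parallel_size = 4
cfg.model.expert_model_parallel_size = 8

# Optional Multi-Token Prediction; values here are hypothetical.
cfg.model.mtp_num_layers = 1
cfg.model.mtp_loss_scaling_factor = 0.1
```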

bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-Next 80B-A3B.

Recommended parallelism: TP=1, PP=2, EP=8. Note: Packed sequences are NOT supported for Qwen3-Next. Note: Qwen3-Next uses no_weight_decay_cond_type = “qwen3_next” for the scheduler.

Returns:

ConfigContainer with all settings pre-configured for Qwen3-Next 80B-A3B SFT.
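A brief sketch of using the SFT recipe, assuming megatron-bridge is installed. Only the import path, function name, and the constraints stated above are taken from this reference; everything else is illustrative.

```python
from megatron.bridge.recipes.qwen.qwen3_next import (
    qwen3_next_80b_a3b_sft_config,
)

# All settings come pre-configured for Qwen3-Next 80B-A3B SFT,
# including no_weight_decay_cond_type = "qwen3_next" for the scheduler.
cfg = qwen3_next_80b_a3b_sft_config()

# Packed sequences are NOT supported for Qwen3-Next, so do not enable
# any packed-sequence options on this config.
```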

bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-Next 80B-A3B.

Note: PEFT is NOT currently supported for Qwen3-Next models. This function raises NotImplementedError.

Parameters:

peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

Raises:

NotImplementedError – PEFT is not supported for Qwen3-Next models.
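Since this function currently always raises, a caller should guard it. A minimal sketch, assuming megatron-bridge is installed; only the import path, signature, and the NotImplementedError behavior are taken from this reference.

```python
from megatron.bridge.recipes.qwen.qwen3_next import (
    qwen3_next_80b_a3b_peft_config,
)

try:
    cfg = qwen3_next_80b_a3b_peft_config(peft_scheme="lora")
except NotImplementedError:
    # Expected today: PEFT is not supported for Qwen3-Next models.
    # A reasonable fallback is full fine-tuning via
    # qwen3_next_80b_a3b_sft_config() instead.
    cfg = None
```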