bridge.recipes.qwen.qwen3_next#
Module Contents#
Functions#
| Function | Description |
|---|---|
| `qwen3_next_80b_a3b_pretrain_config` | Return a pre-training config for Qwen3-Next 80B-A3B. |
| `qwen3_next_80b_a3b_sft_config` | Return a full SFT config for Qwen3-Next 80B-A3B. |
| `qwen3_next_80b_a3b_peft_config` | Return a PEFT config for Qwen3-Next 80B-A3B. |
API#
- bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_pretrain_config() → megatron.bridge.training.config.ConfigContainer#
Return a pre-training config for Qwen3-Next 80B-A3B.
Recommended parallelism: TP=1, PP=4, EP=8.
Note: Qwen3-Next supports Multi-Token Prediction (MTP) via `mtp_num_layers` and `mtp_loss_scaling_factor`.
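A minimal usage sketch, assuming `megatron.bridge` is installed and exposes this recipe exactly as documented above (the override shown in the comment is a hypothetical field path, not confirmed by this page):

```python
# Hedged sketch: build the Qwen3-Next 80B-A3B pre-training recipe and
# tweak it before launching training. Assumes megatron.bridge is importable.
from megatron.bridge.recipes.qwen.qwen3_next import (
    qwen3_next_80b_a3b_pretrain_config,
)

cfg = qwen3_next_80b_a3b_pretrain_config()

# The recipe returns a ConfigContainer with all settings pre-configured;
# individual fields can be overridden after construction, e.g. for a short
# smoke-test run (field path is hypothetical, check your installed version):
# cfg.train.train_iters = 100
```

The returned `ConfigContainer` is a plain config object, so adjusting parallelism or MTP settings after construction follows the same pattern.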
- bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_sft_config() → megatron.bridge.training.config.ConfigContainer#
Return a full SFT config for Qwen3-Next 80B-A3B.
Recommended parallelism: TP=1, PP=2, EP=8.
Note: Packed sequences are NOT supported for Qwen3-Next.
Note: Qwen3-Next uses `no_weight_decay_cond_type = "qwen3_next"` for the scheduler.
- Returns:
ConfigContainer with all settings pre-configured for Qwen3-Next 80B-A3B SFT.
- bridge.recipes.qwen.qwen3_next.qwen3_next_80b_a3b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora')#
Return a PEFT config for Qwen3-Next 80B-A3B.
Note: PEFT is NOT currently supported for Qwen3-Next models. This function raises NotImplementedError.
- Parameters:
peft_scheme – PEFT scheme: "lora", "dora", or a custom PEFT instance.
- Raises:
NotImplementedError – PEFT is not supported for Qwen3-Next models.
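A short sketch of the documented behavior, assuming `megatron.bridge` is installed: per the note above, this recipe currently raises `NotImplementedError` regardless of the scheme passed.

```python
# Hedged sketch: calling the PEFT recipe for Qwen3-Next is expected to
# raise NotImplementedError, per the documentation above.
from megatron.bridge.recipes.qwen.qwen3_next import (
    qwen3_next_80b_a3b_peft_config,
)

try:
    qwen3_next_80b_a3b_peft_config(peft_scheme="lora")
except NotImplementedError as err:
    # PEFT (LoRA/DoRA) is not yet supported for Qwen3-Next models.
    print(f"PEFT unavailable for Qwen3-Next: {err}")
```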