bridge.recipes.qwen_vl.qwen35_vl
Qwen3.5-VL finetuning recipes.
This module provides SFT and PEFT configurations for Qwen3.5-VL models:
Dense: 800M, 2B, 4B, 9B, 27B
MoE: 35B-A3B, 122B-A10B, 397B-A17B
Module Contents
Functions
| _qwen35_vl_apply_common | Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes. |
| _qwen35_vl_apply_moe | Apply MoE-specific settings on top of the common configuration. |
| _qwen35_vl_enable_recompute | Enable activation recomputation for large models. |
| _qwen35_vl_apply_peft_scheme | Resolve and apply the PEFT scheme (LoRA, DoRA, or custom). |
| qwen35_vl_800m_sft_config | Return a full SFT config for Qwen3.5-VL 800M (dense). |
| qwen35_vl_2b_sft_config | Return a full SFT config for Qwen3.5-VL 2B (dense). |
| qwen35_vl_4b_sft_config | Return a full SFT config for Qwen3.5-VL 4B (dense). |
| qwen35_vl_9b_sft_config | Return a full SFT config for Qwen3.5-VL 9B (dense). |
| qwen35_vl_27b_sft_config | Return a full SFT config for Qwen3.5-VL 27B (dense). |
| qwen35_vl_35b_a3b_sft_config | Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE). |
| qwen35_vl_122b_a10b_sft_config | Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE). |
| qwen35_vl_397b_a17b_sft_config | Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE). |
| qwen35_vl_800m_peft_config | Return a PEFT config for Qwen3.5-VL 800M (dense). |
| qwen35_vl_2b_peft_config | Return a PEFT config for Qwen3.5-VL 2B (dense). |
| qwen35_vl_4b_peft_config | Return a PEFT config for Qwen3.5-VL 4B (dense). |
| qwen35_vl_9b_peft_config | Return a PEFT config for Qwen3.5-VL 9B (dense). |
| qwen35_vl_27b_peft_config | Return a PEFT config for Qwen3.5-VL 27B (dense). |
| qwen35_vl_35b_a3b_peft_config | Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE). |
| qwen35_vl_122b_a10b_peft_config | Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE). |
| qwen35_vl_397b_a17b_peft_config | Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE). |
API
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_common(cfg: megatron.bridge.training.config.ConfigContainer, hf_path: str, *, tp: int, pp: int, max_lr: float, min_lr: float, gbs: int = 32)
Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes.
Sets model, parallelism (except EP/SP for MoE), VLM freeze, MTP, TE, CUDA graphs, kernels, memory-saving defaults, training, optimizer, dataset, DDP, and mixed-precision options.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_moe(cfg: megatron.bridge.training.config.ConfigContainer, *, ep: int, etp: int = 1)
Apply MoE-specific settings on top of the common configuration.
Enables expert parallelism, sequence parallelism, MoE kernels, and sets MoE-specific overlap / balance / FP8-padding defaults.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_enable_recompute(cfg: megatron.bridge.training.config.ConfigContainer)
Enable activation recomputation for large models.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_peft_scheme(cfg: megatron.bridge.training.config.ConfigContainer, peft_scheme: str | megatron.bridge.peft.base.PEFT)
Resolve and apply the PEFT scheme (LoRA, DoRA, or custom).
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_sft_config(hf_path: str = 'Qwen/Qwen3.5-0.8B')
Return a full SFT config for Qwen3.5-VL 800M (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=2, so max TP=2.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
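A minimal usage sketch for the dense SFT recipes (only config construction is shown; the training entry point depends on your setup, and the local path below is a hypothetical example):

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_800m_sft_config

# Default recipe: 1 node x 8 GPUs, TP=1, PP=1, LR=5e-6, sequence length 4096.
cfg = qwen35_vl_800m_sft_config()

# Or point at a local checkpoint directory instead of the default Hub ID
# (path is illustrative).
cfg = qwen35_vl_800m_sft_config(hf_path="/checkpoints/qwen3.5-vl-800m")

# cfg is a megatron.bridge.training.config.ConfigContainer; its fields
# (parallelism, optimizer, dataset, etc.) can be adjusted before training.
```

The other dense SFT recipes (`qwen35_vl_2b_sft_config` through `qwen35_vl_27b_sft_config`) follow the same pattern with different parallelism and node-count defaults.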
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_sft_config(hf_path: str = 'Qwen/Qwen3.5-2B')
Return a full SFT config for Qwen3.5-VL 2B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=2, so max TP=2.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_sft_config(hf_path: str = 'Qwen/Qwen3.5-4B')
Return a full SFT config for Qwen3.5-VL 4B (dense).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=4, so max TP=4.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_sft_config(hf_path: str = 'Qwen/Qwen3.5-9B')
Return a full SFT config for Qwen3.5-VL 9B (dense).
Default configuration: 1 node, 8 GPUs
TP=4, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=4, so max TP=4.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_sft_config(hf_path: str = 'Qwen/Qwen3.5-27B')
Return a full SFT config for Qwen3.5-VL 27B (dense).
Default configuration: 2 nodes, 16 GPUs
TP=4, PP=4
LR=5e-6 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_sft_config(hf_path: str = 'Qwen/Qwen3.5-35B-A3B')
Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE).
Default configuration: 2 nodes, 16 GPUs
TP=2, PP=1, EP=16
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
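The MoE SFT recipes layer expert parallelism on top of the common settings and default to multi-node layouts. A hedged sketch of the smallest MoE recipe:

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_35b_a3b_sft_config

# Default recipe: 2 nodes x 8 GPUs, TP=2, PP=1, EP=16, LR=2e-5,
# sequence length 4096. Expert and sequence parallelism, MoE kernels,
# and MoE overlap/balance defaults are applied by the recipe itself.
cfg = qwen35_vl_35b_a3b_sft_config()
```

The larger MoE recipes (`qwen35_vl_122b_a10b_sft_config`, `qwen35_vl_397b_a17b_sft_config`) work the same way but assume proportionally more nodes.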
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_sft_config(hf_path: str = 'Qwen/Qwen3.5-122B-A10B')
Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).
Default configuration: 4 nodes, 32 GPUs
TP=2, PP=6, EP=8
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_sft_config(hf_path: str = 'Qwen/Qwen3.5-397B-A17B')
Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE).
Default configuration: 16 nodes, 128 GPUs
TP=2, PP=4, EP=32
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-0.8B')
Return a PEFT config for Qwen3.5-VL 800M (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
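A sketch of how the PEFT recipes select a scheme (only config construction is shown; constructing a custom PEFT instance is deferred to that class's own documentation):

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_800m_peft_config

# LoRA is the default PEFT scheme (1 node x 8 GPUs, TP=1, PP=1, LR=1e-4).
cfg = qwen35_vl_800m_peft_config()

# Select DoRA by name instead.
cfg = qwen35_vl_800m_peft_config(peft_scheme="dora")

# A custom megatron.bridge.peft.base.PEFT instance is also accepted in
# place of the string; it is resolved by _qwen35_vl_apply_peft_scheme.
```

All the other `*_peft_config` functions accept the same `peft_scheme` argument.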
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-2B')
Return a PEFT config for Qwen3.5-VL 2B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-4B')
Return a PEFT config for Qwen3.5-VL 4B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-9B')
Return a PEFT config for Qwen3.5-VL 9B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-27B')
Return a PEFT config for Qwen3.5-VL 27B (dense).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-35B-A3B')
Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1, EP=4
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-122B-A10B')
Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE).
Default configuration: 2 nodes, 16 GPUs
TP=2, PP=1, EP=8
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-397B-A17B')
Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE).
Default configuration: 4 nodes, 32 GPUs
TP=2, PP=1, EP=32
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.