bridge.recipes.qwen_vl.qwen35_vl
Qwen3.5-VL finetuning recipes.
This module provides SFT and PEFT configurations for Qwen3.5-VL models:
Dense: 800M, 2B, 4B, 9B, 27B
MoE: 35B-A3B, 122B-A10B, 397B-A17B
Module Contents
Functions
| _qwen35_vl_apply_common | Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes. |
| _qwen35_vl_apply_moe | Apply MoE-specific settings on top of the common configuration. |
| _qwen35_vl_enable_recompute | Enable activation recomputation for large models. |
| _qwen35_vl_apply_peft_scheme | Resolve and apply the PEFT scheme (LoRA, DoRA, or custom). |
| qwen35_vl_800m_sft_config | Return a full SFT config for Qwen3.5-VL 800M (dense). |
| qwen35_vl_2b_sft_config | Return a full SFT config for Qwen3.5-VL 2B (dense). |
| qwen35_vl_4b_sft_config | Return a full SFT config for Qwen3.5-VL 4B (dense). |
| qwen35_vl_9b_sft_config | Return a full SFT config for Qwen3.5-VL 9B (dense). |
| qwen35_vl_27b_sft_config | Return a full SFT config for Qwen3.5-VL 27B (dense). |
| qwen35_vl_35b_a3b_sft_config | Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE). |
| qwen35_vl_122b_a10b_sft_config | Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE). |
| qwen35_vl_397b_a17b_sft_config | Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE). |
| qwen35_vl_800m_peft_config | Return a PEFT config for Qwen3.5-VL 800M (dense). |
| qwen35_vl_2b_peft_config | Return a PEFT config for Qwen3.5-VL 2B (dense). |
| qwen35_vl_4b_peft_config | Return a PEFT config for Qwen3.5-VL 4B (dense). |
| qwen35_vl_9b_peft_config | Return a PEFT config for Qwen3.5-VL 9B (dense). |
| qwen35_vl_27b_peft_config | Return a PEFT config for Qwen3.5-VL 27B (dense). |
| qwen35_vl_35b_a3b_peft_config | Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE). |
| qwen35_vl_122b_a10b_peft_config | Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE). |
| qwen35_vl_397b_a17b_peft_config | Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE). |
API
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_common(cfg: megatron.bridge.training.config.ConfigContainer, hf_path: str, *, tp: int, pp: int, max_lr: float, min_lr: float, gbs: int = 32)
Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes.
Sets model, parallelism (except EP/SP for MoE), VLM freeze, MTP, TE, CUDA graphs, kernels, memory-saving defaults, training, optimizer, dataset, DDP, and mixed-precision options.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_moe(cfg: megatron.bridge.training.config.ConfigContainer, *, ep: int, etp: int = 1)
Apply MoE-specific settings on top of the common configuration.
Enables expert parallelism, sequence parallelism, MoE kernels, and sets MoE-specific overlap / balance / FP8-padding defaults.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_enable_recompute(cfg: megatron.bridge.training.config.ConfigContainer)
Enable activation recomputation for large models.
- bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_peft_scheme(cfg: megatron.bridge.training.config.ConfigContainer, peft_scheme: str | megatron.bridge.peft.base.PEFT)
Resolve and apply the PEFT scheme (LoRA, DoRA, or custom).
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_sft_config(hf_path: str = 'Qwen/Qwen3.5-0.8B')
Return a full SFT config for Qwen3.5-VL 800M (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=2, so max TP=2.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
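A minimal usage sketch for the dense SFT recipes (only config construction is shown; the training entry point depends on your setup, and the local path below is a hypothetical example):

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_800m_sft_config

# Default recipe: 1 node x 8 GPUs, TP=1, PP=1, LR=5e-6, sequence length 4096.
cfg = qwen35_vl_800m_sft_config()

# Or point at a local checkpoint directory instead of the default Hub ID
# (path is illustrative).
cfg = qwen35_vl_800m_sft_config(hf_path="/checkpoints/qwen3.5-vl-800m")

# cfg is a megatron.bridge.training.config.ConfigContainer; its fields
# (parallelism, optimizer, dataset, etc.) can be adjusted before training.
```

The other dense SFT recipes (`qwen35_vl_2b_sft_config` through `qwen35_vl_27b_sft_config`) follow the same pattern with different parallelism and node-count defaults.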
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_sft_config(hf_path: str = 'Qwen/Qwen3.5-2B')
Return a full SFT config for Qwen3.5-VL 2B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=2, so max TP=2.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_sft_config(hf_path: str = 'Qwen/Qwen3.5-4B')
Return a full SFT config for Qwen3.5-VL 4B (dense).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=4, so max TP=4.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_sft_config(hf_path: str = 'Qwen/Qwen3.5-9B')
Return a full SFT config for Qwen3.5-VL 9B (dense).
Default configuration: 1 node, 8 GPUs
TP=4, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
Note: num_kv_heads=4, so max TP=4.
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_sft_config(hf_path: str = 'Qwen/Qwen3.5-27B')
Return a full SFT config for Qwen3.5-VL 27B (dense).
Default configuration: 2 nodes, 16 GPUs
TP=4, PP=4
LR=5e-6 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_sft_config(hf_path: str = 'Qwen/Qwen3.5-35B-A3B')
Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE).
Default configuration: 2 nodes, 16 GPUs
TP=2, PP=1, EP=16
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
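The MoE SFT recipes layer expert parallelism on top of the common settings and default to multi-node layouts. A hedged sketch of the smallest MoE recipe:

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_35b_a3b_sft_config

# Default recipe: 2 nodes x 8 GPUs, TP=2, PP=1, EP=16, LR=2e-5,
# sequence length 4096. Expert and sequence parallelism, MoE kernels,
# and MoE overlap/balance defaults are applied by the recipe itself.
cfg = qwen35_vl_35b_a3b_sft_config()
```

The larger MoE recipes (`qwen35_vl_122b_a10b_sft_config`, `qwen35_vl_397b_a17b_sft_config`) work the same way but assume proportionally more nodes.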
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_sft_config(hf_path: str = 'Qwen/Qwen3.5-122B-A10B')
Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).
Default configuration: 4 nodes, 32 GPUs
TP=2, PP=6, EP=8
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_sft_config(hf_path: str = 'Qwen/Qwen3.5-397B-A17B')
Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE).
Default configuration: 16 nodes, 128 GPUs
TP=2, PP=4, EP=32
LR=2e-5 (full SFT)
Sequence length: 4096
- Parameters:
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-0.8B')
Return a PEFT config for Qwen3.5-VL 800M (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
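A sketch of how the PEFT recipes select a scheme (only config construction is shown; constructing a custom PEFT instance is deferred to that class's own documentation):

```python
from megatron.bridge.recipes.qwen_vl.qwen35_vl import qwen35_vl_800m_peft_config

# LoRA is the default PEFT scheme (1 node x 8 GPUs, TP=1, PP=1, LR=1e-4).
cfg = qwen35_vl_800m_peft_config()

# Select DoRA by name instead.
cfg = qwen35_vl_800m_peft_config(peft_scheme="dora")

# A custom megatron.bridge.peft.base.PEFT instance is also accepted in
# place of the string; it is resolved by _qwen35_vl_apply_peft_scheme.
```

All the other `*_peft_config` functions accept the same `peft_scheme` argument.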
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-2B')
Return a PEFT config for Qwen3.5-VL 2B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-4B')
Return a PEFT config for Qwen3.5-VL 4B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-9B')
Return a PEFT config for Qwen3.5-VL 9B (dense).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-27B')
Return a PEFT config for Qwen3.5-VL 27B (dense).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-35B-A3B')
Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1, EP=4
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-122B-A10B')
Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE).
Default configuration: 2 nodes, 16 GPUs
TP=2, PP=1, EP=8
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.
- bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_peft_config(peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora', hf_path: str = 'Qwen/Qwen3.5-397B-A17B')
Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE).
Default configuration: 4 nodes, 32 GPUs
TP=2, PP=1, EP=32
LR=2e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme - “lora”, “dora”, or a custom PEFT instance.
hf_path – HuggingFace model ID or local path to model directory.