bridge.recipes.qwen_vl.qwen35_vl#

Qwen3.5-VL finetuning recipes.

This module provides SFT and PEFT configurations for Qwen3.5-VL models:

  • Dense: 800M, 2B, 4B, 9B, 27B

  • MoE: 35B-A3B, 122B-A10B, 397B-A17B

Module Contents#

Functions#

_qwen35_vl_apply_common

Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes.

_qwen35_vl_apply_moe

Apply MoE-specific settings on top of the common configuration.

_qwen35_vl_enable_recompute

Enable activation recomputation for large models.

_qwen35_vl_apply_peft_scheme

Resolve and apply the PEFT scheme (LoRA, DoRA, or custom).

qwen35_vl_800m_sft_config

Return a full SFT config for Qwen3.5-VL 800M (dense).

qwen35_vl_2b_sft_config

Return a full SFT config for Qwen3.5-VL 2B (dense).

qwen35_vl_4b_sft_config

Return a full SFT config for Qwen3.5-VL 4B (dense).

qwen35_vl_9b_sft_config

Return a full SFT config for Qwen3.5-VL 9B (dense).

qwen35_vl_27b_sft_config

Return a full SFT config for Qwen3.5-VL 27B (dense).

qwen35_vl_35b_a3b_sft_config

Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE).

qwen35_vl_122b_a10b_sft_config

Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).

qwen35_vl_397b_a17b_sft_config

Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE).

qwen35_vl_800m_peft_config

Return a PEFT config for Qwen3.5-VL 800M (dense).

qwen35_vl_2b_peft_config

Return a PEFT config for Qwen3.5-VL 2B (dense).

qwen35_vl_4b_peft_config

Return a PEFT config for Qwen3.5-VL 4B (dense).

qwen35_vl_9b_peft_config

Return a PEFT config for Qwen3.5-VL 9B (dense).

qwen35_vl_27b_peft_config

Return a PEFT config for Qwen3.5-VL 27B (dense).

qwen35_vl_35b_a3b_peft_config

Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE).

qwen35_vl_122b_a10b_peft_config

Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE).

qwen35_vl_397b_a17b_peft_config

Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE).

API#

bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_common(
cfg: megatron.bridge.training.config.ConfigContainer,
hf_path: str,
*,
tp: int,
pp: int,
max_lr: float,
min_lr: float,
gbs: int = 32,
) → None#

Apply settings shared across all Qwen3.5-VL SFT and PEFT recipes.

Sets model, parallelism (except EP/SP for MoE), VLM freeze, MTP, TE, CUDA graphs, kernels, memory-saving defaults, training, optimizer, dataset, DDP, and mixed-precision options.
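The helper mutates the passed config in place and returns `None`. A minimal sketch of that pattern, using a stand-in dataclass (all class and field names below are hypothetical, not the actual `ConfigContainer` fields):

```python
# Illustrative sketch only: a toy stand-in for ConfigContainer showing
# the in-place "apply common settings" pattern with keyword-only args,
# mirroring the _qwen35_vl_apply_common signature above.
from dataclasses import dataclass


@dataclass
class ToyConfig:
    hf_path: str = ""
    tensor_parallel: int = 1
    pipeline_parallel: int = 1
    max_lr: float = 0.0
    min_lr: float = 0.0
    global_batch_size: int = 0


def apply_common(cfg: ToyConfig, hf_path: str, *, tp: int, pp: int,
                 max_lr: float, min_lr: float, gbs: int = 32) -> None:
    """Mutate cfg in place; returns None, like the real helper."""
    cfg.hf_path = hf_path
    cfg.tensor_parallel = tp
    cfg.pipeline_parallel = pp
    cfg.max_lr = max_lr
    cfg.min_lr = min_lr
    cfg.global_batch_size = gbs


cfg = ToyConfig()
apply_common(cfg, "Qwen/Qwen3.5-9B", tp=4, pp=1, max_lr=5e-6, min_lr=5e-7)
```

Because the recipe builders call a shared applier like this first, per-model functions only need to pass their size-specific TP/PP/LR values.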

bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_moe(
cfg: megatron.bridge.training.config.ConfigContainer,
*,
ep: int,
etp: int = 1,
) → None#

Apply MoE-specific settings on top of the common configuration.

Enables expert parallelism, sequence parallelism, MoE kernels, and sets MoE-specific overlap / balance / FP8-padding defaults.
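MoE settings are layered on top of the common config by a second in-place helper. A sketch of that layering, again with hypothetical field names matching the `ep`/`etp` keyword-only signature above:

```python
# Sketch: MoE-specific overrides applied after the common settings.
# Field names are illustrative, not the real bridge config fields.
from dataclasses import dataclass


@dataclass
class ToyMoEConfig:
    expert_parallel: int = 1
    expert_tensor_parallel: int = 1
    sequence_parallel: bool = False


def apply_moe(cfg: ToyMoEConfig, *, ep: int, etp: int = 1) -> None:
    cfg.expert_parallel = ep
    cfg.expert_tensor_parallel = etp
    cfg.sequence_parallel = True  # per the docstring, MoE recipes enable SP


cfg = ToyMoEConfig()
apply_moe(cfg, ep=16)  # e.g. the 35B-A3B SFT recipe uses EP=16
```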

bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_enable_recompute(
cfg: megatron.bridge.training.config.ConfigContainer,
) → None#

Enable activation recomputation for large models.
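Activation recomputation trades extra compute for lower activation memory, which is why it is reserved for the large models. A toy sketch of such a toggle (the field names are assumptions, not the real config attributes):

```python
# Sketch: flipping recomputation flags in place for large models.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRecomputeConfig:
    recompute_granularity: Optional[str] = None
    recompute_method: Optional[str] = None


def enable_recompute(cfg: ToyRecomputeConfig) -> None:
    # Recompute all activations rather than caching them in the
    # forward pass; they are rebuilt during the backward pass.
    cfg.recompute_granularity = "full"
    cfg.recompute_method = "uniform"


cfg = ToyRecomputeConfig()
enable_recompute(cfg)
```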

bridge.recipes.qwen_vl.qwen35_vl._qwen35_vl_apply_peft_scheme(
cfg: megatron.bridge.training.config.ConfigContainer,
peft_scheme: str | megatron.bridge.peft.base.PEFT,
) → None#

Resolve and apply the PEFT scheme (LoRA, DoRA, or custom).
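The `str | PEFT` parameter type suggests a simple dispatch: pass a custom `PEFT` instance through unchanged, or map a known scheme name to a built-in adapter. A sketch of that resolution logic, with stand-in classes (not the real `megatron.bridge.peft` classes):

```python
# Sketch: resolving a PEFT scheme from either a string name or a
# user-supplied instance. PEFT/LoRA/DoRA here are illustrative stubs.
class PEFT: ...
class LoRA(PEFT): ...
class DoRA(PEFT): ...


def resolve_peft(scheme) -> PEFT:
    if isinstance(scheme, PEFT):
        return scheme  # custom instance: use as-is
    name = scheme.lower()
    if name == "lora":
        return LoRA()
    if name == "dora":
        return DoRA()
    raise ValueError(f"Unknown PEFT scheme: {scheme!r}")


adapter = resolve_peft("lora")
```

A custom instance bypasses the string lookup entirely, which is what lets the `*_peft_config` functions accept either form.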

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_sft_config(
hf_path: str = 'Qwen/Qwen3.5-0.8B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 800M (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

Note: num_kv_heads=2, so max TP=2.

Parameters:

hf_path – HuggingFace model ID or local path to model directory.
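The "max TP" notes above follow from grouped-query attention: each KV head must reside entirely on one tensor-parallel rank, so TP cannot exceed (and should divide) `num_kv_heads`. A small illustrative check (the helper itself is hypothetical):

```python
# Sketch: why num_kv_heads=2 caps TP at 2 for the 800M/2B recipes.
def validate_tp(tp: int, num_kv_heads: int) -> bool:
    """True if the KV heads can be split evenly across TP ranks."""
    return tp <= num_kv_heads and num_kv_heads % tp == 0


assert validate_tp(1, 2) and validate_tp(2, 2)
assert not validate_tp(4, 2)  # TP=4 would need more KV heads than exist
```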

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-2B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 2B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

Note: num_kv_heads=2, so max TP=2.

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-4B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 4B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=2, PP=1

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

Note: num_kv_heads=4, so max TP=4.

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-9B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 9B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=4, PP=1

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

Note: num_kv_heads=4, so max TP=4.

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-27B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 27B (dense).

Default configuration: 2 nodes, 16 GPUs

  • TP=4, PP=4

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-35B-A3B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 35B-A3B (MoE).

Default configuration: 2 nodes, 16 GPUs

  • TP=2, PP=1, EP=16

  • LR=2e-5 (full SFT)

  • Sequence length: 4096

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-122B-A10B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 122B-A10B (MoE).

Default configuration: 4 nodes, 32 GPUs

  • TP=2, PP=6, EP=8

  • LR=2e-5 (full SFT)

  • Sequence length: 4096

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_sft_config(
hf_path: str = 'Qwen/Qwen3.5-397B-A17B',
) → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3.5-VL 397B-A17B (MoE).

Default configuration: 16 nodes, 128 GPUs

  • TP=2, PP=4, EP=32

  • LR=2e-5 (full SFT)

  • Sequence length: 4096

Parameters:

hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_800m_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-0.8B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 800M (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_2b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-2B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 2B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_4b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-4B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 4B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_9b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-9B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 9B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_27b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-27B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 27B (dense).

Default configuration: 1 node, 8 GPUs

  • TP=2, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_35b_a3b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-35B-A3B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 35B-A3B (MoE).

Default configuration: 1 node, 8 GPUs

  • TP=2, PP=1, EP=4

  • LR=2e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_122b_a10b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-122B-A10B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 122B-A10B (MoE).

Default configuration: 2 nodes, 16 GPUs

  • TP=2, PP=1, EP=8

  • LR=2e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.

bridge.recipes.qwen_vl.qwen35_vl.qwen35_vl_397b_a17b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
hf_path: str = 'Qwen/Qwen3.5-397B-A17B',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3.5-VL 397B-A17B (MoE).

Default configuration: 4 nodes, 32 GPUs

  • TP=2, PP=1, EP=32

  • LR=2e-4 (PEFT)

  • Sequence length: 4096

Parameters:
  • peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

  • hf_path – HuggingFace model ID or local path to model directory.