bridge.recipes.qwen_vl.qwen3_vl#

Qwen3-VL recipes with a parameterless API.

This module provides pretrain, SFT, and PEFT configurations for Qwen3-VL models (8B, 30B-A3B, 235B-A22B).

Module Contents#

Classes#

Qwen3VLCommonKwargs

Typed options accepted by Qwen3-VL pretrain recipe helper.

Functions#

_qwen3_vl_common

Create a pre-training configuration for Qwen3-VL models.

qwen3_vl_8b_pretrain_mock_config

Return a pre-training config for Qwen3-VL 8B Instruct.

qwen3_vl_30b_a3b_pretrain_mock_config

Return a pre-training config for Qwen3-VL 30B-A3B (MoE).

qwen3_vl_235b_a22b_pretrain_mock_config

Return a pre-training config for Qwen3-VL 235B-A22B (MoE).

_make_energon_dataset

Create an EnergonProvider dataset config for Qwen3-VL recipes.

qwen3_vl_8b_sft_config

Return a full SFT config for Qwen3-VL 8B (dense model).

qwen3_vl_30b_a3b_sft_config

Return a full SFT config for Qwen3-VL 30B-A3B (MoE model).

qwen3_vl_235b_a22b_sft_config

Return a full SFT config for Qwen3-VL 235B-A22B (MoE model).

qwen3_vl_8b_peft_config

Return a PEFT config for Qwen3-VL 8B (dense model).

qwen3_vl_30b_a3b_peft_config

Return a PEFT config for Qwen3-VL 30B-A3B (MoE model).

qwen3_vl_235b_a22b_peft_config

Return a PEFT config for Qwen3-VL 235B-A22B (MoE model).

qwen3_vl_8b_peft_energon_config

Return a PEFT (LoRA/DoRA) config for Qwen3-VL 8B with Energon dataset.

API#

class bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs#

Bases: typing_extensions.TypedDict

Typed options accepted by Qwen3-VL pretrain recipe helper.

Each key mirrors the corresponding keyword parameter of _qwen3_vl_common; see that function's signature below for the defaults.

hf_path: str#

tensor_model_parallel_size: int#

pipeline_model_parallel_size: int#

expert_model_parallel_size: int#

context_parallel_size: int#

sequence_parallel: bool#

seq_length: int#

train_iters: int#

global_batch_size: int#

micro_batch_size: int#

lr: float#

min_lr: float#

lr_warmup_iters: int#

lr_decay_iters: Optional[int]#

freeze_language_model: bool#

freeze_vision_model: bool#

freeze_vision_projection: bool#

precision_config: Optional[Union[megatron.bridge.training.mixed_precision.MixedPrecisionConfig, str]]#

comm_overlap_config: Optional[megatron.bridge.training.comm_overlap.CommOverlapConfig]#

moe_flex_dispatcher_backend: Optional[str]#

mock: bool#
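
As a sketch, these options can be collected in a dict and unpacked into any of the pretrain recipe helpers below; the partial-dict form assumes the TypedDict is declared with total=False, which the all-optional kwargs signatures suggest:

```python
from bridge.recipes.qwen_vl.qwen3_vl import (
    Qwen3VLCommonKwargs,
    qwen3_vl_8b_pretrain_mock_config,
)

# Keys mirror the _qwen3_vl_common parameters, so a static type
# checker can catch misspelled or mistyped options.
overrides: Qwen3VLCommonKwargs = {
    "seq_length": 8192,
    "micro_batch_size": 1,
    "freeze_vision_model": False,
}

cfg = qwen3_vl_8b_pretrain_mock_config(**overrides)
```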

bridge.recipes.qwen_vl.qwen3_vl._qwen3_vl_common(
hf_path: str = 'Qwen/Qwen3-VL-8B-Instruct',
*,
tensor_model_parallel_size: int = 4,
pipeline_model_parallel_size: int = 1,
expert_model_parallel_size: int = 1,
context_parallel_size: int = 1,
sequence_parallel: bool = False,
seq_length: int = 4096,
train_iters: int = 300000,
global_batch_size: int = 32,
micro_batch_size: int = 2,
lr: float = 0.0003,
min_lr: float = 3e-05,
lr_warmup_iters: int = 500,
lr_decay_iters: Optional[int] = None,
freeze_language_model: bool = True,
freeze_vision_model: bool = True,
freeze_vision_projection: bool = False,
precision_config: Optional[Union[megatron.bridge.training.mixed_precision.MixedPrecisionConfig, str]] = 'bf16_mixed',
comm_overlap_config: Optional[megatron.bridge.training.comm_overlap.CommOverlapConfig] = None,
moe_flex_dispatcher_backend: Optional[str] = None,
mock: bool = True,
) → megatron.bridge.training.config.ConfigContainer#

Create a pre-training configuration for Qwen3-VL models.

Uses MockVLMConversationProvider by default (mock=True).

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_pretrain_mock_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-VL 8B Instruct.

See _qwen3_vl_common for the full list of parameters.
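
For example, every option is optional, so a minimal call overrides only what differs from the _qwen3_vl_common defaults:

```python
from bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_pretrain_mock_config

# Unspecified options keep the defaults shown in _qwen3_vl_common
# (TP=4, seq_length=4096, mock data, bf16_mixed precision, ...).
cfg = qwen3_vl_8b_pretrain_mock_config(
    train_iters=10_000,
    global_batch_size=64,
    lr_warmup_iters=100,
)
```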

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_pretrain_mock_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-VL 30B-A3B (MoE).

See _qwen3_vl_common for the full list of parameters.

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_pretrain_mock_config(
**user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
) → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for Qwen3-VL 235B-A22B (MoE).

See _qwen3_vl_common for the full list of parameters.

bridge.recipes.qwen_vl.qwen3_vl._make_energon_dataset(
hf_path: str,
seq_length: int,
micro_batch_size: int,
global_batch_size: int,
) → megatron.bridge.data.energon.energon_provider.EnergonProvider#

Create an EnergonProvider dataset config for Qwen3-VL recipes.
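
The helper is private and is normally invoked by the recipe functions themselves, but a direct call mirroring the signature above would look like this (values are illustrative):

```python
from bridge.recipes.qwen_vl.qwen3_vl import _make_energon_dataset

# Build the Energon dataset config that the Energon-based recipes
# (e.g. qwen3_vl_8b_peft_energon_config) construct internally.
dataset = _make_energon_dataset(
    hf_path="Qwen/Qwen3-VL-8B-Instruct",
    seq_length=4096,
    micro_batch_size=2,
    global_batch_size=32,
)
```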

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-VL 8B (dense model).

Default configuration: 1 node, 8 GPUs

  • TP=2, PP=1

  • LR=5e-6 (full SFT)

  • Sequence length: 4096
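
The SFT recipes take no arguments; any deviation from the defaults above is applied by mutating the returned ConfigContainer. The attribute path below is an assumption for illustration, not a verified field name:

```python
from bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_sft_config

cfg = qwen3_vl_8b_sft_config()

# Hypothetical override; consult ConfigContainer for the actual
# sub-config layout before relying on this path.
cfg.train.train_iters = 2_000
```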

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-VL 30B-A3B (MoE model).

Default configuration: 4 nodes, 32 GPUs

  • TP=1, PP=1, EP=8

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_sft_config() → megatron.bridge.training.config.ConfigContainer#

Return a full SFT config for Qwen3-VL 235B-A22B (MoE model).

Default configuration: 64 nodes, 512 GPUs

  • TP=4, PP=1, EP=32

  • LR=5e-6 (full SFT)

  • Sequence length: 4096

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-VL 8B (dense model).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:

peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.
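
A sketch of the two built-in scheme names; the commented custom-instance variant assumes a LoRA class exists under megatron.bridge.peft, which this page does not confirm:

```python
from bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_peft_config

# Built-in scheme names.
lora_cfg = qwen3_vl_8b_peft_config(peft_scheme="lora")
dora_cfg = qwen3_vl_8b_peft_config(peft_scheme="dora")

# Any megatron.bridge.peft.base.PEFT instance is also accepted.
# The import path and constructor arguments below are assumptions:
# from megatron.bridge.peft.lora import LoRA
# custom_cfg = qwen3_vl_8b_peft_config(peft_scheme=LoRA(dim=16, alpha=32))
```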

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-VL 30B-A3B (MoE model).

Default configuration: 1 node, 8 GPUs

  • TP=1, PP=1, EP=4

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:

peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_peft_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT config for Qwen3-VL 235B-A22B (MoE model).

Default configuration: 8 nodes, 64 GPUs

  • TP=1, PP=1, EP=16

  • LR=1e-4 (PEFT)

  • Sequence length: 4096

Parameters:

peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.

bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_peft_energon_config(
peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
) → megatron.bridge.training.config.ConfigContainer#

Return a PEFT (LoRA/DoRA) config for Qwen3-VL 8B with Energon dataset.

Same as qwen3_vl_8b_peft_config, but uses EnergonProvider instead of the HF dataset. Set the dataset path via a CLI override: dataset.path=/path/to/energon/dataset
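
A programmatic sketch of the same override, assuming the dataset sub-config is reachable under the dotted name the CLI uses:

```python
from bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_peft_energon_config

cfg = qwen3_vl_8b_peft_energon_config(peft_scheme="lora")

# Equivalent of the CLI override dataset.path=/path/to/energon/dataset
cfg.dataset.path = "/path/to/energon/dataset"
```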