bridge.recipes.qwen_vl.qwen3_vl#
Qwen3-VL recipes with parameterless API.
This module provides pretrain, SFT, and PEFT configurations for Qwen3-VL models (8B, 30B-A3B, 235B-A22B).
Module Contents#
Classes#
| Class | Description |
|---|---|
| `Qwen3VLCommonKwargs` | Typed options accepted by the Qwen3-VL pretrain recipe helper. |
Functions#
| Function | Description |
|---|---|
| `_qwen3_vl_common` | Create a pre-training configuration for Qwen3-VL models. |
| `qwen3_vl_8b_pretrain_mock_config` | Return a pre-training config for Qwen3-VL 8B Instruct. |
| `qwen3_vl_30b_a3b_pretrain_mock_config` | Return a pre-training config for Qwen3-VL 30B-A3B (MoE). |
| `qwen3_vl_235b_a22b_pretrain_mock_config` | Return a pre-training config for Qwen3-VL 235B-A22B (MoE). |
| `_make_energon_dataset` | Create an EnergonProvider dataset config for Qwen3-VL recipes. |
| `qwen3_vl_8b_sft_config` | Return a full SFT config for Qwen3-VL 8B (dense model). |
| `qwen3_vl_30b_a3b_sft_config` | Return a full SFT config for Qwen3-VL 30B-A3B (MoE model). |
| `qwen3_vl_235b_a22b_sft_config` | Return a full SFT config for Qwen3-VL 235B-A22B (MoE model). |
| `qwen3_vl_8b_peft_config` | Return a PEFT config for Qwen3-VL 8B (dense model). |
| `qwen3_vl_30b_a3b_peft_config` | Return a PEFT config for Qwen3-VL 30B-A3B (MoE model). |
| `qwen3_vl_235b_a22b_peft_config` | Return a PEFT config for Qwen3-VL 235B-A22B (MoE model). |
| `qwen3_vl_8b_peft_energon_config` | Return a PEFT (LoRA/DoRA) config for Qwen3-VL 8B with Energon dataset. |
API#
- class bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs#
Bases: `typing_extensions.TypedDict`

Typed options accepted by the Qwen3-VL pretrain recipe helper.
- hf_path: str#
- tensor_model_parallel_size: int#
- pipeline_model_parallel_size: int#
- expert_model_parallel_size: int#
- context_parallel_size: int#
- sequence_parallel: bool#
- seq_length: int#
- train_iters: int#
- global_batch_size: int#
- micro_batch_size: int#
- lr: float#
- min_lr: float#
- lr_warmup_iters: int#
- lr_decay_iters: Optional[int]#
- freeze_language_model: bool#
- freeze_vision_model: bool#
- freeze_vision_projection: bool#
- precision_config: Optional[Union[megatron.bridge.training.mixed_precision.MixedPrecisionConfig, str]]#
- comm_overlap_config: Optional[megatron.bridge.training.comm_overlap.CommOverlapConfig]#
- moe_flex_dispatcher_backend: Optional[str]#
- mock: bool#
- bridge.recipes.qwen_vl.qwen3_vl._qwen3_vl_common(
- hf_path: str = 'Qwen/Qwen3-VL-8B-Instruct',
- *,
- tensor_model_parallel_size: int = 4,
- pipeline_model_parallel_size: int = 1,
- expert_model_parallel_size: int = 1,
- context_parallel_size: int = 1,
- sequence_parallel: bool = False,
- seq_length: int = 4096,
- train_iters: int = 300000,
- global_batch_size: int = 32,
- micro_batch_size: int = 2,
- lr: float = 0.0003,
- min_lr: float = 3e-05,
- lr_warmup_iters: int = 500,
- lr_decay_iters: Optional[int] = None,
- freeze_language_model: bool = True,
- freeze_vision_model: bool = True,
- freeze_vision_projection: bool = False,
- precision_config: Optional[Union[megatron.bridge.training.mixed_precision.MixedPrecisionConfig, str]] = 'bf16_mixed',
- comm_overlap_config: Optional[megatron.bridge.training.comm_overlap.CommOverlapConfig] = None,
- moe_flex_dispatcher_backend: Optional[str] = None,
- mock: bool = True,
- )#
Create a pre-training configuration for Qwen3-VL models.
Uses MockVLMConversationProvider by default (mock=True).
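The helper's defaults are merged with any caller overrides. A minimal self-contained sketch of that merge pattern, using a plain dict, a subset of the fields, and a hypothetical `pretrain_config` stand-in rather than the real `_qwen3_vl_common` and its `ConfigContainer` return type:

```python
from typing import TypedDict

# Simplified analog of Qwen3VLCommonKwargs (a few of its fields only).
class CommonKwargs(TypedDict, total=False):
    tensor_model_parallel_size: int
    seq_length: int
    train_iters: int
    lr: float
    mock: bool

# Defaults mirror the signature above.
DEFAULTS: CommonKwargs = {
    "tensor_model_parallel_size": 4,
    "seq_length": 4096,
    "train_iters": 300000,
    "lr": 3e-4,
    "mock": True,
}

def pretrain_config(**user_kwargs) -> dict:
    """Merge user overrides over the recipe defaults (illustrative stand-in)."""
    cfg = dict(DEFAULTS)
    cfg.update(user_kwargs)
    return cfg

# Override only what differs from the recipe defaults.
cfg = pretrain_config(seq_length=8192, mock=False)
```

The public `*_pretrain_mock_config` wrappers below forward their typed keyword arguments (`Unpack[Qwen3VLCommonKwargs]`) to this helper in the same spirit.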
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_pretrain_mock_config(
- **user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
- )#
Return a pre-training config for Qwen3-VL 8B Instruct.
See `_qwen3_vl_common` for the full list of parameters.
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_pretrain_mock_config(
- **user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
- )#
Return a pre-training config for Qwen3-VL 30B-A3B (MoE).
See `_qwen3_vl_common` for the full list of parameters.
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_pretrain_mock_config(
- **user_kwargs: typing_extensions.Unpack[bridge.recipes.qwen_vl.qwen3_vl.Qwen3VLCommonKwargs],
- )#
Return a pre-training config for Qwen3-VL 235B-A22B (MoE).
See `_qwen3_vl_common` for the full list of parameters.
- bridge.recipes.qwen_vl.qwen3_vl._make_energon_dataset(
- hf_path: str,
- seq_length: int,
- micro_batch_size: int,
- global_batch_size: int,
- )#
Create an EnergonProvider dataset config for Qwen3-VL recipes.
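The batch-size arguments across these recipes follow the usual Megatron relation between global batch, micro batch, and data-parallel size (an assumption about these recipes, stated here for orientation, not taken from this module's code):

```python
def grad_accum_steps(global_batch_size: int, micro_batch_size: int,
                     data_parallel_size: int) -> int:
    """Gradient-accumulation steps implied by the batch settings, assuming
    the standard Megatron relation:
        global = micro * data_parallel * accumulation
    """
    per_step = micro_batch_size * data_parallel_size
    if global_batch_size % per_step != 0:
        raise ValueError("global_batch_size must be divisible by "
                         "micro_batch_size * data_parallel_size")
    return global_batch_size // per_step

# With the pretrain defaults (global=32, micro=2) on 8 data-parallel ranks:
steps = grad_accum_steps(32, 2, 8)
```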
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_sft_config() → megatron.bridge.training.config.ConfigContainer#
Return a full SFT config for Qwen3-VL 8B (dense model).
Default configuration: 1 node, 8 GPUs
TP=2, PP=1
LR=5e-6 (full SFT)
Sequence length: 4096
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_sft_config() → megatron.bridge.training.config.ConfigContainer#
Return a full SFT config for Qwen3-VL 30B-A3B (MoE model).
Default configuration: 4 nodes, 32 GPUs
TP=1, PP=1, EP=8
LR=5e-6 (full SFT)
Sequence length: 4096
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_sft_config() → megatron.bridge.training.config.ConfigContainer#
Return a full SFT config for Qwen3-VL 235B-A22B (MoE model).
Default configuration: 64 nodes, 512 GPUs
TP=4, PP=1, EP=32
LR=5e-6 (full SFT)
Sequence length: 4096
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_peft_config(
- peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
- )#
Return a PEFT config for Qwen3-VL 8B (dense model).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.
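The `peft_scheme` argument accepts either a scheme name or a ready-made PEFT object. One plausible way such an argument is resolved, sketched with stand-in classes (the real classes live in `megatron.bridge.peft` and are not reproduced here):

```python
class PEFT:
    """Stand-in for megatron.bridge.peft.base.PEFT (hypothetical)."""

class LoRA(PEFT):
    pass

class DoRA(PEFT):
    pass

_SCHEMES = {"lora": LoRA, "dora": DoRA}

def resolve_peft(peft_scheme="lora"):
    """Accept a scheme name ("lora"/"dora") or a PEFT instance."""
    # A ready-made PEFT instance passes straight through.
    if isinstance(peft_scheme, PEFT):
        return peft_scheme
    try:
        return _SCHEMES[peft_scheme.lower()]()
    except KeyError:
        raise ValueError(f"unknown PEFT scheme: {peft_scheme!r}") from None
```

Passing a custom `PEFT` instance instead of a string lets callers tune ranks or target modules beyond the named presets.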
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_30b_a3b_peft_config(
- peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
- )#
Return a PEFT config for Qwen3-VL 30B-A3B (MoE model).
Default configuration: 1 node, 8 GPUs
TP=1, PP=1, EP=4
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_235b_a22b_peft_config(
- peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
- )#
Return a PEFT config for Qwen3-VL 235B-A22B (MoE model).
Default configuration: 8 nodes, 64 GPUs
TP=1, PP=1, EP=16
LR=1e-4 (PEFT)
Sequence length: 4096
- Parameters:
peft_scheme – PEFT scheme: “lora”, “dora”, or a custom PEFT instance.
- bridge.recipes.qwen_vl.qwen3_vl.qwen3_vl_8b_peft_energon_config(
- peft_scheme: str | megatron.bridge.peft.base.PEFT = 'lora',
- )#
Return a PEFT (LoRA/DoRA) config for Qwen3-VL 8B with Energon dataset.
Same as `qwen3_vl_8b_peft_config` but uses EnergonProvider instead of an HF dataset. Set the dataset path via CLI override: `dataset.path=/path/to/energon/dataset`.
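The `dataset.path=...` form is a dotted-key override. A minimal illustration of how such an override maps onto a nested config (illustrative only; the real CLI parsing lives in Megatron Bridge, and the path shown is hypothetical):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    dotted, value = override.split("=", 1)
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        # Walk (or create) intermediate sections.
        node = node.setdefault(key, {})
    node[leaf] = value

cfg = {"dataset": {"path": None}}
apply_override(cfg, "dataset.path=/data/energon_ds")  # hypothetical path
```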