bridge.models.qwen_vl.qwen3_vl_provider#
Qwen3 VL MoE Model Provider configurations for Megatron-Core.
This module provides configuration classes for Qwen3-VL and Qwen3-VL-MoE (Mixture of Experts) multimodal models, compatible with HuggingFace’s Qwen3-VL model configurations. Reference: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
Module Contents#
Classes#
Qwen3VLModelProvider: Base model provider for Qwen 3 VL Models. Inherits language model configuration from Qwen3ModelProvider.
Qwen3VLMoEModelProvider: Base model provider for Qwen 3 VL MoE Models. Inherits language model MoE configuration from Qwen3MoEModelProvider.
API#
- class bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#
Bases: megatron.bridge.models.Qwen3ModelProvider

Base model provider for Qwen 3 VL Models. Inherits language model configuration from Qwen3ModelProvider.
Note: num_query_groups in parent class corresponds to num_key_value_heads in HF config. Default value of 8 is used for GQA (Grouped Query Attention).
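For orientation, a minimal construction sketch is shown below. The sizing fields are illustrative placeholders for values inherited from the Qwen3ModelProvider/TransformerConfig base classes, not the exact required argument set.

```python
# Hedged sketch: constructing the dense Qwen3-VL provider with an explicit GQA
# setting. num_layers, hidden_size, and num_attention_heads are illustrative
# values from the inherited TransformerConfig; the exact required constructor
# arguments may differ.
from bridge.models.qwen_vl.qwen3_vl_provider import Qwen3VLModelProvider

provider = Qwen3VLModelProvider(
    num_layers=36,            # illustrative
    hidden_size=2048,         # illustrative
    num_attention_heads=16,   # illustrative
    num_query_groups=8,       # == num_key_value_heads in the HF config (GQA)
)
```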
- head_dim: int#
128
- language_max_sequence_length: int#
2048
- patch_size: int#
14
- temporal_patch_size: int#
2
- in_channels: int#
3
- spatial_merge_size: int#
2
- num_position_embeddings: int#
2304
- apply_rotary_pos_emb_in_fp32: bool#
False
- deepstack_visual_indexes: List[int]#
‘field(…)’
- fp16_lm_cross_entropy: bool#
False
- rotary_percent: float#
1.0
- apply_rope_fusion: bool#
False
- vision_config: transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLVisionConfig#
‘field(…)’
- hf_text_config: Optional[transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLTextConfig]#
None
- image_token_id: int#
151655
- video_token_id: int#
151656
- vision_start_token_id: int#
151652
- vision_end_token_id: int#
151653
- bos_token_id: int#
151643
- eos_token_id: int#
151645
- position_embedding_type: str#
‘mrope’
- attention_dropout: float#
0.0
- attention_softmax_in_fp32: bool#
True
- mrope_section: List[int]#
‘field(…)’
- rotary_base: float#
5000000.0
- scatter_embedding_sequence_parallel: bool#
False
- freeze_language_model: bool#
False
- freeze_vision_model: bool#
False
- freeze_vision_projection: bool#
False
- sequence_parallel: bool#
False
- qk_layernorm: bool#
True
- provide(pre_process=None, post_process=None, vp_stage=None)#
Provide a Qwen3VL model instance with vision and language components.
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)#
Provide just the language model component without vision.
- Parameters:
pre_process – Whether this is the first stage in pipeline parallelism
post_process – Whether this is the last stage in pipeline parallelism
vp_stage – Virtual pipeline stage number
- Returns:
MCoreGPTModel instance (language model only)
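A hedged usage sketch for the two factory methods above; the single-stage pre/post flags and the `provider` instance from the earlier construction sketch are assumptions for illustration.

```python
# Full multimodal model (vision tower + language model).
full_model = provider.provide(pre_process=True, post_process=True)

# Language model only, e.g. when the vision components are handled elsewhere.
language_only = provider.provide_language_model(
    pre_process=True,   # first pipeline stage
    post_process=True,  # last pipeline stage
    vp_stage=None,      # no virtual pipeline parallelism
)
```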
- class bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#
Bases: megatron.bridge.models.Qwen3MoEModelProvider

Base model provider for Qwen 3 VL MoE Models. Inherits language model MoE configuration from Qwen3MoEModelProvider.
Key MoE Parameters (inherited from Qwen3MoEModelProvider):
- num_moe_experts: Number of total experts (default 128)
- moe_router_topk: Number of experts selected per token (default 8)
- moe_router_load_balancing_type: Load balancing strategy (default “aux_loss”)
- moe_aux_loss_coeff: Auxiliary loss coefficient (default 1e-3)
- moe_grouped_gemm: Use grouped GEMM for efficiency (default True)
Note: num_query_groups in parent class corresponds to num_key_value_heads in HF config.
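A hedged sketch of overriding the inherited MoE routing defaults listed above; the sizing fields are illustrative placeholders and do not reproduce the 30B-A3B checkpoint.

```python
from bridge.models.qwen_vl.qwen3_vl_provider import Qwen3VLMoEModelProvider

moe_provider = Qwen3VLMoEModelProvider(
    num_layers=48,                               # illustrative
    hidden_size=2048,                            # illustrative
    num_attention_heads=32,                      # illustrative
    num_moe_experts=128,                         # total experts
    moe_router_topk=8,                           # experts activated per token
    moe_router_load_balancing_type="aux_loss",   # load balancing strategy
    moe_aux_loss_coeff=1e-3,                     # auxiliary loss coefficient
)
```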
- vision_config: transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLVisionConfig#
‘field(…)’
- hf_text_config: Optional[transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig]#
None
- pretrained_model_name: str#
‘Qwen/Qwen3-VL-30B-A3B-Instruct’
- image_token_id: int#
151655
- video_token_id: int#
151656
- vision_start_token_id: int#
151652
- vision_end_token_id: int#
151653
- bos_token_id: int#
151643
- eos_token_id: int#
151645
- head_dim: int#
128
- qk_layernorm: bool#
True
- attention_softmax_in_fp32: bool#
True
- attention_dropout: float#
0.0
- position_embedding_type: str#
‘mrope’
- mrope_section: List[int]#
‘field(…)’
- rotary_base: float#
5000000.0
- spatial_merge_size: int#
2
- temporal_patch_size: int#
2
- patch_size: int#
16
- scatter_embedding_sequence_parallel: bool#
False
- moe_router_pre_softmax: bool#
False
- moe_router_dtype: str#
‘fp32’
- moe_router_score_function: str#
‘softmax’
- moe_router_bias_update_rate: float#
0.001
- moe_permute_fusion: bool#
True
- moe_token_dispatcher_type: str#
‘alltoall’
- mlp_only_layers: List[int]#
‘field(…)’
- decoder_sparse_step: int#
1
- freeze_language_model: bool#
True
- freeze_vision_model: bool#
True
- freeze_vision_projection: bool#
False
- language_max_sequence_length: int#
2048
- persist_layer_norm: bool#
True
- bias_activation_fusion: bool#
True
- bias_dropout_fusion: bool#
True
- masked_softmax_fusion: bool#
False
- deallocate_pipeline_outputs: bool#
True
- async_tensor_model_parallel_allreduce: bool#
True
- distribute_saved_activations: bool#
False
- cp_comm_type: str#
‘p2p’
- finalize() → None#
- provide(pre_process=None, post_process=None, vp_stage=None)#
Provide a Qwen3VL MoE model instance with vision and language components.
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)#
Provide just the language MoE model component without vision.
- Parameters:
pre_process – Whether this is the first stage in pipeline parallelism
post_process – Whether this is the last stage in pipeline parallelism
vp_stage – Virtual pipeline stage number
- Returns:
MCoreGPTModel instance (MoE language model only)
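A hedged usage sketch: this provider freezes the language and vision towers by default (freeze_language_model=True, freeze_vision_model=True) while keeping the vision projection trainable. The override and single-stage flags below are illustrative and reuse the `moe_provider` instance from the earlier sketch.

```python
# Optionally unfreeze the vision tower before building the full model.
moe_provider.freeze_vision_model = False
moe_vl_model = moe_provider.provide(pre_process=True, post_process=True)

# MoE language model only (returns an MCoreGPTModel instance).
moe_lm = moe_provider.provide_language_model(pre_process=True, post_process=True)
```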