bridge.models.qwen_vl.qwen3_vl_bridge#
Module Contents#
Classes#
Qwen3VLBridge: Megatron Bridge for Qwen3-VL Conditional Generation.
Qwen3VLMoEBridge: Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.
ExpertMLPDownProjMapping: Mapping for expert MLP down projection weights between HF and Megatron formats.
ExpertMLPGateUpProjMapping: Mapping for expert MLP gate and up projection weights between HF and Megatron formats.
API#
- class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.
This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.
The weight mappings are based on the yan-mbridge implementation, which defines:
- Vision model direct mappings
- Vision attention layer mappings
- Vision MLP layer mappings
- Language model mappings
- Deepstack visual merger mappings
.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM)#
Create a Qwen3VLModelProvider from a HuggingFace pretrained model.
- Parameters:
hf_pretrained – HuggingFace pretrained VLM model
- Returns:
Qwen3VLModelProvider configured with the HF model’s parameters
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry containing parameter mappings from Megatron to HF format.
The mappings are organized into:
- Simple 1:1 mappings for embeddings, layer norms, and output layers
- Vision model mappings (replicated without modification)
- QKV mappings that combine separate Q, K, V matrices
- Gated MLP mappings that combine gate and up projections
- Deepstack visual merger mappings
- Returns:
MegatronMappingRegistry with all parameter mappings
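As a rough sketch of the kind of name translation such a registry performs, the snippet below maps a Megatron-side parameter name to its HF-side counterpart via regex patterns. The two patterns shown are illustrative assumptions, not the actual tables defined by Qwen3VLBridge.

```python
# Illustrative sketch of Megatron-to-HF parameter-name mapping, in the spirit
# of a MegatronMappingRegistry. The patterns are hypothetical examples.
import re
from typing import Optional

# Each entry pairs a Megatron-side name pattern with an HF-side template;
# "(\d+)" captures the layer index so it carries across the translation.
NAME_PATTERNS = [
    (r"language_model\.embedding\.word_embeddings\.weight",
     "model.embed_tokens.weight"),
    (r"language_model\.decoder\.layers\.(\d+)\.mlp\.linear_fc2\.weight",
     r"model.layers.\1.mlp.down_proj.weight"),
]

def megatron_to_hf_name(megatron_name: str) -> Optional[str]:
    """Return the HF parameter name for a Megatron name, or None if unmapped."""
    for pattern, template in NAME_PATTERNS:
        if re.fullmatch(pattern, megatron_name):
            return re.sub(pattern, template, megatron_name)
    return None

print(megatron_to_hf_name(
    "language_model.decoder.layers.3.mlp.linear_fc2.weight"))
# → model.layers.3.mlp.down_proj.weight
```

The real registry additionally attaches transform logic (e.g. QKV fusion) to each pattern; this sketch covers only the naming half.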
- class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.
This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.
The weight mappings handle:
- Vision model weights (same as dense model)
- Language model MoE layers with expert routing
- Shared embeddings and output layers
- QK layernorm specific to Qwen3 architecture
This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.
.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM)#
Create a Qwen3VLMoEModelProvider from a HuggingFace pretrained MoE model.
- Parameters:
hf_pretrained – HuggingFace pretrained VLM MoE model
- Returns:
Qwen3VLMoEModelProvider configured with the HF MoE model’s parameters
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry containing parameter mappings for MoE models.
The MoE mappings include:
- Standard language model mappings (embeddings, layer norms, output)
- Vision model mappings (same as dense model)
- QKV mappings with QK layernorm
- MoE-specific mappings:
  - Router weights for expert selection
  - Expert MLPs (multiple experts per layer)
  - Pre-MLP layernorm
- Deepstack visual merger mappings
- Returns:
MegatronMappingRegistry with all MoE parameter mappings
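The MoE-specific part of the mapping is one-to-many: one grouped Megatron expert tensor corresponds to a list of per-expert HF weights. The sketch below enumerates those HF names; the name template is an illustrative assumption, not the actual pattern used by Qwen3VLMoEBridge.

```python
# Hedged sketch: list the per-expert HF parameter names that feed one grouped
# Megatron expert weight. The HF name template is a hypothetical example.
from typing import List

def expert_hf_names(layer: int, num_experts: int,
                    proj: str = "down_proj") -> List[str]:
    """Enumerate HF expert-weight names for a given decoder layer."""
    return [
        f"model.layers.{layer}.mlp.experts.{e}.{proj}.weight"
        for e in range(num_experts)
    ]

print(expert_hf_names(0, 2))
# → ['model.layers.0.mlp.experts.0.down_proj.weight',
#    'model.layers.0.mlp.experts.1.down_proj.weight']
```

The expert count itself comes from the HF config, which is why the bridge can work with any Qwen3VL MoE model size.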
- class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping#
Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.
- hf_to_megatron(hf_weights: torch.Tensor, megatron_module: torch.nn.Module)#
- megatron_to_hf(megatron_weights: torch.Tensor, megatron_module: torch.nn.Module)#
- _validate_patterns(*args, **kwargs)#
- class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping#
Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate and up projection weights between HF and Megatron formats.
- hf_to_megatron(hf_weights: Union[torch.Tensor, Dict], megatron_module: torch.nn.Module)#
- megatron_to_hf(megatron_weights: torch.Tensor, megatron_module: torch.nn.Module)#
- _validate_patterns(*args, **kwargs)#
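To make the gate/up direction concrete: HF stores an expert's gate_proj and up_proj weights separately, while Megatron fuses them into a single fc1 matrix, so hf_to_megatron must stack the two and megatron_to_hf must split them back. The sketch below shows that round trip with plain Python lists standing in for torch.Tensor rows; stacking gate above up along the output dimension is an assumed layout, not necessarily the bridge's exact one.

```python
# Hedged sketch of a gated-MLP weight fusion and its inverse.
# Lists of row labels stand in for weight-matrix rows.
from typing import List, Tuple

def fuse_gate_up(gate_rows: List[str], up_rows: List[str]) -> List[str]:
    """hf_to_megatron direction: stack gate above up on the output dim."""
    assert len(gate_rows) == len(up_rows)
    return gate_rows + up_rows

def split_gate_up(fc1_rows: List[str]) -> Tuple[List[str], List[str]]:
    """megatron_to_hf direction: split the fused fc1 back in half."""
    half = len(fc1_rows) // 2
    return fc1_rows[:half], fc1_rows[half:]

fc1 = fuse_gate_up(["g0", "g1"], ["u0", "u1"])
print(fc1)                 # → ['g0', 'g1', 'u0', 'u1']
print(split_gate_up(fc1))  # → (['g0', 'g1'], ['u0', 'u1'])
```

The two methods being exact inverses is what makes HF → Megatron → HF conversion lossless for these weights.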