bridge.models.qwen_vl.qwen3_vl_bridge#

Module Contents#

Classes#

Qwen3VLBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

Qwen3VLMoEBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

ExpertMLPDownProjMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

ExpertMLPGateUpProjMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

API#

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.

The weight mappings are based on the yan-mbridge implementation, which defines:

  • Vision model direct mappings

  • Vision attention layer mappings

  • Vision MLP layer mappings

  • Language model mappings

  • Deepstack visual merger mappings

Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#

Create a Qwen3VLModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen3VLModelProvider configured with the HF model’s parameters
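
For use outside AutoBridge, the provider can also be built from a wrapped checkpoint directly. A minimal sketch, assuming PreTrainedVLM exposes a from_pretrained constructor (treat that constructor as an assumption):

from megatron.bridge.models.hf_pretrained.vlm import PreTrainedVLM
from megatron.bridge.models.qwen_vl.qwen3_vl_bridge import Qwen3VLBridge

# Wrap the HF checkpoint, then derive a Megatron provider from its config.
hf_model = PreTrainedVLM.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")  # assumed constructor
provider = Qwen3VLBridge().provider_bridge(hf_model)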

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings from Megatron to HF format.

The mappings are organized into:

  1. Simple 1:1 mappings for embeddings, layer norms, and output layers

  2. Vision model mappings (replicated without modification)

  3. QKV mappings that combine separate Q, K, V matrices (see the sketch below)

  4. Gated MLP mappings that combine gate and up projections

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all parameter mappings
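
As an illustration of item 3, a QKV mapping conceptually fuses the separate HF q_proj, k_proj, and v_proj matrices into Megatron's single linear_qkv weight. A minimal sketch in plain torch, using toy shapes and ignoring the grouped-query heads and tensor parallelism that the real mapping must handle:

import torch

hidden, num_heads, head_dim = 8, 2, 4
q = torch.randn(num_heads * head_dim, hidden)
k = torch.randn(num_heads * head_dim, hidden)
v = torch.randn(num_heads * head_dim, hidden)

# Megatron interleaves [q, k, v] per head group rather than
# concatenating the three full matrices back to back.
qkv = torch.cat(
    [torch.cat(block, dim=0)
     for block in zip(q.chunk(num_heads), k.chunk(num_heads), v.chunk(num_heads))],
    dim=0,
)
assert qkv.shape == (3 * num_heads * head_dim, hidden)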

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.

The weight mappings handle:

  • Vision model weights (same as dense model)

  • Language model MoE layers with expert routing

  • Shared embeddings and output layers

  • QK layernorm specific to Qwen3 architecture

This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.

Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()


provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#

Create a Qwen3VLMoEModelProvider from a HuggingFace pretrained MoE model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM MoE model

Returns:

Qwen3VLMoEModelProvider configured with the HF MoE model’s parameters
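
The MoE configuration is read from the HF model's config. A hedged sketch of the kind of fields involved, using transformers directly; the attribute names (num_experts, num_experts_per_tok, moe_intermediate_size) and the text_config nesting are assumptions based on the HF Qwen3 MoE config:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
text_cfg = cfg.text_config  # VLM configs nest the language-model config
print(text_cfg.num_experts, text_cfg.num_experts_per_tok, text_cfg.moe_intermediate_size)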

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for MoE models.

The MoE mappings include:

  1. Standard language model mappings (embeddings, layer norms, output)

  2. Vision model mappings (same as dense model)

  3. QKV mappings with QK layernorm

  4. MoE-specific mappings (see the sketch below):

    • Router weights for expert selection

    • Expert MLPs (multiple experts per layer)

    • Pre-MLP layernorm

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all MoE parameter mappings
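
A hedged sketch of how a router entry from item 4 might be registered; the name patterns below are illustrative, not copied from the source:

from megatron.bridge.models.conversion.mapping_registry import MegatronMappingRegistry
from megatron.bridge.models.conversion.param_mapping import AutoMapping

# Illustrative only: the real patterns live inside mapping_registry().
registry = MegatronMappingRegistry(
    AutoMapping(
        megatron_param="decoder.layers.*.mlp.router.weight",
        hf_param="model.language_model.layers.*.mlp.gate.weight",
    ),
)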

maybe_modify_converted_hf_weight(
task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) → Dict[str, torch.Tensor]#
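
From the signature, this hook appears to post-process the weights converted for a single WeightConversionTask before they are merged into the HF state dict; for the MoE bridge, this is presumably where per-expert tensors are adjusted against the existing hf_state_dict.
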
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
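
A hedged sketch of the reshaping such a mapping performs: HF checkpoints store one down_proj per expert, while the Megatron side expects the per-expert weights gathered together. The stacking convention below is an assumption for illustration; the real mapping also resolves expert indices from the parameter names:

import torch

hidden, ffn, num_experts = 8, 16, 4
# HF side: one [hidden, ffn] down_proj weight per expert.
hf_experts = [torch.randn(hidden, ffn) for _ in range(num_experts)]

# Illustrative Megatron side: experts stacked along a leading dim.
megatron_weight = torch.stack(hf_experts, dim=0)
assert megatron_weight.shape == (num_experts, hidden, ffn)
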
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.


hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
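
The gate+up fusion follows the same convention as Megatron's dense gated MLP: gate and up projections are concatenated into a single linear_fc1 weight, and split apart again on export. A minimal sketch in plain torch (the real code delegates to the shared GatedMLPMapping logic and handles tensor parallelism):

import torch

hidden, ffn = 8, 16
gate = torch.randn(ffn, hidden)
up = torch.randn(ffn, hidden)

# hf_to_megatron direction: fuse along the output dimension.
fc1 = torch.cat([gate, up], dim=0)  # [2 * ffn, hidden]

# megatron_to_hf direction: split back into the two HF tensors.
gate_back, up_back = torch.chunk(fc1, 2, dim=0)
assert torch.equal(gate_back, gate) and torch.equal(up_back, up)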