bridge.models.qwen_vl.qwen3_vl_bridge#

Module Contents#

Classes#

Qwen3VLBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

Qwen3VLMoEBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

ExpertMLPDownProjMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

ExpertMLPGateUpProjMapping

Mapping for expert MLP gate and up projection weights between HF and Megatron formats.

API#

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.

The weight mappings are based on the yan-mbridge implementation, which defines:

  • Vision model direct mappings

  • Vision attention layer mappings

  • Vision MLP layer mappings

  • Language model mappings

  • Deepstack visual merger mappings

Example

from megatron.bridge import AutoBridge
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#

Create a Qwen3VLModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen3VLModelProvider configured with the HF model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings from Megatron to HF format.

The mappings are organized into:

  1. Simple 1:1 mappings for embeddings, layer norms, and output layers

  2. Vision model mappings (replicated without modification)

  3. QKV mappings that combine separate Q, K, V matrices

  4. Gated MLP mappings that combine gate and up projections

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all parameter mappings
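The QKV fusion in item 3 can be sketched as follows. This is an illustrative sketch only, not the bridge's actual implementation: it interleaves per-head Q, K, and V slices into the single fused QKV matrix layout Megatron expects, using NumPy arrays as stand-ins for torch tensors. The head counts and dimensions are made up for the example, and real Qwen3 models use grouped-query attention, which groups K/V differently.

```python
import numpy as np

# Toy dimensions (illustrative only; not taken from any real Qwen3-VL config).
num_heads, head_dim, hidden = 4, 8, 32

rng = np.random.default_rng(0)
q = rng.random((num_heads * head_dim, hidden))  # HF ...q_proj.weight
k = rng.random((num_heads * head_dim, hidden))  # HF ...k_proj.weight
v = rng.random((num_heads * head_dim, hidden))  # HF ...v_proj.weight

# Megatron stores QKV interleaved per attention head:
# [q_head0, k_head0, v_head0, q_head1, k_head1, v_head1, ...]
qkv = np.concatenate(
    [
        np.concatenate(
            [
                q[i * head_dim:(i + 1) * head_dim],
                k[i * head_dim:(i + 1) * head_dim],
                v[i * head_dim:(i + 1) * head_dim],
            ]
        )
        for i in range(num_heads)
    ]
)
print(qkv.shape)  # (96, 32): 3 * num_heads * head_dim rows
```

The gated MLP fusion in item 4 follows the same pattern, stacking the gate and up projection weights into one matrix.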

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.

The weight mappings handle:

  • Vision model weights (same as dense model)

  • Language model MoE layers with expert routing

  • Shared embeddings and output layers

  • QK layernorm specific to Qwen3 architecture

This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.

Example

from megatron.bridge import AutoBridge
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#

Create a Qwen3VLMoEModelProvider from a HuggingFace pretrained MoE model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM MoE model

Returns:

Qwen3VLMoEModelProvider configured with the HF MoE model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for MoE models.

The MoE mappings include:

  1. Standard language model mappings (embeddings, layer norms, output)

  2. Vision model mappings (same as dense model)

  3. QKV mappings with QK layernorm

  4. MoE-specific mappings:

    • Router weights for expert selection

    • Expert MLPs (multiple experts per layer)

    • Pre-MLP layernorm

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all MoE parameter mappings
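The kind of name translation a mapping registry performs for per-expert weights can be sketched as follows. The parameter names and the regex here are hypothetical, made up for illustration; the real registry uses its own patterns and also transforms the weights themselves, not just the names.

```python
import re

# Hypothetical HF-side pattern for a per-expert down projection weight.
# The actual parameter names in Qwen3-VL MoE checkpoints may differ.
hf_pattern = re.compile(
    r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.down_proj\.weight"
)

def to_megatron_name(hf_name: str) -> str:
    """Translate a hypothetical HF expert weight name to a Megatron-style name."""
    m = hf_pattern.fullmatch(hf_name)
    if m is None:
        raise ValueError(f"unrecognized parameter name: {hf_name}")
    layer, expert = m.groups()
    # Megatron-style grouped-expert naming (illustrative, not authoritative).
    return f"decoder.layers.{layer}.mlp.experts.linear_fc2.weight{expert}"

print(to_megatron_name("model.layers.0.mlp.experts.7.down_proj.weight"))
# decoder.layers.0.mlp.experts.linear_fc2.weight7
```

Because HF stores one module per expert while Megatron groups experts per layer, the registry needs one such mapping rule per expert weight family (down projection, gate/up projection, router).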

class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate and up projection weights between HF and Megatron formats.

hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
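The gate/up fusion that a mapping like ExpertMLPGateUpProjMapping performs can be sketched as follows. This is an illustrative sketch with made-up dimensions, using NumPy arrays as stand-ins for torch tensors, not the class's actual code: HF stores gate_proj and up_proj separately per expert, while Megatron's gated MLP expects them stacked into a single fc1 weight, and the reverse direction splits that fused weight back into the HF pair.

```python
import numpy as np

ffn, hidden = 16, 8  # toy sizes, not from any real config

rng = np.random.default_rng(0)
gate = rng.random((ffn, hidden))  # HF ...gate_proj.weight
up = rng.random((ffn, hidden))    # HF ...up_proj.weight

# hf_to_megatron direction: stack gate and up into one fused fc1 weight.
fc1 = np.concatenate([gate, up], axis=0)
print(fc1.shape)  # (32, 8)

# megatron_to_hf direction: split the fused weight back into the HF pair.
gate_back, up_back = np.split(fc1, 2, axis=0)
assert np.array_equal(gate_back, gate) and np.array_equal(up_back, up)
```

The down projection mapping has no such fusion step; it mainly reconciles the per-expert module layout between the two formats.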