bridge.models.qwen_vl.qwen3_vl_bridge#

Module Contents#

Classes#

Qwen3VLBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

Qwen3VLMoEBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

ExpertMLPDownProjMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

ExpertMLPGateUpProjMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

API#

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.

The weight mappings are based on the yan-mbridge implementation, which defines:

  • Vision model direct mappings

  • Vision attention layer mappings

  • Vision MLP layer mappings

  • Language model mappings

  • Deepstack visual merger mappings

Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#

Create a Qwen3VLModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen3VLModelProvider configured with the HF model’s parameters
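
For use outside AutoBridge, the provider can also be built from a wrapped checkpoint directly. A minimal sketch, assuming PreTrainedVLM exposes a from_pretrained constructor (treat that constructor as an assumption):

from megatron.bridge.models.hf_pretrained.vlm import PreTrainedVLM
from megatron.bridge.models.qwen_vl.qwen3_vl_bridge import Qwen3VLBridge

# Wrap the HF checkpoint, then derive a Megatron provider from its config.
hf_model = PreTrainedVLM.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")  # assumed constructor
provider = Qwen3VLBridge().provider_bridge(hf_model)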

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings from Megatron to HF format.

The mappings are organized into:

  1. Simple 1:1 mappings for embeddings, layer norms, and output layers

  2. Vision model mappings (replicated without modification)

  3. QKV mappings that combine separate Q, K, V matrices (see the sketch below)

  4. Gated MLP mappings that combine gate and up projections

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all parameter mappings
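
As an illustration of item 3, a QKV mapping conceptually fuses the separate HF q_proj, k_proj, and v_proj matrices into Megatron's single linear_qkv weight. A minimal sketch in plain torch, using toy shapes and ignoring the grouped-query heads and tensor parallelism that the real mapping must handle:

import torch

hidden, num_heads, head_dim = 8, 2, 4
q = torch.randn(num_heads * head_dim, hidden)
k = torch.randn(num_heads * head_dim, hidden)
v = torch.randn(num_heads * head_dim, hidden)

# Megatron interleaves [q, k, v] per head group rather than
# concatenating the three full matrices back to back.
qkv = torch.cat(
    [torch.cat(block, dim=0)
     for block in zip(q.chunk(num_heads), k.chunk(num_heads), v.chunk(num_heads))],
    dim=0,
)
assert qkv.shape == (3 * num_heads * head_dim, hidden)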

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.

The weight mappings handle:

  • Vision model weights (same as dense model)

  • Language model MoE layers with expert routing

  • Shared embeddings and output layers

  • QK layernorm specific to Qwen3 architecture

This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.

Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()


provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#

Create a Qwen3VLMoEModelProvider from a HuggingFace pretrained MoE model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM MoE model

Returns:

Qwen3VLMoEModelProvider configured with the HF MoE model’s parameters
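
The MoE configuration is read from the HF model's config. A hedged sketch of the kind of fields involved, using transformers directly; the attribute names (num_experts, num_experts_per_tok, moe_intermediate_size) and the text_config nesting are assumptions based on the HF Qwen3 MoE config:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
text_cfg = cfg.text_config  # VLM configs nest the language-model config
print(text_cfg.num_experts, text_cfg.num_experts_per_tok, text_cfg.moe_intermediate_size)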

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for MoE models.

The MoE mappings include:

  1. Standard language model mappings (embeddings, layer norms, output)

  2. Vision model mappings (same as dense model)

  3. QKV mappings with QK layernorm

  4. MoE-specific mappings (see the sketch below):

    • Router weights for expert selection

    • Expert MLPs (multiple experts per layer)

    • Pre-MLP layernorm

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all MoE parameter mappings
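
A hedged sketch of how a router entry from item 4 might be registered; the name patterns below are illustrative, not copied from the source:

from megatron.bridge.models.conversion.mapping_registry import MegatronMappingRegistry
from megatron.bridge.models.conversion.param_mapping import AutoMapping

# Illustrative only: the real patterns live inside mapping_registry().
registry = MegatronMappingRegistry(
    AutoMapping(
        megatron_param="decoder.layers.*.mlp.router.weight",
        hf_param="model.language_model.layers.*.mlp.gate.weight",
    ),
)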

maybe_modify_converted_hf_weight(
task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) → Dict[str, torch.Tensor]#
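
From the signature, this hook appears to post-process the weights converted for a single WeightConversionTask before they are merged into the HF state dict; for the MoE bridge, this is presumably where per-expert tensors are adjusted against the existing hf_state_dict.
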
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
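
A hedged sketch of the reshaping such a mapping performs: HF checkpoints store one down_proj per expert, while the Megatron side expects the per-expert weights gathered together. The stacking convention below is an assumption for illustration; the real mapping also resolves expert indices from the parameter names:

import torch

hidden, ffn, num_experts = 8, 16, 4
# HF side: one [hidden, ffn] down_proj weight per expert.
hf_experts = [torch.randn(hidden, ffn) for _ in range(num_experts)]

# Illustrative Megatron side: experts stacked along a leading dim.
megatron_weight = torch.stack(hf_experts, dim=0)
assert megatron_weight.shape == (num_experts, hidden, ffn)
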
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.


hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
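
The gate+up fusion follows the same convention as Megatron's dense gated MLP: gate and up projections are concatenated into a single linear_fc1 weight, and split apart again on export. A minimal sketch in plain torch (the real code delegates to the shared GatedMLPMapping logic and handles tensor parallelism):

import torch

hidden, ffn = 8, 16
gate = torch.randn(ffn, hidden)
up = torch.randn(ffn, hidden)

# hf_to_megatron direction: fuse along the output dimension.
fc1 = torch.cat([gate, up], dim=0)  # [2 * ffn, hidden]

# megatron_to_hf direction: split back into the two HF tensors.
gate_back, up_back = torch.chunk(fc1, 2, dim=0)
assert torch.equal(gate_back, gate) and torch.equal(up_back, up)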