bridge.models.qwen_vl.qwen3_vl_bridge#

Module Contents#

Classes#

Qwen3VLBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

Qwen3VLMoEBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

ExpertMLPDownProjMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

ExpertMLPGateUpProjMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

Functions#

_align_weight_to_shape

Auto-detect whether a transpose is needed to match the Megatron target shape.

API#

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.

The weight mappings are based on the yan-mbridge implementation, which defines:

  • Vision model direct mappings

  • Vision attention layer mappings

  • Vision MLP layer mappings

  • Language model mappings

  • Deepstack visual merger mappings

.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#

Create a Qwen3VLModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen3VLModelProvider configured with the HF model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing the parameter mappings for the dense model.
class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.

The weight mappings handle:

  • Vision model weights (same as dense model)

  • Language model MoE layers with expert routing

  • Shared embeddings and output layers

  • QK layernorm specific to Qwen3 architecture

This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.

.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()

Initialization

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#
mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for MoE models.

The MoE mappings include:

  1. Standard language model mappings (embeddings, layer norms, output)

  2. Vision model mappings (same as dense model)

  3. QKV mappings with QK layernorm

  4. MoE-specific mappings:

    • Router weights for expert selection

    • Expert MLPs (multiple experts per layer)

    • Pre-MLP layernorm

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all MoE parameter mappings
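The mapping categories above can be pictured as a name-translation table between HF and Megatron parameter namespaces. The sketch below is purely illustrative: the parameter name patterns are hypothetical stand-ins, not the registry's actual mappings.

```python
import re

# Hypothetical HF -> Megatron name patterns illustrating two of the MoE
# mapping categories above (router weights, per-expert down projections).
# The real MegatronMappingRegistry defines its own patterns and classes.
PATTERNS = {
    r"model\.layers\.(\d+)\.mlp\.gate\.weight":
        r"decoder.layers.\1.mlp.router.weight",
    r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.down_proj\.weight":
        r"decoder.layers.\1.mlp.experts.local_experts.\2.linear_fc2.weight",
}

def to_megatron_name(hf_name):
    """Translate an HF parameter name, or return None if no pattern matches."""
    for pattern, replacement in PATTERNS.items():
        if re.fullmatch(pattern, hf_name):
            return re.sub(pattern, replacement, hf_name)
    return None

print(to_megatron_name("model.layers.3.mlp.gate.weight"))
# decoder.layers.3.mlp.router.weight
```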

maybe_modify_converted_hf_weight(
task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) → Dict[str, torch.Tensor]#
bridge.models.qwen_vl.qwen3_vl_bridge._align_weight_to_shape(
weight: torch.Tensor,
target_shape: torch.Size,
name: str,
) → torch.Tensor#

Auto-detect whether a transpose is needed to match the Megatron target shape.

Transformers <5.0 stored fused expert weights transposed as [num_experts, hidden_size, 2 * intermediate_size], while transformers 5.0+ uses the standard nn.Linear convention [num_experts, 2 * intermediate_size, hidden_size]. This helper accepts either layout and transposes only when necessary, so the bridge works both with real checkpoints (old format) and with toy models or new checkpoints created with transformers 5.0+.
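The behavior described above can be sketched roughly as follows. This is a hypothetical reimplementation for illustration, not the bridge's actual code:

```python
import torch

def align_weight_to_shape(weight, target_shape, name):
    """Return `weight` unchanged if it already matches `target_shape`;
    otherwise try transposing the last two dims (the pre-5.0 layout)."""
    if weight.shape == target_shape:
        return weight
    transposed = weight.transpose(-1, -2)
    if transposed.shape == target_shape:
        return transposed.contiguous()
    raise ValueError(
        f"{name}: cannot align {tuple(weight.shape)} to {tuple(target_shape)}"
    )

# Old layout [num_experts, hidden_size, 2 * intermediate_size] is flipped
# to the nn.Linear convention [num_experts, 2 * intermediate_size, hidden_size].
old = torch.zeros(4, 10, 2 * 3)
aligned = align_weight_to_shape(old, torch.Size([4, 2 * 3, 10]), "gate_up_proj")
print(aligned.shape)  # torch.Size([4, 6, 10])
```

Because the check is purely shape-based, square weights (hidden_size equal to the fused projection width) would be ambiguous; the real checkpoints this bridge targets do not hit that case.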

class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

Uses _align_weight_to_shape so both pre-5.0 (transposed) and 5.0+ (standard) HF expert weight layouts are handled transparently.

Initialization

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
_validate_patterns(*args, **kwargs)#
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

Uses _align_weight_to_shape so both pre-5.0 (transposed) and 5.0+ (standard) HF expert weight layouts are handled transparently.
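For intuition, the megatron_to_hf direction can be pictured as splitting a fused gate+up matrix back into separate HF tensors. This is a simplified, hypothetical sketch; the actual GatedMLPMapping logic also handles details such as tensor-parallel sharding:

```python
import torch

def split_gate_up(fused):
    """Split a fused [2 * intermediate_size, hidden_size] gate+up weight
    into separate gate_proj / up_proj halves (simplified sketch)."""
    gate, up = torch.chunk(fused, 2, dim=0)
    return {"gate_proj": gate, "up_proj": up}

fused = torch.randn(2 * 8, 4)    # intermediate_size=8, hidden_size=4
parts = split_gate_up(fused)
print(parts["gate_proj"].shape)  # torch.Size([8, 4])
```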

Initialization

hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#