bridge.models.qwen_vl.qwen3_vl_bridge#

Module Contents#

Classes#

Qwen3VLBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

Qwen3VLMoEBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

ExpertMLPDownProjMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

ExpertMLPGateUpProjMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

Functions#

_align_weight_to_shape

Auto-detect whether a transpose is needed to match the Megatron target shape.

API#

class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLForConditionalGeneration and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for vision-language models.

The weight mappings are based on the yan-mbridge implementation, which defines:

  • Vision model direct mappings

  • Vision attention layer mappings

  • Vision MLP layer mappings

  • Language model mappings

  • Deepstack visual merger mappings

.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B-Instruct")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLModelProvider#

Create a Qwen3VLModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen3VLModelProvider configured with the HF model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing the parameter mappings for the dense model.
class bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3-VL MoE (Mixture of Experts) Conditional Generation.

This bridge handles the conversion between HuggingFace Qwen3VLMoEForConditionalGeneration and Megatron-Core Qwen3VL MoE model formats, including weight mappings and configuration translation for vision-language MoE models.

The weight mappings handle:

  • Vision model weights (same as dense model)

  • Language model MoE layers with expert routing

  • Shared embeddings and output layers

  • QK layernorm specific to Qwen3 architecture

This bridge works with any Qwen3VL MoE model size and automatically extracts the MoE configuration from the HuggingFace model.

.. rubric:: Example

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
provider = bridge.to_megatron_provider()

Initialization

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen3_vl_provider.Qwen3VLMoEModelProvider#
mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for MoE models.

The MoE mappings include:

  1. Standard language model mappings (embeddings, layer norms, output)

  2. Vision model mappings (same as dense model)

  3. QKV mappings with QK layernorm

  4. MoE-specific mappings:

    • Router weights for expert selection

    • Expert MLPs (multiple experts per layer)

    • Pre-MLP layernorm

  5. Deepstack visual merger mappings

Returns:

MegatronMappingRegistry with all MoE parameter mappings
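The mapping categories above can be pictured as a name-translation table between HF and Megatron parameter namespaces. The sketch below is purely illustrative: the parameter name patterns are hypothetical stand-ins, not the registry's actual mappings.

```python
import re

# Hypothetical HF -> Megatron name patterns illustrating two of the MoE
# mapping categories above (router weights, per-expert down projections).
# The real MegatronMappingRegistry defines its own patterns and classes.
PATTERNS = {
    r"model\.layers\.(\d+)\.mlp\.gate\.weight":
        r"decoder.layers.\1.mlp.router.weight",
    r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.down_proj\.weight":
        r"decoder.layers.\1.mlp.experts.local_experts.\2.linear_fc2.weight",
}

def to_megatron_name(hf_name):
    """Translate an HF parameter name, or return None if no pattern matches."""
    for pattern, replacement in PATTERNS.items():
        if re.fullmatch(pattern, hf_name):
            return re.sub(pattern, replacement, hf_name)
    return None

print(to_megatron_name("model.layers.3.mlp.gate.weight"))
# decoder.layers.3.mlp.router.weight
```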

maybe_modify_converted_hf_weight(
task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
converted_weights_dict: Dict[str, torch.Tensor],
hf_state_dict: Mapping[str, torch.Tensor],
) → Dict[str, torch.Tensor]#
bridge.models.qwen_vl.qwen3_vl_bridge._align_weight_to_shape(
weight: torch.Tensor,
target_shape: torch.Size,
name: str,
) → torch.Tensor#

Auto-detect whether a transpose is needed to match the Megatron target shape.

Transformers <5.0 stored fused expert weights transposed as [num_experts, hidden_size, 2 * intermediate_size], while transformers 5.0+ uses the standard nn.Linear convention [num_experts, 2 * intermediate_size, hidden_size]. This helper accepts either layout and transposes only when necessary, so the bridge works both with real checkpoints (old format) and with toy models or new checkpoints created with transformers 5.0+.
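The behavior described above can be sketched roughly as follows. This is a hypothetical reimplementation for illustration, not the bridge's actual code:

```python
import torch

def align_weight_to_shape(weight, target_shape, name):
    """Return `weight` unchanged if it already matches `target_shape`;
    otherwise try transposing the last two dims (the pre-5.0 layout)."""
    if weight.shape == target_shape:
        return weight
    transposed = weight.transpose(-1, -2)
    if transposed.shape == target_shape:
        return transposed.contiguous()
    raise ValueError(
        f"{name}: cannot align {tuple(weight.shape)} to {tuple(target_shape)}"
    )

# Old layout [num_experts, hidden_size, 2 * intermediate_size] is flipped
# to the nn.Linear convention [num_experts, 2 * intermediate_size, hidden_size].
old = torch.zeros(4, 10, 2 * 3)
aligned = align_weight_to_shape(old, torch.Size([4, 2 * 3, 10]), "gate_up_proj")
print(aligned.shape)  # torch.Size([4, 6, 10])
```

Because the check is purely shape-based, square weights (hidden_size equal to the fused projection width) would be ambiguous; the real checkpoints this bridge targets do not hit that case.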

class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPDownProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP down projection weights between HF and Megatron formats.

Uses _align_weight_to_shape so both pre-5.0 (transposed) and 5.0+ (standard) HF expert weight layouts are handled transparently.

Initialization

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
_validate_patterns(*args, **kwargs)#
class bridge.models.qwen_vl.qwen3_vl_bridge.ExpertMLPGateUpProjMapping(*args, **kwargs)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for expert MLP gate+up projection using shared GatedMLPMapping logic.

Uses _align_weight_to_shape so both pre-5.0 (transposed) and 5.0+ (standard) HF expert weight layouts are handled transparently.
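For intuition, the megatron_to_hf direction can be pictured as splitting a fused gate+up matrix back into separate HF tensors. This is a simplified, hypothetical sketch; the actual GatedMLPMapping logic also handles details such as tensor-parallel sharding:

```python
import torch

def split_gate_up(fused):
    """Split a fused [2 * intermediate_size, hidden_size] gate+up weight
    into separate gate_proj / up_proj halves (simplified sketch)."""
    gate, up = torch.chunk(fused, 2, dim=0)
    return {"gate_proj": gate, "up_proj": up}

fused = torch.randn(2 * 8, 4)    # intermediate_size=8, hidden_size=4
parts = split_gate_up(fused)
print(parts["gate_proj"].shape)  # torch.Size([8, 4])
```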

Initialization

hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#