bridge.models.qwen_vl.qwen35_vl_bridge#
Megatron Bridges for Qwen3.5 Vision-Language Models.
Qwen3.5 is a family of multimodal models that combine:
A hybrid Gated DeltaNet + Gated Attention language model (like Qwen3-Next)
A vision encoder (similar to Qwen3-VL)
Dense MLP or Mixture of Experts (MoE) with shared experts
This module provides two bridges:
Qwen35VLBridge: Dense variant (e.g., Qwen3.5-27B). Reference: https://huggingface.co/Qwen/Qwen3.5-27B
Qwen35VLMoEBridge: MoE variant (e.g., Qwen3.5-397B-A17B). Reference: https://huggingface.co/Qwen/Qwen3.5-397B-A17B
Module Contents#
Classes#
Qwen35VLMoEBridge: Megatron Bridge for Qwen3.5 Vision-Language Model.
Qwen35VLBridge: Megatron Bridge for Qwen3.5 Dense Vision-Language Model.
Data#
API#
- bridge.models.qwen_vl.qwen35_vl_bridge.logger#
'getLogger(…)'
- bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_DENSE_HF_CLASS_NAME#
'Qwen3_5ForConditionalGeneration'
- bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_MOE_HF_CLASS_NAME#
'Qwen3_5MoeForConditionalGeneration'
- class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLMoEBridge#
Bases:
megatron.bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge

Megatron Bridge for Qwen3.5 Vision-Language Model.
This bridge handles the conversion between HuggingFace Qwen3.5 VL model and Megatron-Core Qwen3VLModel formats, including weight mappings and configuration translation for the hybrid GDN+Attention VLM architecture.
The weight mappings handle:
Language model hybrid layers (GDN + standard attention)
MoE layers with routed and shared experts
Vision model weights (same as Qwen3-VL: deepstack, merger, patch embed)
QK layernorm, zero-centered RMSNorm for GDN output norm
mRoPE position embeddings
Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
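The 60-layer hybrid pattern can be sketched as follows. This is an illustrative reconstruction of the repeating group described above; the "gdn" and "attention" labels are assumptions for the sketch, not identifiers from the bridge.

```python
# Illustrative sketch of the Qwen3.5-397B-A17B hybrid layer pattern:
# 15 groups, each of 3 GDN (linear attention) layers followed by one
# standard attention layer; every layer's MLP is an MoE block.
# The "gdn"/"attention" labels are assumptions, not library identifiers.

def hybrid_layer_pattern(num_groups: int = 15, gdn_per_group: int = 3) -> list[str]:
    pattern = []
    for _ in range(num_groups):
        pattern.extend(["gdn"] * gdn_per_group)  # linear attention layers
        pattern.append("attention")              # one standard attention layer
    return pattern

layers = hybrid_layer_pattern()
print(len(layers))                  # 60
print(layers.count("attention"))    # 15
```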
.. rubric:: Example
from megatron.bridge import AutoBridge
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-397B-A17B")
provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM)#
Create a Qwen35VLMoEModelProvider from a HuggingFace pretrained model.
Extracts both language model and vision model configurations from the HuggingFace config and maps them to Megatron provider parameters.
- Parameters:
hf_pretrained – HuggingFace pretrained VLM model
- Returns:
Qwen35VLMoEModelProvider configured with the HF model's parameters
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry containing parameter mappings for Qwen3.5 VL.
Combines:
Language model mappings (Qwen3-Next hybrid architecture with VL prefixes):
Standard attention: QKV, output projection, QK layernorm
Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm
MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate
Embeddings, output layer, final layernorm
Vision model mappings (Qwen3-VL style):
Vision transformer blocks: attention, MLP, layer norms
Deepstack visual mergers
Patch embedding and position embedding
Final merger (patch_norm, linear_fc1, linear_fc2)
Naming Convention:
Megatron language model params are prefixed with âlanguage_model.â
HF language model params are prefixed with âmodel.language_model.â
Megatron vision model params are prefixed with âvision_model.â
HF vision model params are prefixed with âmodel.visual.â
- Returns:
MegatronMappingRegistry with all parameter mappings
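The naming convention above amounts to a prefix translation between HF and Megatron parameter names. A minimal sketch, where the helper name and the example parameter names are assumptions rather than the bridge's actual API:

```python
# Illustrative prefix translation following the convention documented
# above: "model.language_model." -> "language_model." and
# "model.visual." -> "vision_model.". The helper and example parameter
# names are assumptions, not part of the bridge's API.

_PREFIX_MAP = {
    "model.language_model.": "language_model.",
    "model.visual.": "vision_model.",
}

def hf_to_megatron_prefix(hf_name: str) -> str:
    """Rewrite an HF parameter name into the Megatron naming scheme."""
    for hf_prefix, mg_prefix in _PREFIX_MAP.items():
        if hf_name.startswith(hf_prefix):
            return mg_prefix + hf_name[len(hf_prefix):]
    return hf_name  # names outside both submodules pass through unchanged

print(hf_to_megatron_prefix("model.language_model.embed_tokens.weight"))
# language_model.embed_tokens.weight
```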
- class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLBridge#
Bases:
megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Dense Vision-Language Model.
This bridge handles the conversion between HuggingFace Qwen3.5 dense VL model and Megatron-Core Qwen3VLModel formats. Unlike the MoE variant, this model uses a standard dense MLP (gate_proj + up_proj → linear_fc1, down_proj → linear_fc2).
The weight mappings handle:
Language model hybrid layers (GDN + standard attention)
Dense MLP with gated SiLU activation (fused pre-MLP layernorm)
Vision model weights (no deepstack mergers)
QK layernorm, zero-centered RMSNorm for GDN output norm
mRoPE position embeddings
Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers
.. rubric:: Example
from megatron.bridge import AutoBridge
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-27B")
provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM)#
Create a Qwen35VLModelProvider from a HuggingFace pretrained model.
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry for Qwen3.5 dense VL model.
Key differences from the MoE variant:
Dense MLP: gate_proj + up_proj fused into linear_fc1, down_proj as linear_fc2
Pre-MLP layernorm fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)
No MoE router, routed expert MLPs, or shared expert mappings
No deepstack visual mergers (deepstack_visual_indexes is empty)
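The gate_proj + up_proj fusion described above can be illustrated numerically: concatenating the two HF weight matrices row-wise yields a single linear_fc1 whose output splits back into the gate and up halves of the gated SiLU MLP. This is a sketch of the general fusion technique; the concatenation order (gate first, then up) and the toy dimensions are assumptions.

```python
import numpy as np

# Sketch of fusing HF gate_proj and up_proj into one linear_fc1 matrix.
# The gate-then-up concatenation order and toy sizes are assumptions.
hidden, ffn = 8, 16
rng = np.random.default_rng(0)
gate_proj = rng.standard_normal((ffn, hidden))
up_proj = rng.standard_normal((ffn, hidden))

# Row-wise concatenation: one matmul now produces both projections.
linear_fc1 = np.concatenate([gate_proj, up_proj], axis=0)  # (2*ffn, hidden)

def silu(x):
    return x / (1.0 + np.exp(-x))

x = rng.standard_normal(hidden)
fused = linear_fc1 @ x
gate, up = fused[:ffn], fused[ffn:]
out = silu(gate) * up  # gated SiLU activation

# The fused path matches computing the two projections separately.
assert np.allclose(out, silu(gate_proj @ x) * (up_proj @ x))
```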