bridge.models.qwen_vl.qwen35_vl_bridge#

Megatron Bridges for Qwen3.5 Vision-Language Models.

Qwen3.5 is a family of multimodal models that combine:

  • A hybrid Gated DeltaNet + Gated Attention language model (like Qwen3-Next)

  • A vision encoder (similar to Qwen3-VL)

  • Dense MLP or Mixture of Experts (MoE) with shared experts

This module provides two bridges:

  • Qwen35VLBridge: Dense variant (e.g., Qwen3.5-27B) Reference: https://huggingface.co/Qwen/Qwen3.5-27B

  • Qwen35VLMoEBridge: MoE variant (e.g., Qwen3.5-397B-A17B) Reference: https://huggingface.co/Qwen/Qwen3.5-397B-A17B

Module Contents#

Classes#

Qwen35VLMoEBridge

Megatron Bridge for Qwen3.5 Vision-Language Model.

Qwen35VLBridge

Megatron Bridge for Qwen3.5 Dense Vision-Language Model.

Data#

API#

bridge.models.qwen_vl.qwen35_vl_bridge.logger#

'getLogger()'

bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_DENSE_HF_CLASS_NAME#

'Qwen3_5ForConditionalGeneration'

bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_MOE_HF_CLASS_NAME#

'Qwen3_5MoeForConditionalGeneration'

class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLMoEBridge#

Bases: megatron.bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge

Megatron Bridge for Qwen3.5 Vision-Language Model.

This bridge handles the conversion between the HuggingFace Qwen3.5 VL model format and the Megatron-Core Qwen3VLModel format, including weight mappings and configuration translation for the hybrid GDN+Attention VLM architecture.

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • MoE layers with routed and shared experts

  • Vision model weights (same as Qwen3-VL: deepstack, merger, patch embed)

  • QK layernorm, zero-centered RMSNorm for GDN output norm

  • mRoPE position embeddings

Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
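The layer-pattern arithmetic above can be sketched as a tiny generator. This is a hedged illustration only: `"gdn"` and `"attention"` are descriptive labels, not the library's actual layer-spec tokens, and the real provider configures layers through its own config fields.

```python
# Sketch: the 60-layer hybrid pattern for Qwen3.5-397B-A17B.
# 15 groups of (3 x GDN + 1 x full attention); every layer is followed by an MoE MLP.
def hybrid_layer_pattern(num_groups: int = 15, gdn_per_group: int = 3) -> list[str]:
    pattern = []
    for _ in range(num_groups):
        pattern.extend(["gdn"] * gdn_per_group)  # linear-attention (GDN) layers
        pattern.append("attention")              # one standard-attention layer per group
    return pattern

layers = hybrid_layer_pattern()
assert len(layers) == 60
assert layers.count("gdn") == 45 and layers.count("attention") == 15
```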

Example:

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-397B-A17B")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen35_vl_provider.Qwen35VLMoEModelProvider#

Create a Qwen35VLMoEModelProvider from a HuggingFace pretrained model.

Extracts both language model and vision model configurations from the HuggingFace config and maps them to Megatron provider parameters.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen35VLMoEModelProvider configured with the HF model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for Qwen3.5 VL.

Combines:

  1. Language model mappings (Qwen3-Next hybrid architecture with VL prefixes):

    • Standard attention: QKV, output projection, QK layernorm

    • Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm

    • MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate

    • Embeddings, output layer, final layernorm

  2. Vision model mappings (Qwen3-VL style):

    • Vision transformer blocks: attention, MLP, layer norms

    • Deepstack visual mergers

    • Patch embedding and position embedding

    • Final merger (patch_norm, linear_fc1, linear_fc2)

Naming Convention:

  • Megatron language model params are prefixed with “language_model.”

  • HF language model params are prefixed with “model.language_model.”

  • Megatron vision model params are prefixed with “vision_model.”

  • HF vision model params are prefixed with “model.visual.”

Returns:

MegatronMappingRegistry with all parameter mappings
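The prefix convention above can be illustrated with a small renaming sketch. This is illustrative only: the actual bridge resolves names through MegatronMappingRegistry mappings, not plain string replacement, and the parameter suffix used in the example is a hypothetical placeholder.

```python
# Sketch: map an HF parameter name to its Megatron counterpart by
# swapping the documented prefixes (illustration, not the bridge's logic).
PREFIX_MAP = {
    "model.language_model.": "language_model.",  # HF LM prefix -> Megatron LM prefix
    "model.visual.": "vision_model.",            # HF vision prefix -> Megatron vision prefix
}

def to_megatron_name(hf_name: str) -> str:
    for hf_prefix, mcore_prefix in PREFIX_MAP.items():
        if hf_name.startswith(hf_prefix):
            return mcore_prefix + hf_name[len(hf_prefix):]
    return hf_name  # names without a known prefix pass through unchanged

print(to_megatron_name("model.visual.patch_embed.proj.weight"))
# vision_model.patch_embed.proj.weight
```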

class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Dense Vision-Language Model.

This bridge handles the conversion between the HuggingFace Qwen3.5 dense VL model format and the Megatron-Core Qwen3VLModel format. Unlike the MoE variant, this model uses a standard dense MLP (gate_proj + up_proj → linear_fc1, down_proj → linear_fc2).

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • Dense MLP with gated SiLU activation (fused pre-MLP layernorm)

  • Vision model weights (no deepstack mergers)

  • QK layernorm, zero-centered RMSNorm for GDN output norm

  • mRoPE position embeddings

Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers

Example:

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-27B")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen35_vl_provider.Qwen35VLModelProvider#

Create a Qwen35VLModelProvider from a HuggingFace pretrained model.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry for Qwen3.5 dense VL model.

Key differences from the MoE variant:

  • Dense MLP: gate_proj + up_proj fused into linear_fc1, down_proj as linear_fc2

  • Pre-MLP layernorm fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)

  • No MoE router, routed expert MLPs, or shared expert mappings

  • No deepstack visual mergers (deepstack_visual_indexes is empty)
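The gate/up fusion described above can be sketched with NumPy: the HF `gate_proj` and `up_proj` weight matrices are stacked row-wise into a single `linear_fc1` weight so one GEMM produces both halves of the gated-SiLU input. The shapes are illustrative, and the real bridge may order or interleave rows differently (e.g. for tensor-parallel sharding); this concatenation is only a schematic.

```python
import numpy as np

hidden, ffn = 8, 16  # illustrative sizes, not the real model dimensions
rng = np.random.default_rng(0)
gate_proj = rng.standard_normal((ffn, hidden))  # HF: mlp.gate_proj.weight
up_proj = rng.standard_normal((ffn, hidden))    # HF: mlp.up_proj.weight

# Fuse both projections into one linear_fc1 weight: a single matmul then
# yields the gate and up activations together.
linear_fc1 = np.concatenate([gate_proj, up_proj], axis=0)
assert linear_fc1.shape == (2 * ffn, hidden)
```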