bridge.models.qwen_vl.qwen35_vl_bridge#

Megatron Bridges for Qwen3.5 Vision-Language Models.

Qwen3.5 is a family of multimodal models that combine:

  • A hybrid Gated DeltaNet + Gated Attention language model (like Qwen3-Next)

  • A vision encoder (similar to Qwen3-VL)

  • Dense MLP or Mixture of Experts (MoE) with shared experts

This module provides two bridges:

  • Qwen35VLBridge: Dense variant (e.g., Qwen3.5-27B) Reference: https://huggingface.co/Qwen/Qwen3.5-27B

  • Qwen35VLMoEBridge: MoE variant (e.g., Qwen3.5-397B-A17B) Reference: https://huggingface.co/Qwen/Qwen3.5-397B-A17B

Module Contents#

Classes#

Qwen35VLMoEBridge

Megatron Bridge for Qwen3.5 Vision-Language Model.

Qwen35VLBridge

Megatron Bridge for Qwen3.5 Dense Vision-Language Model.

Data#

API#

bridge.models.qwen_vl.qwen35_vl_bridge.logger#

'getLogger()'

bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_DENSE_HF_CLASS_NAME#

'Qwen3_5ForConditionalGeneration'

bridge.models.qwen_vl.qwen35_vl_bridge._QWEN3_5_MOE_HF_CLASS_NAME#

'Qwen3_5MoeForConditionalGeneration'

class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLMoEBridge#

Bases: megatron.bridge.models.qwen_vl.qwen3_vl_bridge.Qwen3VLMoEBridge

Megatron Bridge for Qwen3.5 Vision-Language Model.

This bridge handles the conversion between the HuggingFace Qwen3.5 VL model format and the Megatron-Core Qwen3VLModel format, including weight mappings and configuration translation for the hybrid GDN+Attention VLM architecture.

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • MoE layers with routed and shared experts

  • Vision model weights (same as Qwen3-VL: deepstack, merger, patch embed)

  • QK layernorm, zero-centered RMSNorm for GDN output norm

  • mRoPE position embeddings

Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
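The layer-pattern arithmetic above can be sketched as a tiny generator. This is a hedged illustration only: `"gdn"` and `"attention"` are descriptive labels, not the library's actual layer-spec tokens, and the real provider configures layers through its own config fields.

```python
# Sketch: the 60-layer hybrid pattern for Qwen3.5-397B-A17B.
# 15 groups of (3 x GDN + 1 x full attention); every layer is followed by an MoE MLP.
def hybrid_layer_pattern(num_groups: int = 15, gdn_per_group: int = 3) -> list[str]:
    pattern = []
    for _ in range(num_groups):
        pattern.extend(["gdn"] * gdn_per_group)  # linear-attention (GDN) layers
        pattern.append("attention")              # one standard-attention layer per group
    return pattern

layers = hybrid_layer_pattern()
assert len(layers) == 60
assert layers.count("gdn") == 45 and layers.count("attention") == 15
```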

Example:

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-397B-A17B")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen35_vl_provider.Qwen35VLMoEModelProvider#

Create a Qwen35VLMoEModelProvider from a HuggingFace pretrained model.

Extracts both language model and vision model configurations from the HuggingFace config and maps them to Megatron provider parameters.

Parameters:

hf_pretrained – HuggingFace pretrained VLM model

Returns:

Qwen35VLMoEModelProvider configured with the HF model’s parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for Qwen3.5 VL.

Combines:

  1. Language model mappings (Qwen3-Next hybrid architecture with VL prefixes):

    • Standard attention: QKV, output projection, QK layernorm

    • Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm

    • MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate

    • Embeddings, output layer, final layernorm

  2. Vision model mappings (Qwen3-VL style):

    • Vision transformer blocks: attention, MLP, layer norms

    • Deepstack visual mergers

    • Patch embedding and position embedding

    • Final merger (patch_norm, linear_fc1, linear_fc2)

Naming Convention:

  • Megatron language model params are prefixed with “language_model.”

  • HF language model params are prefixed with “model.language_model.”

  • Megatron vision model params are prefixed with “vision_model.”

  • HF vision model params are prefixed with “model.visual.”

Returns:

MegatronMappingRegistry with all parameter mappings
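The prefix convention above can be illustrated with a small renaming sketch. This is illustrative only: the actual bridge resolves names through MegatronMappingRegistry mappings, not plain string replacement, and the parameter suffix used in the example is a hypothetical placeholder.

```python
# Sketch: map an HF parameter name to its Megatron counterpart by
# swapping the documented prefixes (illustration, not the bridge's logic).
PREFIX_MAP = {
    "model.language_model.": "language_model.",  # HF LM prefix -> Megatron LM prefix
    "model.visual.": "vision_model.",            # HF vision prefix -> Megatron vision prefix
}

def to_megatron_name(hf_name: str) -> str:
    for hf_prefix, mcore_prefix in PREFIX_MAP.items():
        if hf_name.startswith(hf_prefix):
            return mcore_prefix + hf_name[len(hf_prefix):]
    return hf_name  # names without a known prefix pass through unchanged

print(to_megatron_name("model.visual.patch_embed.proj.weight"))
# vision_model.patch_embed.proj.weight
```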

class bridge.models.qwen_vl.qwen35_vl_bridge.Qwen35VLBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Dense Vision-Language Model.

This bridge handles the conversion between the HuggingFace Qwen3.5 dense VL model format and the Megatron-Core Qwen3VLModel format. Unlike the MoE variant, this model uses a standard dense MLP (gate_proj + up_proj → linear_fc1, down_proj → linear_fc2).

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • Dense MLP with gated SiLU activation (fused pre-MLP layernorm)

  • Vision model weights (no deepstack mergers)

  • QK layernorm, zero-centered RMSNorm for GDN output norm

  • mRoPE position embeddings

Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers

Example:

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3.5-27B")
provider = bridge.to_megatron_provider()

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
) → megatron.bridge.models.qwen_vl.qwen35_vl_provider.Qwen35VLModelProvider#

Create a Qwen35VLModelProvider from a HuggingFace pretrained model.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry for Qwen3.5 dense VL model.

Key differences from the MoE variant:

  • Dense MLP: gate_proj + up_proj fused into linear_fc1, down_proj as linear_fc2

  • Pre-MLP layernorm fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)

  • No MoE router, routed expert MLPs, or shared expert mappings

  • No deepstack visual mergers (deepstack_visual_indexes is empty)
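The gate/up fusion described above can be sketched with NumPy: the HF `gate_proj` and `up_proj` weight matrices are stacked row-wise into a single `linear_fc1` weight so one GEMM produces both halves of the gated-SiLU input. The shapes are illustrative, and the real bridge may order or interleave rows differently (e.g. for tensor-parallel sharding); this concatenation is only a schematic.

```python
import numpy as np

hidden, ffn = 8, 16  # illustrative sizes, not the real model dimensions
rng = np.random.default_rng(0)
gate_proj = rng.standard_normal((ffn, hidden))  # HF: mlp.gate_proj.weight
up_proj = rng.standard_normal((ffn, hidden))    # HF: mlp.up_proj.weight

# Fuse both projections into one linear_fc1 weight: a single matmul then
# yields the gate and up activations together.
linear_fc1 = np.concatenate([gate_proj, up_proj], axis=0)
assert linear_fc1.shape == (2 * ffn, hidden)
```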