bridge.models.qwen.qwen35_bridge

Module Contents

Classes

Qwen35MoEBridge

Megatron Bridge for Qwen3.5 Language Model (MoE variant).

Qwen35Bridge

Megatron Bridge for Qwen3.5 Dense Language Model.

Functions

_apply_qwen35_common_config

Apply Qwen3.5 common LM configuration to a Megatron provider.

_apply_qwen35_moe_config

Apply Qwen3.5 MoE-specific configuration to a Megatron provider.

API

bridge.models.qwen.qwen35_bridge._apply_qwen35_common_config(
provider: megatron.bridge.models.gpt_provider.GPTModelProvider,
text_config,
) → None

Apply Qwen3.5 common LM configuration to a Megatron provider.

Covers settings shared by the dense and MoE variants: normalization, the hybrid GDN (Gated DeltaNet) architecture, and MTP (Multi-Token Prediction).

Parameters:
  • provider – GPTModelProvider (or subclass) to configure.

  • text_config – HuggingFace config object. For VLMs, pass the nested text_config so that language-model fields are read from the correct level.
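A minimal sketch of how a caller might pick the right config level before invoking the helper. It assumes that a VLM config exposes its language-model settings under a nested text_config attribute, as the parameter description suggests. The name resolve_text_config and the stand-in config objects are hypothetical and not part of this module.

```python
from types import SimpleNamespace


def resolve_text_config(hf_config):
    """Return the nested text_config if present, else the config itself.

    Hypothetical helper for illustration only; not part of the bridge module.
    """
    return getattr(hf_config, "text_config", hf_config)


# Stand-ins for real HuggingFace config objects (values are illustrative).
lm_config = SimpleNamespace(hidden_size=1024)
vlm_config = SimpleNamespace(text_config=SimpleNamespace(hidden_size=2048))

lm_text = resolve_text_config(lm_config)    # the LM config is used as-is
vlm_text = resolve_text_config(vlm_config)  # the nested text config is used
```

Resolving the level once at the call site keeps the helper itself free of VLM-specific branching.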

bridge.models.qwen.qwen35_bridge._apply_qwen35_moe_config(
provider: megatron.bridge.models.gpt_provider.GPTModelProvider,
text_config,
) → None

Apply Qwen3.5 MoE-specific configuration to a Megatron provider.

Calls _apply_qwen35_common_config first, then adds MoE parameters.

Parameters:
  • provider – GPTModelProvider (or subclass) to configure.

  • text_config – HuggingFace config object. For VLMs, pass the nested text_config so that language-model fields are read from the correct level.
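The layering described above (common settings first, MoE settings on top) can be sketched with placeholder functions. Everything here is an assumption for illustration: apply_common and apply_moe stand in for the real helpers, and the field names num_experts and num_moe_experts are not confirmed by this module.

```python
from types import SimpleNamespace


def apply_common(provider, text_config):
    # Placeholder for the shared settings (normalization, GDN, MTP).
    provider.applied = ["common"]


def apply_moe(provider, text_config):
    # Shared settings go first, so MoE fields are layered on top of them.
    apply_common(provider, text_config)
    provider.applied.append("moe")
    # Assumed field names, used only to show the direction of the copy.
    provider.num_moe_experts = text_config.num_experts


provider = SimpleNamespace()
apply_moe(provider, SimpleNamespace(num_experts=8))
```

With this ordering, a caller configuring an MoE model only needs to invoke the MoE helper.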

class bridge.models.qwen.qwen35_bridge.Qwen35MoEBridge

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Language Model (MoE variant).

This bridge converts between the HuggingFace Qwen3.5 language model format and the Megatron-Core Qwen3.5 model format. It covers weight mappings and configuration translation for the hybrid GDN+Attention LM architecture.

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • MoE layers with routed and shared experts

  • QK layernorm, zero-centered RMSNorm for GDN output norm

Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
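The architecture formula can be unrolled into a per-layer list as a quick sanity check. This is a sketch, assuming that within each group the three GDN layers precede the attention layer, as the ordering in the formula suggests. The function name and the "gdn"/"attention" labels are hypothetical.

```python
def hybrid_layer_pattern(num_groups, gdn_per_group=3, attn_per_group=1):
    """Build a per-layer list of mixer types for the repeating hybrid pattern."""
    group = ["gdn"] * gdn_per_group + ["attention"] * attn_per_group
    return group * num_groups


# 15 groups of (3 GDN + 1 attention); each mixer is followed by an MoE block.
moe_layers = hybrid_layer_pattern(num_groups=15)
```

Under this assumption, len(moe_layers) matches the 60 layers stated above and every fourth layer uses standard attention.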

The VL variant (Qwen35VLMoEBridge) reuses the provider settings and LM mapping logic via the module-level helpers and static mapping methods.

Example:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from megatron.bridge import AutoBridge

    # Save the HF model and tokenizer to the same local directory.
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-397B-A17B")
    model.save_pretrained("./Qwen3.5-397B-A17B-LM")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-397B-A17B")
    tokenizer.save_pretrained("./Qwen3.5-397B-A17B-LM")

    # Build the bridge from that directory and derive the Megatron provider.
    bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-397B-A17B-LM")
    provider = bridge.to_megatron_provider()

static _get_moe_lm_mappings(hf_prefix='model.', megatron_prefix='')

Get language model parameter mappings for MoE Qwen3.5.

Parameters:
  • hf_prefix – Prefix for HF param names in safetensors. Use “model.layers.” for LM and “model.language_model.layers.” for VL models.

  • megatron_prefix – Prefix for Megatron param names. Use “” for LM (default) and “language_model.” for VL models.

Returns:

List of mapping objects for the MoE LM portion.
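A small sketch of how the two prefixes could combine with parameter-name suffixes. It assumes each prefix ends just before "layers.", which matches the signature default of 'model.'. The helper prefixed_pair and the example suffixes are illustrative and are not taken from the actual mapping list.

```python
def prefixed_pair(hf_suffix, megatron_suffix, hf_prefix="model.", megatron_prefix=""):
    """Join each prefix with its suffix to form one (megatron, hf) name pair."""
    return (f"{megatron_prefix}{megatron_suffix}", f"{hf_prefix}{hf_suffix}")


# Illustrative suffixes for an attention output projection.
HF_SUFFIX = "layers.*.self_attn.o_proj.weight"
MEGATRON_SUFFIX = "decoder.layers.*.self_attention.linear_proj.weight"

# LM-only checkpoint: default prefixes.
lm_pair = prefixed_pair(HF_SUFFIX, MEGATRON_SUFFIX)

# VL checkpoint: the language model is nested one level deeper on both sides.
vl_pair = prefixed_pair(
    HF_SUFFIX,
    MEGATRON_SUFFIX,
    hf_prefix="model.language_model.",
    megatron_prefix="language_model.",
)
```

Parameterizing the prefixes is what lets a VL bridge reuse the same mapping list without duplicating it.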

static _get_moe_mtp_mappings(
megatron_prefix: str = '',
mtp_experts_packed: bool = False,
)

Get MTP parameter mappings for MoE Qwen3.5.

Parameters:
  • megatron_prefix – Prefix for Megatron param names. Use “” for LM and “language_model.” for VL models.

  • mtp_experts_packed – Whether the MTP experts are packed. Qwen3.5 stores per-expert (mtp.layers.0.mlp.experts.{i}.gate_proj.weight), whereas Qwen3.6 stores packed (mtp.layers.0.mlp.experts.gate_up_proj).

Returns:

List of mapping objects for the MoE MTP portion.
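The difference between the two MTP expert layouts can be illustrated by listing the HF parameter names each one implies. The name patterns come from the parameter description above. The function mtp_expert_gate_names is a hypothetical helper, and it only covers the gate-related names.

```python
def mtp_expert_gate_names(num_experts, mtp_experts_packed=False):
    """List HF names holding MTP expert gate weights for either layout."""
    base = "mtp.layers.0.mlp.experts"
    if mtp_experts_packed:
        # Packed layout: one tensor holds gate and up projections of all experts.
        return [f"{base}.gate_up_proj"]
    # Per-expert layout: one gate_proj tensor per expert index.
    return [f"{base}.{i}.gate_proj.weight" for i in range(num_experts)]


per_expert = mtp_expert_gate_names(num_experts=2)
packed = mtp_expert_gate_names(num_experts=2, mtp_experts_packed=True)
```

The per-expert layout therefore grows with the expert count, while the packed layout stays at a single name.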

provider_bridge(hf_pretrained)

Convert HuggingFace Qwen3.5 text model config to GPTModelProvider.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry

Return MegatronMappingRegistry containing parameter mappings for Qwen3.5 LM.

Combines:

  • Standard attention: QKV, output projection, QK layernorm

  • Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm

  • MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate

  • Embeddings, output layer, final layernorm

Naming Convention:

  • Megatron language model params are prefixed with “decoder.”

  • HF language model params are prefixed with “model.layers.*”

Returns:

MegatronMappingRegistry with all parameter mappings
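To make the wildcard naming convention concrete, here is a sketch of resolving a "*" pattern against a concrete parameter name. It is not the registry's implementation: resolve_wildcards is a hypothetical function, it assumes each "*" stands for exactly one dot-separated field such as a layer index, and the example names are illustrative.

```python
import re


def resolve_wildcards(pattern, concrete_name, target_pattern):
    """Match concrete_name against pattern and fill target_pattern's wildcards.

    Returns None when concrete_name does not match pattern.
    """
    # Escape the pattern, then turn each escaped "*" into a capture group
    # that matches one dot-free field.
    regex = "^" + re.escape(pattern).replace(r"\*", r"([^.]+)") + "$"
    match = re.match(regex, concrete_name)
    if match is None:
        return None
    result = target_pattern
    for captured in match.groups():
        result = result.replace("*", captured, 1)
    return result


megatron_name = resolve_wildcards(
    "model.layers.*.self_attn.o_proj.weight",
    "model.layers.7.self_attn.o_proj.weight",
    "decoder.layers.*.self_attention.linear_proj.weight",
)
no_match = resolve_wildcards(
    "model.layers.*.self_attn.o_proj.weight",
    "model.embed_tokens.weight",
    "decoder.layers.*.self_attention.linear_proj.weight",
)
```

A single wildcard pattern per parameter kind keeps the mapping list independent of the number of layers.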

class bridge.models.qwen.qwen35_bridge.Qwen35Bridge

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Dense Language Model.

This bridge converts between the HuggingFace Qwen3.5 language model format and the Megatron-Core Qwen3.5 model format. It covers weight mappings and configuration translation for the hybrid GDN+Attention LM architecture.

The weight mappings handle:

  • Language model hybrid layers (GDN + standard attention)

  • Dense MLP with gated SiLU activation (fused pre-MLP layernorm)

  • QK layernorm, zero-centered RMSNorm for GDN output norm

Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers

This class also serves as the base for Qwen35VLBridge (vision-language variant), which reuses the common provider settings and LM mapping logic via the static helper methods.

Example:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from megatron.bridge import AutoBridge

    # Save the HF model and tokenizer to the same local directory.
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B")
    model.save_pretrained("./Qwen3.5-27B-LM")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")
    tokenizer.save_pretrained("./Qwen3.5-27B-LM")

    # Build the bridge from that directory and derive the Megatron provider.
    bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-27B-LM")
    provider = bridge.to_megatron_provider()

static _get_dense_lm_mappings(hf_prefix='model.', megatron_prefix='')

Get language model parameter mappings for dense (non-MoE) Qwen3.5.

Parameters:
  • hf_prefix – Prefix for HF param names in safetensors. Use “model.layers.” for LM and “model.language_model.layers.” for VL models.

  • megatron_prefix – Prefix for Megatron param names. Use “” for LM (default) and “language_model.” for VL models.

Returns:

List of mapping objects for the dense LM portion.

static _get_dense_mtp_mappings(megatron_prefix='')

Get MTP (Multi-Token Prediction) parameter mappings for dense Qwen3.5.

Parameters:
  • megatron_prefix – Prefix for Megatron param names. Use “” for LM and “language_model.” for VL models.

Returns:

List of mapping objects for the MTP portion.

provider_bridge(hf_pretrained)

Convert HuggingFace Qwen3.5 text model config to GPTModelProvider.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry

Return MegatronMappingRegistry for the Qwen3.5 dense LM.

Key differences from the MoE variant:

  • Dense MLP: gate_proj + up_proj fused into linear_fc1, down_proj as linear_fc2

  • Pre-MLP layernorm fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)

  • No MoE router, routed expert MLPs, or shared expert mappings
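The gate_proj + up_proj fusion in the first bullet can be sketched with plain nested lists. This assumes weights are stored as [out_features, in_features] and that fusion is a simple concatenation along the output dimension. It ignores tensor-parallel sharding, which a real conversion has to account for. The function names are hypothetical.

```python
def fuse_gate_up(gate_proj, up_proj):
    """Stack gate_proj rows on top of up_proj rows to form linear_fc1."""
    if len(gate_proj) != len(up_proj):
        raise ValueError("gate_proj and up_proj must have the same number of rows")
    return gate_proj + up_proj


def split_gate_up(linear_fc1):
    """Inverse of fuse_gate_up: the first half is gate_proj, the second up_proj."""
    half = len(linear_fc1) // 2
    return linear_fc1[:half], linear_fc1[half:]


# Toy 2x2 weights (rows are output features).
gate = [[1.0, 2.0], [3.0, 4.0]]
up = [[5.0, 6.0], [7.0, 8.0]]

fc1 = fuse_gate_up(gate, up)          # 4 rows: gate rows, then up rows
gate_back, up_back = split_gate_up(fc1)
```

The round trip shows why the mapping must be applied in both directions: importing from HF fuses the two tensors, and exporting back splits them again.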