`bridge.models.qwen.qwen35_bridge`#

Module Contents#

Classes#

`Qwen35MoEBridge`	Megatron Bridge for Qwen3.5 Language Model (MoE variant).
`Qwen35Bridge`	Megatron Bridge for Qwen3.5 Dense Language Model.

Functions#

`_apply_qwen35_common_config`	Apply Qwen3.5 common LM configuration to a Megatron provider.
`_apply_qwen35_moe_config`	Apply Qwen3.5 MoE-specific configuration to a Megatron provider.

API#

bridge.models.qwen.qwen35_bridge._apply_qwen35_common_config(provider: megatron.bridge.models.gpt_provider.GPTModelProvider, text_config) → None#

Apply Qwen3.5 common LM configuration to a Megatron provider.

Covers settings shared by both dense and MoE variants: normalization, GDN hybrid architecture, and MTP.

Parameters:

provider – GPTModelProvider (or subclass) to configure.
text_config – HuggingFace config object to read from. For VLMs, pass the nested text_config so that language-model fields are read from the correct level.
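The "correct level" distinction can be illustrated with a minimal sketch. The helper name `resolve_text_config` and the `hidden_size` values are hypothetical and only stand in for a real HuggingFace config; the source does not show how the bridge selects the config internally.

```python
from types import SimpleNamespace


def resolve_text_config(hf_config):
    """Return the nested text_config if present, else the config itself.

    Hypothetical helper for illustration; not part of the documented module.
    """
    return getattr(hf_config, "text_config", hf_config)


# Text-only checkpoint: language-model fields sit at the top level.
lm_config = SimpleNamespace(hidden_size=1024)

# VLM checkpoint: language-model fields sit under `text_config`.
vlm_config = SimpleNamespace(text_config=SimpleNamespace(hidden_size=2048))

lm_fields = resolve_text_config(lm_config)
vlm_fields = resolve_text_config(vlm_config)
```

Either result could then be passed as the `text_config` argument, so the same configuration code serves both the LM and VL cases.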

bridge.models.qwen.qwen35_bridge._apply_qwen35_moe_config(provider: megatron.bridge.models.gpt_provider.GPTModelProvider, text_config) → None#

Apply Qwen3.5 MoE-specific configuration to a Megatron provider.

Calls _apply_qwen35_common_config first, then adds MoE parameters.

Parameters:

provider – GPTModelProvider (or subclass) to configure.
text_config – HuggingFace config object to read from. For VLMs, pass the nested text_config so that language-model fields are read from the correct level.
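The "common first, then MoE" layering might look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the provider attributes, the config field names, and the expert counts are stand-ins and are not taken from the module's source.

```python
from types import SimpleNamespace


def apply_common(provider, text_config):
    # Stand-in for the shared settings (normalization, GDN, MTP).
    provider.normalization = "RMSNorm"
    provider.num_layers = text_config.num_hidden_layers


def apply_moe(provider, text_config):
    # Shared settings are applied first, then MoE-only fields are added.
    apply_common(provider, text_config)
    provider.num_moe_experts = text_config.num_experts
    provider.moe_router_topk = text_config.num_experts_per_tok


# Hypothetical config values, chosen only to make the sketch runnable.
text_config = SimpleNamespace(
    num_hidden_layers=60, num_experts=8, num_experts_per_tok=2
)
provider = SimpleNamespace()
apply_moe(provider, text_config)
```

Layering the helpers this way means a dense bridge can call only the common helper, while the MoE bridge gets both sets of fields from a single call.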

class bridge.models.qwen.qwen35_bridge.Qwen35MoEBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Language Model (MoE variant).

This bridge converts between the HuggingFace Qwen3.5 language model format and the Megatron-Core model format, including weight mappings and configuration translation for the hybrid GDN+Attention LM architecture.

The weight mappings handle:

Language model hybrid layers (GDN + standard attention)
MoE layers with routed and shared experts
QK layernorm, zero-centered RMSNorm for GDN output norm

Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
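The layer count follows from repeating a four-layer group. As a quick check of the arithmetic, the sketch below builds the per-layer mixer sequence; the function name and the string labels are hypothetical, and the same helper also reproduces the 16-group, 64-layer layout stated for the dense 27B variant.

```python
def hybrid_layer_pattern(num_groups, gdn_per_group=3, attn_per_group=1):
    """Return the mixer type of every layer, group by group."""
    pattern = []
    for _ in range(num_groups):
        pattern.extend(["gdn"] * gdn_per_group)
        pattern.extend(["attention"] * attn_per_group)
    return pattern


moe_pattern = hybrid_layer_pattern(15)    # MoE variant: 15 groups
dense_pattern = hybrid_layer_pattern(16)  # dense 27B variant: 16 groups

assert len(moe_pattern) == 60
assert len(dense_pattern) == 64
assert moe_pattern[:4] == ["gdn", "gdn", "gdn", "attention"]
```

In the MoE variant every one of these layers is followed by an MoE block, so the pattern describes only the token mixer of each layer.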

The VL variant (Qwen35VLMoEBridge) reuses the provider settings and LM mapping logic via the module-level helpers and static mapping methods.

Example:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from megatron.bridge import AutoBridge

    # Save the HF model and tokenizer into the same local directory.
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-397B-A17B")
    model.save_pretrained("./Qwen3.5-397B-A17B-LM")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-397B-A17B")
    tokenizer.save_pretrained("./Qwen3.5-397B-A17B-LM")

    # Load that directory through the bridge and build the Megatron provider.
    bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-397B-A17B-LM")
    provider = bridge.to_megatron_provider()

static _get_moe_lm_mappings(hf_prefix='model.', megatron_prefix='')#

Get language model parameter mappings for MoE Qwen3.5.

Parameters:

hf_prefix – Prefix for HF param names in safetensors. Use "model.layers." for LM and "model.language_model.layers." for VL models.
megatron_prefix – Prefix for Megatron param names. Use "" for LM (default) and "language_model." for VL models.

Returns:

List of mapping objects for the MoE LM portion.
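To make the role of the two prefixes concrete, here is a minimal sketch of how a prefixed name pair could be composed. The helper `prefixed_pair` and the parameter suffixes are hypothetical. The LM call uses the signature defaults (`hf_prefix='model.'`, `megatron_prefix=''`), and the VL call uses assumed analogues of them; the real mapping objects returned by the method are not shown in the source.

```python
def prefixed_pair(hf_prefix, megatron_prefix, hf_suffix, megatron_suffix):
    """Return a (megatron_name, hf_name) pair with both prefixes applied."""
    return (megatron_prefix + megatron_suffix, hf_prefix + hf_suffix)


# Illustrative suffixes only; the actual parameter names may differ.
HF_SUFFIX = "norm.weight"
MEGATRON_SUFFIX = "decoder.final_layernorm.weight"

# Standalone LM: signature defaults.
lm_pair = prefixed_pair("model.", "", HF_SUFFIX, MEGATRON_SUFFIX)

# VL wrapper: the language model is nested one level deeper on both sides.
vl_pair = prefixed_pair(
    "model.language_model.", "language_model.", HF_SUFFIX, MEGATRON_SUFFIX
)
```

Parameterizing the prefixes is what lets a VL bridge reuse the LM mapping list without duplicating it.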

static _get_moe_mtp_mappings(megatron_prefix: str = '', mtp_experts_packed: bool = False)#

Get MTP parameter mappings for MoE Qwen3.5.

Parameters:

megatron_prefix – Prefix for Megatron param names. Use "" for LM and "language_model." for VL models.
mtp_experts_packed – Whether the MTP expert weights are packed. Qwen3.5 stores them per expert (mtp.layers.0.mlp.experts.{i}.gate_proj.weight), whereas Qwen3.6 stores them packed (mtp.layers.0.mlp.experts.gate_up_proj).

Returns:

List of mapping objects for the MoE MTP portion.
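The difference between the two storage layouts can be seen by listing the HF key names each one implies. The sketch below is illustrative only: the function `mtp_expert_hf_keys` is hypothetical, and it covers just the two key patterns quoted in the parameter description, not the full set of expert weights.

```python
def mtp_expert_hf_keys(num_experts, mtp_experts_packed=False):
    """List the HF gate-projection keys implied by each storage layout."""
    base = "mtp.layers.0.mlp.experts"
    if mtp_experts_packed:
        # Packed layout: one tensor holds the projections of all experts.
        return [f"{base}.gate_up_proj"]
    # Per-expert layout: one key per expert index.
    return [f"{base}.{i}.gate_proj.weight" for i in range(num_experts)]


per_expert_keys = mtp_expert_hf_keys(2)
packed_keys = mtp_expert_hf_keys(2, mtp_experts_packed=True)
```

With the per-expert layout the number of keys grows with the expert count, while the packed layout always yields a single key that has to be split during conversion.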

provider_bridge(hf_pretrained)#

Convert the HuggingFace Qwen3.5 text model config to a GPTModelProvider.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for Qwen3.5 LM.

Combines:

Standard attention: QKV, output projection, QK layernorm
Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm
MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate
Embeddings, output layer, final layernorm

Naming Convention:

Megatron language model params are prefixed with "decoder."
HF language model params are prefixed with "model.layers.*"

Returns:

MegatronMappingRegistry with all parameter mappings.

class bridge.models.qwen.qwen35_bridge.Qwen35Bridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen3.5 Dense Language Model.

The weight mappings handle:

Language model hybrid layers (GDN + standard attention)
Dense MLP with gated SiLU activation (fused pre-MLP layernorm)
QK layernorm, zero-centered RMSNorm for GDN output norm

Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers

This class also serves as the base for Qwen35VLBridge (vision-language variant), which reuses the common provider settings and LM mapping logic via the static helper methods.

Example:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from megatron.bridge import AutoBridge

    # Save the HF model and tokenizer into the same local directory.
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B")
    model.save_pretrained("./Qwen3.5-27B-LM")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")
    tokenizer.save_pretrained("./Qwen3.5-27B-LM")

    # Load that directory through the bridge and build the Megatron provider.
    bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-27B-LM")
    provider = bridge.to_megatron_provider()

static _get_dense_lm_mappings(hf_prefix='model.', megatron_prefix='')#

Get language model parameter mappings for dense (non-MoE) Qwen3.5.

Parameters:

hf_prefix – Prefix for HF param names in safetensors. Use "model.layers." for LM and "model.language_model.layers." for VL models.
megatron_prefix – Prefix for Megatron param names. Use "" for LM (default) and "language_model." for VL models.

Returns:

List of mapping objects for the dense LM portion.

static _get_dense_mtp_mappings(megatron_prefix='')#

Get MTP (Multi-Token Prediction) parameter mappings for dense Qwen3.5.

Parameters:

megatron_prefix – Prefix for Megatron param names. Use "" for LM and "language_model." for VL models.

Returns:

List of mapping objects for the MTP portion.

provider_bridge(hf_pretrained)#

Convert the HuggingFace Qwen3.5 text model config to a GPTModelProvider.

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry for the Qwen3.5 dense LM.

Key differences from the MoE variant:

Dense MLP: gate_proj + up_proj fused into linear_fc1, down_proj as linear_fc2
Pre-MLP layernorm fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)
No MoE router, routed expert MLPs, or shared expert mappings
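The gate/up fusion can be pictured as stacking the two projection matrices along the output dimension. The sketch below uses plain nested lists and assumes a single tensor-parallel rank with gate rows placed before up rows; the function names are hypothetical, and the layout the bridge actually produces under tensor parallelism may differ.

```python
def fuse_gate_up(gate_proj, up_proj):
    """Stack gate rows on top of up rows to form one fused weight."""
    if len(gate_proj[0]) != len(up_proj[0]):
        raise ValueError("gate_proj and up_proj must share the input dimension")
    return gate_proj + up_proj


def split_gate_up(linear_fc1):
    """Undo the fusion: the first half is gate, the second half is up."""
    half = len(linear_fc1) // 2
    return linear_fc1[:half], linear_fc1[half:]


# Toy weights with output dim 2 and input dim 3.
gate = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
up = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]

fused = fuse_gate_up(gate, up)          # shape: 4 x 3
gate_back, up_back = split_gate_up(fused)
```

The round trip matters for conversion in both directions: importing from HF fuses the two matrices, and exporting back splits the fused weight into the original pair.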
