bridge.models.qwen.qwen35_bridge
Module Contents
Classes
- Qwen35MoEBridge – Megatron Bridge for Qwen3.5 Language Model (MoE variant).
- Qwen35Bridge – Megatron Bridge for Qwen3.5 Dense Language Model.
Functions
- _apply_qwen35_common_config – Apply Qwen3.5 common LM configuration to a Megatron provider.
- _apply_qwen35_moe_config – Apply Qwen3.5 MoE-specific configuration to a Megatron provider.
API
- bridge.models.qwen.qwen35_bridge._apply_qwen35_common_config(provider: megatron.bridge.models.gpt_provider.GPTModelProvider, text_config)
Apply Qwen3.5 common LM configuration to a Megatron provider.
Covers settings shared by both dense and MoE variants: normalization, GDN hybrid architecture, and MTP.
- Parameters:
provider – GPTModelProvider (or subclass) to configure.
text_config – HuggingFace config object for LM-only models, or its nested text_config for VLMs, so that language-model fields are read from the correct level.
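Because the helper expects the config level that holds the language-model fields, a caller has to pick that level first. A minimal sketch, assuming a VLM config exposes its LM fields on a nested `text_config` attribute while an LM-only config keeps them at the top level; `select_text_config` is a hypothetical helper, not part of this module, and the hidden sizes are toy values.

```python
from types import SimpleNamespace


def select_text_config(hf_config):
    """Return the object that holds language-model fields.

    Assumption: VLM configs nest LM fields under `text_config`;
    LM-only configs keep them at the top level.
    """
    return getattr(hf_config, "text_config", hf_config)


# Hypothetical stand-ins with toy values (not real Qwen3.5 sizes).
lm_config = SimpleNamespace(hidden_size=64)
vlm_config = SimpleNamespace(
    vision_config=SimpleNamespace(hidden_size=32),
    text_config=SimpleNamespace(hidden_size=64),
)

print(select_text_config(lm_config).hidden_size)   # 64
print(select_text_config(vlm_config).hidden_size)  # 64, read from text_config
```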
- bridge.models.qwen.qwen35_bridge._apply_qwen35_moe_config(provider: megatron.bridge.models.gpt_provider.GPTModelProvider, text_config)
Apply Qwen3.5 MoE-specific configuration to a Megatron provider.
Calls _apply_qwen35_common_config first, then adds MoE parameters.
- Parameters:
provider – GPTModelProvider (or subclass) to configure.
text_config – HuggingFace config object for LM-only models, or its nested text_config for VLMs, so that language-model fields are read from the correct level.
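The call order described above (common settings first, then MoE parameters) can be sketched with a toy provider. This is an illustration under stated assumptions, not the module's code: `ToyProvider`, `apply_common`, and `apply_moe` are hypothetical; `num_moe_experts` and `moe_router_topk` are Megatron-Core config field names, and the HF-side names `num_experts` and `num_experts_per_tok` are assumed.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ToyProvider:
    """Hypothetical stand-in for GPTModelProvider with a few fields."""
    normalization: Optional[str] = None
    num_moe_experts: Optional[int] = None
    moe_router_topk: Optional[int] = None


def apply_common(provider, text_config):
    # Settings shared by dense and MoE variants (illustrative only).
    provider.normalization = "RMSNorm"


def apply_moe(provider, text_config):
    # Mirror the documented order: shared settings first, then MoE fields.
    apply_common(provider, text_config)
    provider.num_moe_experts = text_config.num_experts          # assumed HF field
    provider.moe_router_topk = text_config.num_experts_per_tok  # assumed HF field


cfg = SimpleNamespace(num_experts=8, num_experts_per_tok=2)  # toy values
p = ToyProvider()
apply_moe(p, cfg)
print(p)
```

Layering the MoE helper on top of the common one keeps the dense and MoE paths from drifting apart: any change to the shared settings reaches both variants.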
- class bridge.models.qwen.qwen35_bridge.Qwen35MoEBridge
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for Qwen3.5 Language Model (MoE variant).
This bridge handles conversion between the HuggingFace Qwen3.5 language model format and the Megatron-Core Qwen3.5 model format, including weight mappings and configuration translation for the hybrid GDN+Attention LM architecture.
The weight mappings handle:
  - Language model hybrid layers (GDN + standard attention)
  - MoE layers with routed and shared experts
  - QK layernorm, and zero-centered RMSNorm for the GDN output norm
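"Zero-centered" RMSNorm usually means the learnable scale is stored as an offset from 1, so the effective scale is (1 + gamma). The sketch below shows that this matches a standard RMSNorm whose weight is 1 + gamma, which is the kind of adjustment a converter needs when one side stores the offset and the other stores the full scale. Which side uses which form is not stated here; the functions are hypothetical and written in plain Python for clarity.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Standard RMSNorm: y_i = x_i / rms(x) * w_i."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def rms_norm_zero_centered(x, gamma, eps=1e-6):
    """Zero-centered variant: stored gamma is an offset, scale is (1 + gamma_i)."""
    return rms_norm(x, [1.0 + g for g in gamma], eps)


x = [1.0, -2.0, 3.0, 0.5]
gamma_zc = [0.0, 0.1, -0.2, 0.05]          # zero-centered parameters
weight_std = [1.0 + g for g in gamma_zc]   # equivalent standard weights

a = rms_norm_zero_centered(x, gamma_zc)
b = rms_norm(x, weight_std)
print(all(math.isclose(u, v) for u, v in zip(a, b)))  # True
```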
Architecture: 15 × (3 × (GDN → MoE) + 1 × (Attention → MoE)) = 60 layers
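The architecture formula can be expanded and checked mechanically. A minimal sketch that takes the formula at face value (each block is three GDN layers followed by one attention layer, and every layer uses an MoE MLP); `hybrid_moe_pattern` is a hypothetical helper, not part of the bridge.

```python
def hybrid_moe_pattern(num_blocks=15, gdn_per_block=3, attn_per_block=1):
    """Expand the block formula into a flat list of (mixer, mlp) layers."""
    layers = []
    for _ in range(num_blocks):
        layers += [("GDN", "MoE")] * gdn_per_block
        layers += [("Attention", "MoE")] * attn_per_block
    return layers


layers = hybrid_moe_pattern()
print(len(layers))                                # 60
print(sum(m == "Attention" for m, _ in layers))   # 15
# 0-based indices of the first few full-attention layers.
print([i for i, (m, _) in enumerate(layers) if m == "Attention"][:3])  # [3, 7, 11]
```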
The VL variant (Qwen35VLMoEBridge) reuses the provider settings and LM mapping logic via the module-level helpers and static mapping methods.
Example
from transformers import AutoModelForCausalLM, AutoTokenizer
from megatron.bridge import AutoBridge

# Save the HF language model and tokenizer to the same local directory
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-397B-A17B")
model.save_pretrained("./Qwen3.5-397B-A17B-LM")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-397B-A17B")
tokenizer.save_pretrained("./Qwen3.5-397B-A17B-LM")

# Build a Megatron provider from the saved checkpoint
bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-397B-A17B-LM")
provider = bridge.to_megatron_provider()
- static _get_moe_lm_mappings(hf_prefix='model.', megatron_prefix='')
Get language model parameter mappings for MoE Qwen3.5.
- Parameters:
hf_prefix – Prefix for HF param names in safetensors. Use “model.layers.” for LM and “model.language_model.layers.” for VL models.
megatron_prefix – Prefix for Megatron param names. Use “” for LM (default) and “language_model.” for VL models.
- Returns:
List of mapping objects for the MoE LM portion.
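The two prefixes are prepended to per-parameter name templates, which is how one set of mappings serves both the LM and VL layouts. A minimal sketch using the prefix values listed in the parameter descriptions above; `prefixed` is hypothetical, and the attention output-projection suffixes are assumed names chosen for illustration, not taken from the bridge's mapping list.

```python
def prefixed(hf_prefix, megatron_prefix, hf_suffix, megatron_suffix):
    """Prepend the two prefixes to a (hypothetical) pair of name templates.

    '*' stands for the layer index, as in "model.layers.*".
    """
    return hf_prefix + hf_suffix, megatron_prefix + megatron_suffix


# Assumed suffixes for an attention output projection.
HF_SUFFIX = "*.self_attn.o_proj.weight"
MEGATRON_SUFFIX = "decoder.layers.*.self_attention.linear_proj.weight"

lm = prefixed("model.layers.", "", HF_SUFFIX, MEGATRON_SUFFIX)
vl = prefixed("model.language_model.layers.", "language_model.", HF_SUFFIX, MEGATRON_SUFFIX)

print(lm)
print(vl)
```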
- static _get_moe_mtp_mappings(megatron_prefix: str = '', mtp_experts_packed: bool = False)
Get MTP parameter mappings for MoE Qwen3.5.
- Parameters:
megatron_prefix – Prefix for Megatron param names. Use “” for LM and “language_model.” for VL models.
mtp_experts_packed – Whether the MTP experts are packed. Qwen3.5 stores per-expert (mtp.layers.0.mlp.experts.{i}.gate_proj.weight), whereas Qwen3.6 stores packed (mtp.layers.0.mlp.experts.gate_up_proj).
- Returns:
List of mapping objects for the MoE MTP portion.
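The mtp_experts_packed flag distinguishes the two HF storage layouts described above. A sketch that lists the HF names for the MTP expert gate/up weights in each layout; `mtp_expert_weight_names` is hypothetical, and the per-expert `up_proj.weight` name is an assumption by analogy with the documented `gate_proj.weight`.

```python
def mtp_expert_weight_names(num_experts, packed, layer=0):
    """List HF names for MTP expert gate/up weights (illustrative only)."""
    base = f"mtp.layers.{layer}.mlp.experts"
    if packed:
        # Qwen3.6-style: one tensor holds gate and up projections for all experts.
        return [f"{base}.gate_up_proj"]
    # Qwen3.5-style: separate tensors per expert (up_proj name assumed).
    names = []
    for i in range(num_experts):
        names.append(f"{base}.{i}.gate_proj.weight")
        names.append(f"{base}.{i}.up_proj.weight")
    return names


print(mtp_expert_weight_names(2, packed=False))
print(mtp_expert_weight_names(2, packed=True))
```

With per-expert storage each expert contributes its own tensors, so the mapping must iterate over experts; with packed storage a single tensor must be split by the converter instead.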
- provider_bridge(hf_pretrained)
Convert a HuggingFace Qwen3.5 text model config to a GPTModelProvider.
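Config conversion is largely a field-renaming step from HF names to Megatron provider names. A minimal sketch, assuming a simple lookup table; `FIELD_MAP` and `translate` are hypothetical, and the pairings use common HF and Megatron-Core field names rather than anything copied from this bridge.

```python
from types import SimpleNamespace

# Hypothetical HF-field -> provider-field table (common names, not the bridge's source).
FIELD_MAP = {
    "num_hidden_layers": "num_layers",
    "hidden_size": "hidden_size",
    "intermediate_size": "ffn_hidden_size",
    "num_attention_heads": "num_attention_heads",
    "num_key_value_heads": "num_query_groups",
    "rms_norm_eps": "layernorm_epsilon",
    "vocab_size": "vocab_size",
}


def translate(text_config):
    """Copy the fields present on text_config into provider kwargs."""
    return {
        dst: getattr(text_config, src)
        for src, dst in FIELD_MAP.items()
        if hasattr(text_config, src)
    }


toy = SimpleNamespace(num_hidden_layers=8, hidden_size=256, num_key_value_heads=2)
print(translate(toy))  # {'num_layers': 8, 'hidden_size': 256, 'num_query_groups': 2}
```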
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry
Return a MegatronMappingRegistry containing parameter mappings for the Qwen3.5 LM.
Combines:
  - Standard attention: QKV, output projection, QK layernorm
  - Linear attention (GDN): in_proj, out_proj, conv1d, A_log, dt_bias, out_norm
  - MoE: router, routed expert MLPs, shared expert MLPs, shared expert gate
  - Embeddings, output layer, final layernorm
Naming Convention:
  - Megatron language model params are prefixed with “decoder.”
  - HF language model params are prefixed with “model.layers.*”
- Returns:
MegatronMappingRegistry with all parameter mappings
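Under the naming convention above, a mapping pairs an HF name pattern with a Megatron name pattern, where "*" stands for a layer index. A sketch of resolving a concrete parameter name through such a pair, assuming "*" matches one integer; `wildcard_to_regex` and `translate_name` are hypothetical, and the attention output-projection names are assumed for illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Compile 'a.layers.*.b' into a regex whose '*' captures one integer."""
    return re.compile("^" + re.escape(pattern).replace(r"\*", r"(\d+)") + "$")


def translate_name(name, src_pattern, dst_pattern):
    """Map a concrete parameter name from one naming scheme to the other."""
    m = wildcard_to_regex(src_pattern).match(name)
    if m is None:
        return None  # name is not covered by this mapping
    out = dst_pattern
    for idx in m.groups():
        out = out.replace("*", idx, 1)
    return out


hf = "model.layers.*.self_attn.o_proj.weight"              # assumed HF name
mg = "decoder.layers.*.self_attention.linear_proj.weight"  # assumed Megatron name
print(translate_name("model.layers.7.self_attn.o_proj.weight", hf, mg))
# decoder.layers.7.self_attention.linear_proj.weight
```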
- class bridge.models.qwen.qwen35_bridge.Qwen35Bridge
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for Qwen3.5 Dense Language Model.
This bridge handles conversion between the HuggingFace Qwen3.5 language model format and the Megatron-Core Qwen3.5 model format, including weight mappings and configuration translation for the hybrid GDN+Attention LM architecture.
The weight mappings handle:
  - Language model hybrid layers (GDN + standard attention)
  - Dense MLP with gated SiLU activation (fused pre-MLP layernorm)
  - QK layernorm, and zero-centered RMSNorm for the GDN output norm
Architecture (27B): 16 × (3 × GDN + 1 × Attention) = 64 layers
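The 27B formula amounts to "every fourth layer is full attention". A sketch that classifies each layer index with modular arithmetic and checks the counts; `layer_type` is hypothetical, and the parameter name `full_attention_interval` is an assumption used only for readability.

```python
def layer_type(i, full_attention_interval=4):
    """Classify 0-based layer i: every interval-th layer is full attention."""
    return "attention" if (i + 1) % full_attention_interval == 0 else "gdn"


NUM_LAYERS = 64  # 16 blocks x 4 layers, per the formula above
types = [layer_type(i) for i in range(NUM_LAYERS)]

print(types[:4])                                     # ['gdn', 'gdn', 'gdn', 'attention']
print(types.count("gdn"), types.count("attention"))  # 48 16
```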
This class also serves as the base for Qwen35VLBridge (vision-language variant), which reuses the common provider settings and LM mapping logic via the static helper methods.
Example
from transformers import AutoModelForCausalLM, AutoTokenizer
from megatron.bridge import AutoBridge

# Save the HF language model and tokenizer to the same local directory
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B")
model.save_pretrained("./Qwen3.5-27B-LM")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")
tokenizer.save_pretrained("./Qwen3.5-27B-LM")

# Build a Megatron provider from the saved checkpoint
bridge = AutoBridge.from_hf_pretrained("./Qwen3.5-27B-LM")
provider = bridge.to_megatron_provider()
- static _get_dense_lm_mappings(hf_prefix='model.', megatron_prefix='')
Get language model parameter mappings for dense (non-MoE) Qwen3.5.
- Parameters:
hf_prefix – Prefix for HF param names in safetensors. Use “model.layers.” for LM and “model.language_model.layers.” for VL models.
megatron_prefix – Prefix for Megatron param names. Use “” for LM (default) and “language_model.” for VL models.
- Returns:
List of mapping objects for the dense LM portion.
- static _get_dense_mtp_mappings(megatron_prefix='')
Get MTP (Multi-Token Prediction) parameter mappings for dense Qwen3.5.
- Parameters:
megatron_prefix – Prefix for Megatron param names. Use “” for LM and “language_model.” for VL models.
- Returns:
List of mapping objects for the MTP portion.
- provider_bridge(hf_pretrained)
Convert a HuggingFace Qwen3.5 text model config to a GPTModelProvider.
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry
Return a MegatronMappingRegistry containing parameter mappings for the dense Qwen3.5 LM.
Key differences from the MoE variant:
  - Dense MLP: gate_proj and up_proj are fused into linear_fc1; down_proj maps to linear_fc2
  - Pre-MLP layernorm is fused into mlp.linear_fc1 (not a separate pre_mlp_layernorm)
  - No MoE router, routed expert MLPs, or shared expert mappings
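The gate/up fusion can be pictured as stacking the two projection matrices along the output dimension, gate rows first, which matches the usual gated-SiLU convention where the first half of linear_fc1's output is activated and multiplied by the second half. A sketch for the plain, unsharded case only (tensor-parallel sharding changes the layout); `fuse_gate_up`, `split_fc1`, and the norm-name pairing in `NORM_MAPPING` are hypothetical, and the names in that pairing are assumptions.

```python
def fuse_gate_up(gate_proj, up_proj):
    """Stack gate rows on top of up rows to form linear_fc1.

    Weights are lists of rows (out_features x in_features), unsharded.
    """
    assert len(gate_proj[0]) == len(up_proj[0]), "in_features must match"
    return [row[:] for row in gate_proj] + [row[:] for row in up_proj]


def split_fc1(linear_fc1):
    """Inverse of fuse_gate_up: first half of rows is gate, second is up."""
    half = len(linear_fc1) // 2
    return linear_fc1[:half], linear_fc1[half:]


# Toy shapes: ffn_hidden_size=2, hidden_size=3.
gate = [[1, 2, 3], [4, 5, 6]]
up = [[7, 8, 9], [10, 11, 12]]

fc1 = fuse_gate_up(gate, up)
print(len(fc1), len(fc1[0]))   # 4 3
g, u = split_fc1(fc1)
print(g == gate and u == up)   # True

# Assumed name pairing for the fused pre-MLP norm (hypothetical names).
NORM_MAPPING = {
    "model.layers.*.post_attention_layernorm.weight":
        "decoder.layers.*.mlp.linear_fc1.layer_norm_weight",
}
```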