bridge.models.ernie.ernie_45_bridge#
Megatron Bridge for ERNIE 4.5 text-only MoE model.
Maps HuggingFace Ernie4_5_MoeForCausalLM weights and config to Megatron-Core GPTModel with single-pool MoE (64 experts, top-6 routing, shared experts, expert bias for aux-free load balancing).
Module Contents#
Classes#
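_PPSafeMixin | Mixin that makes megatron_to_hf safe for PP export of MoE-only params. |
_PPSafeAutoMapping | AutoMapping that skips export for missing parameters. |
_PPSafeReplicatedMapping | ReplicatedMapping that skips export for missing parameters. |
_PPSafeGatedMLPMapping | GatedMLPMapping that skips export for missing parameters. |
_SqueezeBiasMapping | Mapping for the single-pool expert bias tensor. |
Ernie45Bridge | Megatron Bridge for ERNIE 4.5 text-only MoE Causal LM. |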
Functions#
_ernie45_decoder_block_spec | Create a decoder block spec that respects moe_layer_freq. |
Data#
_ERNIE45_MOE_HF_CLASS_NAME |
API#
- bridge.models.ernie.ernie_45_bridge._ernie45_decoder_block_spec(
- config: megatron.bridge.models.gpt_provider.GPTModelProvider,
- vp_stage: int | None = None,
- )
Create a decoder block spec that respects moe_layer_freq.
The default GPTModelProvider.transformer_layer_spec calls get_gpt_layer_with_transformer_engine_spec, which returns a single MoE layer spec applied uniformly to ALL layers, ignoring moe_layer_freq.
ERNIE 4.5 has mixed dense/MoE layers (layer 0 is dense, layers 1-N are MoE). This function uses get_gpt_decoder_block_spec, which calls get_gpt_decoder_layer_specs, the code path that parses config.moe_layer_freq and creates per-layer specs (dense for pattern=0, MoE for pattern=1).
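The per-layer pattern described above can be illustrated with a small sketch. It assumes moe_layer_freq is given as an explicit list with one 0/1 entry per layer; the helper name expand_layer_pattern is hypothetical and is not part of Megatron Bridge or Megatron-Core.

```python
def expand_layer_pattern(moe_layer_freq, num_layers):
    """Label each layer 'dense' or 'moe' from a 0/1 pattern list.

    Hypothetical helper for illustration only. Assumes the pattern is an
    explicit list with exactly one entry per layer.
    """
    if len(moe_layer_freq) != num_layers:
        raise ValueError("pattern length must equal num_layers")
    # pattern=0 selects a dense layer, pattern=1 selects an MoE layer
    return ["moe" if flag == 1 else "dense" for flag in moe_layer_freq]


# ERNIE 4.5 style layout on a toy 4-layer model: layer 0 dense, the rest MoE
pattern = [0] + [1] * 3
layer_kinds = expand_layer_pattern(pattern, num_layers=4)
print(layer_kinds)
```

A single uniform layer spec would correspond to every entry being "moe", which is the behavior this function is meant to avoid.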
- bridge.models.ernie.ernie_45_bridge._ERNIE45_MOE_HF_CLASS_NAME#
'Ernie4_5_MoeForCausalLM'
- class bridge.models.ernie.ernie_45_bridge._PPSafeMixin#
Mixin that makes megatron_to_hf safe for PP export of MoE-only params.
When moe_layer_freq makes some layers dense and others MoE, MoE-only parameters (router weight, expert bias, shared/routed expert weights) do not exist on dense layers. With PP > 1, broadcast_from_pp_rank raises ValueError because no PP rank owns the tensor.
This mixin catches that error and returns {} so the conversion loop simply omits the parameter from the output.
Must be listed before the base mapping class in the MRO so that super().megatron_to_hf resolves to the concrete mapping's method.
- megatron_to_hf(megatron_weights, megatron_module)#
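The mixin pattern and the MRO requirement can be sketched with stand-in classes. BaseMapping, SkipMissingMixin, and SafeMapping below are hypothetical names, not the real Megatron Bridge classes, and the ValueError is raised by hand to imitate the failure described above.

```python
class BaseMapping:
    """Stand-in for a concrete mapping class (hypothetical, for illustration)."""

    def __init__(self, owned):
        # owned=False imitates a dense layer where no PP rank holds the tensor
        self.owned = owned

    def megatron_to_hf(self, megatron_weights, megatron_module):
        if not self.owned:
            raise ValueError("no PP rank owns this tensor")
        return {"hf.param": megatron_weights}


class SkipMissingMixin:
    """Catches the ValueError and returns {} so the caller omits the parameter."""

    def megatron_to_hf(self, megatron_weights, megatron_module):
        try:
            # Resolves to BaseMapping.megatron_to_hf because the mixin
            # precedes the base class in the MRO.
            return super().megatron_to_hf(megatron_weights, megatron_module)
        except ValueError:
            return {}


class SafeMapping(SkipMissingMixin, BaseMapping):  # mixin listed first
    pass


result_moe = SafeMapping(owned=True).megatron_to_hf([1.0], None)
result_dense = SafeMapping(owned=False).megatron_to_hf([1.0], None)
print(result_moe, result_dense)
```

If the bases were listed in the opposite order, BaseMapping.megatron_to_hf would be found first and the mixin's try/except would never run.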
- class bridge.models.ernie.ernie_45_bridge._PPSafeAutoMapping#
Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.AutoMapping
AutoMapping that skips export for missing parameters.
- class bridge.models.ernie.ernie_45_bridge._PPSafeReplicatedMapping#
Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.ReplicatedMapping
ReplicatedMapping that skips export for missing parameters.
- class bridge.models.ernie.ernie_45_bridge._PPSafeGatedMLPMapping#
Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.GatedMLPMapping
GatedMLPMapping that skips export for missing parameters.
- class bridge.models.ernie.ernie_45_bridge._SqueezeBiasMapping#
Bases: bridge.models.ernie.ernie_45_bridge._PPSafeReplicatedMapping
Mapping for the single-pool expert bias tensor.
The HF text-only model stores moe_statics.e_score_correction_bias with shape [1, num_experts] (1 expert group for text-only). Megatron stores router.expert_bias as a flat [num_experts] tensor.
This mapping squeezes dim-0 on import and unsqueezes on export.
Inherits from _PPSafeReplicatedMapping to gracefully skip dense layers during PP export.
- hf_to_megatron(hf_weights, megatron_module)#
- megatron_to_hf(megatron_weights, megatron_module)#
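The shape change on import and export can be sketched with plain nested lists. This is only an illustration of the [1, num_experts] to [num_experts] round trip: the helper names are hypothetical, and the real mapping presumably operates on tensors (where squeeze(0) and unsqueeze(0) would play the same role).

```python
def squeeze_dim0(bias_2d):
    """[1, num_experts] -> [num_experts] (import direction). Hypothetical helper."""
    if len(bias_2d) != 1:
        raise ValueError("expected exactly one expert group in dim 0")
    return list(bias_2d[0])


def unsqueeze_dim0(bias_1d):
    """[num_experts] -> [1, num_experts] (export direction). Hypothetical helper."""
    return [list(bias_1d)]


# Toy bias with 4 experts instead of 64
hf_bias = [[0.1, -0.2, 0.0, 0.3]]        # HF layout: [1, num_experts]
megatron_bias = squeeze_dim0(hf_bias)    # Megatron layout: [num_experts]
round_trip = unsqueeze_dim0(megatron_bias)
print(megatron_bias, round_trip)
```

The length check on dim 0 reflects the single-pool assumption stated above; a multi-group bias would not be squeezable this way.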
- class bridge.models.ernie.ernie_45_bridge.Ernie45Bridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for ERNIE 4.5 text-only MoE Causal LM.
This bridge handles the conversion between HuggingFace Ernie4_5_MoeForCausalLM and Megatron-Core GPTModel formats with single-pool MoE architecture.
Key architectural features:
Single-pool MoE: 64 experts, top-6 routing, shared experts
Softmax routing with expert bias for aux-free load balancing
Interleaved RoPE (base=500000)
GQA with 20 query heads, 4 KV heads, kv_channels=128
RMSNorm, SiLU-gated MLP
Router gate weight stored as [H, E] in HF (transposed for Megatron [E, H])
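The last item, the router gate layout change, amounts to a plain matrix transpose. The sketch below uses nested lists and toy sizes (H=3, E=2) rather than real tensors; the transpose helper is hypothetical and only illustrates the [H, E] to [E, H] relationship.

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular nested list."""
    return [list(column) for column in zip(*matrix)]


# HF layout [H, E]: one row per hidden dimension, one column per expert
hf_gate = [
    [1, 2],
    [3, 4],
    [5, 6],
]

# Megatron layout [E, H]: one row per expert
megatron_gate = transpose(hf_gate)
print(megatron_gate)
```

Transposing again recovers the HF layout, so the same operation serves both import and export.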
Example:
    from megatron.bridge import AutoBridge
    bridge = AutoBridge.from_hf_pretrained("baidu/ERNIE-4.5-0.3B-PT")
    provider = bridge.to_megatron_provider()
- static _get_num_experts(hf_config) -> int#
Extract num_experts as an int.
The config may store moe_num_experts as a plain int or as a list: [N] (single pool) or [N, N] (dual pool, in which case the first entry is taken).
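A minimal sketch of this normalization, written as a free function that takes the raw value rather than the full hf_config. The name get_num_experts and the empty-list check are assumptions for illustration, not the bridge's actual code.

```python
def get_num_experts(moe_num_experts):
    """Normalize moe_num_experts to a single int.

    Accepts a plain int, [N] (single pool), or [N, N] (dual pool, first entry used).
    Hypothetical re-implementation for illustration only.
    """
    if isinstance(moe_num_experts, (list, tuple)):
        if not moe_num_experts:
            raise ValueError("moe_num_experts list is empty")
        return int(moe_num_experts[0])
    return int(moe_num_experts)


print(get_num_experts(64), get_num_experts([64]), get_num_experts([64, 64]))
```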
- provider_bridge(hf_pretrained)#
Convert HuggingFace ERNIE 4.5 MoE config to GPTModelProvider.
Uses super().provider_bridge() for standard CONFIG_MAPPING fields (hidden_size, num_layers, rope_theta, tie_word_embeddings, etc.) and then overrides ERNIE-specific settings.
- mapping_registry() -> megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry with parameter mappings for ERNIE 4.5 MoE.