`bridge.models.ernie.ernie_45_bridge`

Megatron Bridge for ERNIE 4.5 text-only MoE model.

Maps HuggingFace Ernie4_5_MoeForCausalLM weights and config to Megatron-Core GPTModel with single-pool MoE (64 experts, top-6 routing, shared experts, expert bias for aux-free load balancing).
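The sketch below illustrates top-6 routing with an expert bias. It assumes the common auxiliary-loss-free scheme: the router logits are turned into softmax probabilities, the per-expert bias is added only when choosing the top-k experts, and the mixing weights come from the unbiased probabilities, renormalized over the selected experts. The function name `route_tokens` and the renormalization step are assumptions made for illustration. The actual Megatron-Core router and the HF reference implementation may differ in details such as normalization.

```python
import torch


def route_tokens(router_logits: torch.Tensor,
                 expert_bias: torch.Tensor,
                 top_k: int = 6,
                 eps: float = 1e-12):
    """Hypothetical sketch: softmax top-k routing with a selection-only bias.

    router_logits: [num_tokens, num_experts]
    expert_bias:   [num_experts] (flat, like Megatron's router.expert_bias)
    Returns (weights, indices), each of shape [num_tokens, top_k].
    """
    probs = torch.softmax(router_logits, dim=-1)
    # The bias only changes WHICH experts are chosen (load balancing).
    _, indices = torch.topk(probs + expert_bias, k=top_k, dim=-1)
    # Mixing weights use the unbiased probabilities of the chosen experts.
    weights = probs.gather(-1, indices)
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(eps)
    return weights, indices
```

Keeping the bias out of the mixing weights means it can be adjusted during training to rebalance expert load without directly scaling any expert's output.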

Module Contents

Classes

`_PPSafeMixin`	Mixin that makes `megatron_to_hf` safe for PP export of MoE-only params.
`_PPSafeAutoMapping`	AutoMapping that skips export for missing parameters.
`_PPSafeReplicatedMapping`	ReplicatedMapping that skips export for missing parameters.
`_PPSafeGatedMLPMapping`	GatedMLPMapping that skips export for missing parameters.
`_SqueezeBiasMapping`	Mapping for the single-pool expert bias tensor.
`Ernie45Bridge`	Megatron Bridge for ERNIE 4.5 text-only MoE Causal LM.

Functions

_ernie45_decoder_block_spec

Create a decoder block spec that respects moe_layer_freq.

Data

_ERNIE45_MOE_HF_CLASS_NAME

API

bridge.models.ernie.ernie_45_bridge._ernie45_decoder_block_spec(config: megatron.bridge.models.gpt_provider.GPTModelProvider, vp_stage: int | None = None)

Create a decoder block spec that respects moe_layer_freq.

The default GPTModelProvider.transformer_layer_spec calls get_gpt_layer_with_transformer_engine_spec, which returns a single MoE layer spec that is applied uniformly to ALL layers, ignoring moe_layer_freq.

ERNIE 4.5 mixes dense and MoE layers (layer 0 is dense, the remaining layers are MoE). This function instead uses get_gpt_decoder_block_spec, which calls get_gpt_decoder_layer_specs: the code path that parses config.moe_layer_freq and builds one spec per layer (dense where the pattern is 0, MoE where it is 1).
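The following minimal sketch expands a moe_layer_freq value into the per-layer dense/MoE pattern. The helper names are hypothetical. The integer branch follows our understanding of Megatron-Core's convention, where layer i is MoE when i % moe_layer_freq == 0, and that convention is an assumption here. Under it, an integer always makes layer 0 MoE, so a dense first layer requires the list form, for example [0] + [1] * (num_layers - 1).

```python
from typing import List, Union


def expand_moe_layer_freq(moe_layer_freq: Union[int, List[int]],
                          num_layers: int) -> List[int]:
    """Hypothetical helper: per-layer pattern, 1 = MoE layer, 0 = dense layer."""
    if isinstance(moe_layer_freq, int):
        # Assumed convention: every moe_layer_freq-th layer, starting at 0, is MoE.
        return [1 if i % moe_layer_freq == 0 else 0 for i in range(num_layers)]
    if len(moe_layer_freq) != num_layers:
        raise ValueError(
            f"expected {num_layers} entries, got {len(moe_layer_freq)}")
    return list(moe_layer_freq)


def layer_kinds(moe_layer_freq: Union[int, List[int]], num_layers: int) -> List[str]:
    """Hypothetical helper: name the spec each layer would receive."""
    return ["moe" if p else "dense"
            for p in expand_moe_layer_freq(moe_layer_freq, num_layers)]
```

With the list form, `layer_kinds([0, 1, 1, 1], 4)` yields `['dense', 'moe', 'moe', 'moe']`. A single uniform MoE spec cannot express this pattern.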

bridge.models.ernie.ernie_45_bridge._ERNIE45_MOE_HF_CLASS_NAME: 'Ernie4_5_MoeForCausalLM'

class bridge.models.ernie.ernie_45_bridge._PPSafeMixin

Mixin that makes megatron_to_hf safe for PP export of MoE-only params.

When moe_layer_freq makes some layers dense and others MoE, MoE-only parameters (router weight, expert bias, shared/routed expert weights) do not exist on dense layers. With PP > 1, broadcast_from_pp_rank raises ValueError because no PP rank owns the tensor.

This mixin catches that error and returns {} so the conversion loop simply omits the parameter from the output.

Must be listed before the base mapping class in the class bases (and therefore in the MRO) so that super().megatron_to_hf resolves to the concrete mapping's method.

megatron_to_hf(megatron_weights, megatron_module)
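The catch-and-skip behavior can be shown with plain Python classes. `ToyMapping` is a hypothetical stand-in for a concrete mapping such as AutoMapping. It raises ValueError for layers where the parameter does not exist, which imitates broadcast_from_pp_rank when no PP rank owns the tensor. The mixin turns that error into an empty dict. The HF parameter name that the toy produces is illustrative only.

```python
class ToyMapping:
    """Hypothetical stand-in for a concrete param mapping (e.g. AutoMapping)."""

    def __init__(self, owned_layers):
        self.owned_layers = set(owned_layers)  # layers on which the param exists

    def megatron_to_hf(self, megatron_weights, megatron_module):
        layer = megatron_module  # toy: the "module" is just a layer index
        if layer not in self.owned_layers:
            # Imitates broadcast_from_pp_rank failing: no PP rank owns the tensor.
            raise ValueError(f"no PP rank owns the parameter for layer {layer}")
        return {f"model.layers.{layer}.mlp.gate.weight": megatron_weights}


class PPSafeMixin:
    """Sketch of the mixin idea: turn 'missing everywhere' into 'omit'."""

    def megatron_to_hf(self, megatron_weights, megatron_module):
        try:
            return super().megatron_to_hf(megatron_weights, megatron_module)
        except ValueError:
            return {}  # dense layer: the parameter is simply left out


class PPSafeToyMapping(PPSafeMixin, ToyMapping):
    """Mixin first, so its override wraps ToyMapping's via super()."""
```

Catching every ValueError keeps the sketch short. The trade-off is that it would also hide unrelated ValueErrors raised by the wrapped method.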

class bridge.models.ernie.ernie_45_bridge._PPSafeAutoMapping

Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.AutoMapping

AutoMapping that skips export for missing parameters.

class bridge.models.ernie.ernie_45_bridge._PPSafeReplicatedMapping

Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.ReplicatedMapping

ReplicatedMapping that skips export for missing parameters.

class bridge.models.ernie.ernie_45_bridge._PPSafeGatedMLPMapping

Bases: bridge.models.ernie.ernie_45_bridge._PPSafeMixin, megatron.bridge.models.conversion.param_mapping.GatedMLPMapping

GatedMLPMapping that skips export for missing parameters.
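All three classes list _PPSafeMixin before the concrete mapping in their bases. The toy classes below (all names are hypothetical) show why that order matters. With the mixin first, its override wraps the base method through super(). With the mixin last, Python's MRO finds the base's megatron_to_hf first, and the error escapes.

```python
class Base:
    """Hypothetical mapping whose export always fails (param on no PP rank)."""

    def megatron_to_hf(self, megatron_weights, megatron_module):
        raise ValueError("parameter missing on every PP rank")


class SafeMixin:
    def megatron_to_hf(self, megatron_weights, megatron_module):
        try:
            return super().megatron_to_hf(megatron_weights, megatron_module)
        except ValueError:
            return {}


class MixinFirst(SafeMixin, Base):  # MRO: MixinFirst, SafeMixin, Base, object
    pass


class MixinLast(Base, SafeMixin):   # MRO: MixinLast, Base, SafeMixin, object
    pass


def export_or_error(cls):
    """Run the export and report the outcome instead of raising."""
    try:
        return cls().megatron_to_hf(None, None)
    except ValueError as exc:
        return f"ValueError: {exc}"
```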

class bridge.models.ernie.ernie_45_bridge._SqueezeBiasMapping

Bases: bridge.models.ernie.ernie_45_bridge._PPSafeReplicatedMapping

Mapping for the single-pool expert bias tensor.

The HF text-only model stores moe_statics.e_score_correction_bias with shape [1, num_experts] (1 expert group for text-only). Megatron stores router.expert_bias as a flat [num_experts] tensor.

This mapping squeezes dim-0 on import and unsqueezes on export.

Inherits from _PPSafeReplicatedMapping to gracefully skip dense layers during PP export.

hf_to_megatron(hf_weights, megatron_module)

megatron_to_hf(megatron_weights, megatron_module)
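This minimal sketch covers the shape conversion that _SqueezeBiasMapping performs, written as two standalone functions with hypothetical names that operate on PyTorch tensors. The shapes are checked explicitly because tensor.squeeze(0) silently returns the tensor unchanged when dim 0 does not have size 1, which would hide an unexpected multi-group bias.

```python
import torch


def bias_hf_to_megatron(e_score_correction_bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: HF [1, num_experts] -> Megatron [num_experts]."""
    if e_score_correction_bias.dim() != 2 or e_score_correction_bias.size(0) != 1:
        raise ValueError(
            f"expected shape [1, E], got {tuple(e_score_correction_bias.shape)}")
    return e_score_correction_bias.squeeze(0)


def bias_megatron_to_hf(expert_bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: Megatron [num_experts] -> HF [1, num_experts]."""
    if expert_bias.dim() != 1:
        raise ValueError(f"expected shape [E], got {tuple(expert_bias.shape)}")
    return expert_bias.unsqueeze(0)
```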

class bridge.models.ernie.ernie_45_bridge.Ernie45Bridge

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for ERNIE 4.5 text-only MoE Causal LM.

This bridge handles the conversion between HuggingFace Ernie4_5_MoeForCausalLM and Megatron-Core GPTModel formats with single-pool MoE architecture.

Key architectural features:

- Single-pool MoE: 64 experts, top-6 routing, shared experts
- Softmax routing with expert bias for auxiliary-loss-free load balancing
- Interleaved RoPE (base = 500000)
- GQA with 20 query heads, 4 KV heads, kv_channels = 128
- RMSNorm, SiLU-gated MLP
- Router gate weight stored as [H, E] in HF (transposed to Megatron's [E, H])
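The head counts above imply the projection sizes computed below, and the snippet also shows the [H, E] to [E, H] transpose of the router gate. This is only a sketch. The hidden size used for the gate is a toy value rather than the real model's, and the code checks sizes only. It does not reproduce Megatron-Core's fused QKV row layout, which groups each KV head together with its query heads.

```python
import torch

# Attention geometry from the feature list above.
num_query_heads = 20
num_kv_heads = 4
kv_channels = 128  # per-head dimension

assert num_query_heads % num_kv_heads == 0
queries_per_kv_head = num_query_heads // num_kv_heads  # 5 query heads share each KV head
q_dim = num_query_heads * kv_channels                  # 2560
kv_dim = num_kv_heads * kv_channels                    # 512 for K, and 512 for V
qkv_out_dim = q_dim + 2 * kv_dim                       # 3584 rows in a fused QKV weight

# Router gate: HF stores [H, E]; Megatron's router weight is [E, H].
toy_hidden_size = 8   # toy value, not the real model's hidden size
num_experts = 64
hf_gate = torch.randn(toy_hidden_size, num_experts)
megatron_router_weight = hf_gate.t().contiguous()      # shape [64, 8]
```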

Example

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("baidu/ERNIE-4.5-21B-A3B-PT")
provider = bridge.to_megatron_provider()
```

static _get_num_experts(hf_config) → int

Extract num_experts as an int.

The config may store moe_num_experts as a plain int or as a list: [N] for a single pool, or [N, N] for a dual pool, in which case the first entry is used.
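The normalization described above can be written as a standalone function. The function name is hypothetical, since the real code is the static method _get_num_experts. The function reads the moe_num_experts attribute named above and returns the first entry when the value is a list.

```python
from types import SimpleNamespace
from typing import Sequence, Union


def get_num_experts(hf_config) -> int:
    """Hypothetical sketch: normalize moe_num_experts to a single int."""
    value: Union[int, Sequence[int]] = hf_config.moe_num_experts
    if isinstance(value, int):
        return value  # plain int
    if len(value) == 0:
        raise ValueError("moe_num_experts must not be an empty list")
    return int(value[0])  # [N] single pool, or [N, N] dual pool: take the first


print(get_num_experts(SimpleNamespace(moe_num_experts=[64, 64])))  # prints 64
```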

provider_bridge(hf_pretrained)

Convert HuggingFace ERNIE 4.5 MoE config to GPTModelProvider.

Uses super().provider_bridge() for standard CONFIG_MAPPING fields (hidden_size, num_layers, rope_theta, tie_word_embeddings, etc.) and then overrides ERNIE-specific settings.
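The pattern of applying the generic mapping first and the ERNIE-specific overrides second can be sketched with toy classes. Everything here is hypothetical. `ToyProvider` stands in for GPTModelProvider, and its fields are a small subset whose names echo Megatron-Core TransformerConfig options. The HF attribute `moe_k` for the top-k value is an assumption. The real method also receives hf_pretrained rather than a bare config, and it sets more fields.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ToyProvider:
    """Hypothetical stand-in for GPTModelProvider (a few fields only)."""
    hidden_size: int = 0
    num_layers: int = 0
    num_moe_experts: Optional[int] = None
    moe_router_topk: int = 2
    rotary_interleaved: bool = False
    moe_router_enable_expert_bias: bool = False


class ToyBaseBridge:
    # Generic CONFIG_MAPPING-style table: HF attribute -> provider field.
    CONFIG_MAPPING = {"hidden_size": "hidden_size", "num_hidden_layers": "num_layers"}

    def provider_bridge(self, hf_config) -> ToyProvider:
        provider = ToyProvider()
        for hf_name, provider_name in self.CONFIG_MAPPING.items():
            setattr(provider, provider_name, getattr(hf_config, hf_name))
        return provider


class ToyErnieBridge(ToyBaseBridge):
    def provider_bridge(self, hf_config) -> ToyProvider:
        provider = super().provider_bridge(hf_config)  # standard fields first
        # Model-specific overrides layered on top of the generic mapping.
        provider.num_moe_experts = hf_config.moe_num_experts
        provider.moe_router_topk = hf_config.moe_k     # assumed HF attribute name
        provider.rotary_interleaved = True             # interleaved RoPE
        provider.moe_router_enable_expert_bias = True  # aux-loss-free balancing
        return provider


hf_config = SimpleNamespace(hidden_size=64, num_hidden_layers=4,
                            moe_num_experts=64, moe_k=6)
provider = ToyErnieBridge().provider_bridge(hf_config)
```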

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry

Return a MegatronMappingRegistry with the parameter mappings for ERNIE 4.5 MoE.
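Conceptually, the registry pairs Megatron parameter-name patterns with HF names and uses a wildcard for the layer index. The toy resolver below translates names only. In the real registry, each entry is a mapping object (AutoMapping, GatedMLPMapping, and so on) that also transforms the tensors. The HF name `model.layers.*.mlp.gate.weight` is an assumption, while the expert-bias names follow the ones given for _SqueezeBiasMapping.

```python
import re
from typing import Optional

# Hypothetical Megatron -> HF name patterns; "*" stands for a layer index.
PARAM_PATTERNS = {
    "decoder.layers.*.mlp.router.weight": "model.layers.*.mlp.gate.weight",
    "decoder.layers.*.mlp.router.expert_bias":
        "model.layers.*.mlp.moe_statics.e_score_correction_bias",
}


def _to_regex(pattern: str) -> "re.Pattern[str]":
    # Escape literal characters, then let each "*" capture one integer index.
    return re.compile("^" + re.escape(pattern).replace(r"\*", r"(\d+)") + "$")


def megatron_to_hf_name(name: str) -> Optional[str]:
    """Translate a Megatron parameter name to its HF name, or None if unmapped."""
    for mcore_pattern, hf_pattern in PARAM_PATTERNS.items():
        match = _to_regex(mcore_pattern).match(name)
        if match is None:
            continue
        hf_name = hf_pattern
        for index in match.groups():
            hf_name = hf_name.replace("*", index, 1)  # fill wildcards left to right
        return hf_name
    return None
```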
