bridge.models.exaone.exaone4_bridge#

Megatron Bridge for EXAONE 4.0 (LG AI Research).

EXAONE 4.0 architecture overview:

  • Pure Post-LayerNorm (no Pre-LN / input_layernorm)

  • QK RMSNorm (similar to Qwen3)

  • GQA with 32 heads / 8 KV heads

  • SwiGLU activation

  • RoPE with llama3-style scaling

  • Tied word embeddings (embed_tokens == lm_head)
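
As a rough illustration of the "RoPE with llama3-style scaling" item above, the following minimal sketch computes scaled inverse frequencies in plain Python. The function name and all default values (`base`, `factor`, `low_freq_factor`, `high_freq_factor`, `original_max_position`) are assumptions chosen for illustration; they are not read from the EXAONE 4.0 config or taken from the bridge code.

```python
import math


def llama3_scaled_inv_freq(
    head_dim: int,
    base: float = 10000.0,              # assumed RoPE base, illustrative only
    factor: float = 8.0,                # assumed scaling factor
    low_freq_factor: float = 1.0,       # assumed
    high_freq_factor: float = 4.0,      # assumed
    original_max_position: int = 8192,  # assumed pre-scaling context length
) -> list:
    """Return per-pair RoPE inverse frequencies after llama3-style scaling."""
    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor
    scaled = []
    for i in range(0, head_dim, 2):
        inv_freq = 1.0 / (base ** (i / head_dim))
        wavelen = 2.0 * math.pi / inv_freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: left unchanged.
            scaled.append(inv_freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: divided by the full scaling factor.
            scaled.append(inv_freq / factor)
        else:
            # Middle band: interpolate between the scaled and unscaled values.
            smooth = (original_max_position / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1.0 - smooth) * inv_freq / factor + smooth * inv_freq)
    return scaled


freqs = llama3_scaled_inv_freq(head_dim=64)
```

Only the long-wavelength (low-frequency) components are slowed down, so short-range positional resolution is preserved while the usable context is extended.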

Key differences from standard Llama/Qwen:

  • No input_layernorm or pre_feedforward_layernorm weights

  • Has post_attention_layernorm (after self-attention output)

  • Has post_feedforward_layernorm (after MLP output, EXAONE-specific)

  • Post-LN mapping follows Gemma2 pattern: *.post_layernorm.weight
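
The layer ordering implied by the list above can be sketched in plain Python. This is a toy model, not the bridge or Megatron code: `rms_norm` and `post_ln_block` are hypothetical helpers, and the sketch assumes each norm is applied to the sublayer output before the residual add, which the text above does not state explicitly.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D list of floats."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def post_ln_block(x, attn, mlp, attn_norm_w, mlp_norm_w):
    """One decoder layer with norms on the sublayer outputs only."""
    # No input_layernorm: attention sees the raw residual stream.
    h = rms_norm(attn(x), attn_norm_w)   # post_attention_layernorm
    x = [a + b for a, b in zip(x, h)]    # residual add (assumed ordering)
    # No pre_feedforward_layernorm: the MLP also sees the raw stream.
    h = rms_norm(mlp(x), mlp_norm_w)     # post_feedforward_layernorm
    return [a + b for a, b in zip(x, h)]


# Toy sublayers: identity attention, MLP that outputs zeros.
out = post_ln_block(
    [1.0, 0.0],
    attn=lambda v: list(v),
    mlp=lambda v: [0.0] * len(v),
    attn_norm_w=[1.0, 1.0],
    mlp_norm_w=[1.0, 1.0],
)
```

A Pre-LN layer would instead normalize `x` before calling `attn` and `mlp`, which is why a Llama-style mapping expects `input_layernorm` weights that EXAONE 4.0 checkpoints do not contain.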

References:

  • HuggingFace: LGAI-EXAONE/EXAONE-4.0-1.2B

  • Gemma2 bridge: Post-LN via TERowParallelLinearLayerNorm pattern

  • Qwen3 bridge: QK layernorm mapping pattern

Module Contents#

Classes#

Exaone4Bridge

Megatron Bridge for EXAONE 4.0 Causal LM.

API#

class bridge.models.exaone.exaone4_bridge.Exaone4Bridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for EXAONE 4.0 Causal LM.

Supports bidirectional conversion between HuggingFace EXAONE 4.0 checkpoints and Megatron-Core GPTModel format.

Architecture notes:

  • EXAONE 4.0 uses pure Post-LayerNorm (no input_layernorm).

  • Post-LN is implemented via a custom layer spec with TERowParallelLinearLayerNorm, following the pattern established by the Gemma2 bridge.

  • QK RMSNorm is mapped using the same convention as Qwen3.

  • The 1.2B model uses full attention only (no sliding window / hybrid attention).

  • The 32B model introduces hybrid attention (LLLG pattern); supporting it is left as a future extension.

Example:

    >>> from megatron.bridge import AutoBridge
    >>> bridge = AutoBridge.from_hf_pretrained(
    ...     "LGAI-EXAONE/EXAONE-4.0-1.2B",
    ...     trust_remote_code=True,
    ... )
    >>> provider = bridge.to_megatron_provider()

provider_bridge(
    hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) → megatron.bridge.models.gpt_provider.GPTModelProvider#

Convert HuggingFace EXAONE 4.0 config to Megatron GPTModelProvider.

Maps HF config fields to Megatron TransformerConfig parameters and sets EXAONE-specific options including Post-LN, QK norm, and RoPE scaling.

Parameters:

hf_pretrained – HuggingFace PreTrainedCausalLM containing the EXAONE config

Returns:

GPTModelProvider configured for EXAONE 4.0 architecture
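
To make the field mapping concrete, here is a minimal sketch of the kind of translation this method performs. It is not the actual implementation: `ToyProvider` and `toy_provider_from_hf` are hypothetical stand-ins, the field pairs shown are assumptions based on common HuggingFace and Megatron naming, and the config values are illustrative apart from the 32 / 8 head counts quoted in the overview.

```python
from dataclasses import dataclass


@dataclass
class ToyProvider:
    """Hypothetical stand-in for a few GPTModelProvider fields."""
    num_layers: int
    hidden_size: int
    ffn_hidden_size: int
    num_attention_heads: int
    num_query_groups: int
    layernorm_epsilon: float
    share_embeddings_and_output_weights: bool
    qk_layernorm: bool = True        # QK RMSNorm is always on for EXAONE 4.0
    gated_linear_unit: bool = True   # SwiGLU uses a gated MLP


def toy_provider_from_hf(hf_config: dict) -> ToyProvider:
    """Translate HF-style config keys into Megatron-style field names."""
    return ToyProvider(
        num_layers=hf_config["num_hidden_layers"],
        hidden_size=hf_config["hidden_size"],
        ffn_hidden_size=hf_config["intermediate_size"],
        num_attention_heads=hf_config["num_attention_heads"],
        num_query_groups=hf_config["num_key_value_heads"],
        layernorm_epsilon=hf_config["rms_norm_eps"],
        share_embeddings_and_output_weights=hf_config["tie_word_embeddings"],
    )


# Illustrative values only; not copied from the published checkpoint config.
provider = toy_provider_from_hf({
    "num_hidden_layers": 4,
    "hidden_size": 2048,
    "intermediate_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "rms_norm_eps": 1e-5,
    "tie_word_embeddings": True,
})
queries_per_kv_head = provider.num_attention_heads // provider.num_query_groups
```

With 32 query heads and 8 KV heads, each KV head is shared by 4 query heads.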

classmethod megatron_to_hf_config(
    provider: megatron.bridge.models.gpt_provider.GPTModelProvider,
) → dict#

Convert Megatron GPTModelProvider config to HuggingFace config dict.

Parameters:

provider – GPTModelProvider with EXAONE configuration

Returns:

Dictionary of HuggingFace Exaone4Config parameters

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings.

EXAONE 4.0 weight mapping combines patterns from:

  • Llama: Basic GPT structure (embed, QKV, GatedMLP, final_layernorm)

  • Qwen3: QK layernorm (q_norm → q_layernorm, k_norm → k_layernorm)

  • Gemma2: Post-LN (post_*_layernorm → *.post_layernorm.weight)

Key difference: No input_layernorm or pre_feedforward_layernorm mappings because EXAONE uses pure Post-LN (not Pre-LN or sandwich norm).
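
The combined mapping can be pictured as a wildcard table from Megatron parameter names to HuggingFace names. The sketch below does not use the real MegatronMappingRegistry API; `TOY_MAPPINGS` and `hf_name_for` are hypothetical, and the exact parameter names are assumptions inferred from the Llama, Qwen3, and Gemma2 patterns named above.

```python
import re

# Megatron name pattern -> HF name pattern; "*" stands for the layer index.
TOY_MAPPINGS = {
    # Llama-style basics
    "embedding.word_embeddings.weight": "model.embed_tokens.weight",
    "decoder.final_layernorm.weight": "model.norm.weight",
    # Qwen3-style QK layernorm
    "decoder.layers.*.self_attention.q_layernorm.weight":
        "model.layers.*.self_attn.q_norm.weight",
    "decoder.layers.*.self_attention.k_layernorm.weight":
        "model.layers.*.self_attn.k_norm.weight",
    # Gemma2-style Post-LN
    "decoder.layers.*.self_attention.linear_proj.post_layernorm.weight":
        "model.layers.*.post_attention_layernorm.weight",
    "decoder.layers.*.mlp.linear_fc2.post_layernorm.weight":
        "model.layers.*.post_feedforward_layernorm.weight",
}


def hf_name_for(megatron_name: str):
    """Resolve a concrete Megatron name to its HF name, or None if unmapped."""
    for pattern, hf_pattern in TOY_MAPPINGS.items():
        # Turn the "*" wildcard into a capture group for the layer index.
        regex = "^" + re.escape(pattern).replace(r"\*", r"(\d+)") + "$"
        match = re.match(regex, megatron_name)
        if match:
            result = hf_pattern
            for index in match.groups():
                result = result.replace("*", index, 1)
            return result
    return None


mapped = hf_name_for("decoder.layers.3.mlp.linear_fc2.post_layernorm.weight")
# A Pre-LN weight has no entry, mirroring the "no input_layernorm" note above.
unmapped = hf_name_for("decoder.layers.3.self_attention.linear_qkv.layer_norm_weight")
```

In this toy table, the absence of any pre-norm entry is what encodes the pure Post-LN design: a lookup for such a weight simply finds nothing.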