bridge.models.exaone.exaone4_bridge
Megatron Bridge for EXAONE 4.0 (LG AI Research).
EXAONE 4.0 architecture overview:
- Pure Post-LayerNorm (no Pre-LN / input_layernorm)
- QK RMSNorm (similar to Qwen3)
- GQA with 32 heads / 8 KV heads
- SwiGLU activation
- RoPE with llama3-style scaling
- Tied word embeddings (embed_tokens == lm_head)
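The grouped-query attention layout in the overview above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the usual contiguous grouping convention (consecutive query heads share one KV head); the constant and function names are hypothetical and are not part of the bridge.

```python
# Head counts taken from the overview above.
NUM_HEADS = 32
NUM_KV_HEADS = 8

# Each KV head is shared by this many query heads.
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS


def kv_head_for(query_head: int) -> int:
    """Return the KV head index that serves a given query head.

    Assumes contiguous grouping: query heads 0..3 use KV head 0,
    query heads 4..7 use KV head 1, and so on.
    """
    if not 0 <= query_head < NUM_HEADS:
        raise ValueError(f"query_head must be in [0, {NUM_HEADS})")
    return query_head // GROUP_SIZE


# Group the query heads by the KV head they share.
groups = {kv: [q for q in range(NUM_HEADS) if kv_head_for(q) == kv]
          for kv in range(NUM_KV_HEADS)}
```

With 32 query heads and 8 KV heads, each group holds four query heads, so the K and V projections are a quarter of the width of the Q projection.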
Key differences from standard Llama/Qwen:
- No input_layernorm or pre_feedforward_layernorm weights
- Has post_attention_layernorm (after self-attention output)
- Has post_feedforward_layernorm (after MLP output, EXAONE-specific)
- Post-LN mapping follows Gemma2 pattern: *.post_layernorm.weight
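To illustrate where the two norms sit, the sketch below contrasts a Post-LN layer with a conventional Pre-LN layer. It is written in plain Python on small vectors and assumes that each post-norm is applied to the sublayer output before the residual add; the attn and mlp callables are stand-ins, and all function names are hypothetical rather than taken from the bridge or from Megatron-Core.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a single vector: x / rms(x) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def post_ln_layer(x, attn, mlp, post_attn_w, post_ffn_w):
    """Post-LN (assumed EXAONE 4.0 style): normalize each sublayer output."""
    # post_attention_layernorm acts on the attention output, then residual add.
    h = [r + n for r, n in zip(x, rms_norm(attn(x), post_attn_w))]
    # post_feedforward_layernorm acts on the MLP output, then residual add.
    return [r + n for r, n in zip(h, rms_norm(mlp(h), post_ffn_w))]


def pre_ln_layer(x, attn, mlp, input_w, pre_ffn_w):
    """Pre-LN (Llama/Qwen style): normalize each sublayer input."""
    h = [r + a for r, a in zip(x, attn(rms_norm(x, input_w)))]
    return [r + m for r, m in zip(h, mlp(rms_norm(h, pre_ffn_w)))]


# Stand-in sublayers and unit norm weights, purely for illustration.
x = [3.0, 4.0]
ones = [1.0, 1.0]
out = post_ln_layer(x, lambda v: [2 * t for t in v], lambda v: v, ones, ones)
```

One consequence visible in the sketch: in the Post-LN form, the value added to the residual stream has roughly unit RMS regardless of how large the sublayer output is, whereas in the Pre-LN form the raw sublayer output is added unnormalized.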
References:
- HuggingFace: LGAI-EXAONE/EXAONE-4.0-1.2B
- Gemma2 bridge: Post-LN via TERowParallelLinearLayerNorm pattern
- Qwen3 bridge: QK layernorm mapping pattern
Module Contents
Classes
Exaone4Bridge | Megatron Bridge for EXAONE 4.0 Causal LM.
API
- class bridge.models.exaone.exaone4_bridge.Exaone4Bridge
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for EXAONE 4.0 Causal LM.
Supports bidirectional conversion between HuggingFace EXAONE 4.0 checkpoints and Megatron-Core GPTModel format.
Architecture notes:
- EXAONE 4.0 uses pure Post-LayerNorm (no input_layernorm).
- Post-LN is implemented via a custom layer spec with TERowParallelLinearLayerNorm, following the pattern established by the Gemma2 bridge.
- QK RMSNorm is mapped using the same convention as Qwen3.
- The 1.2B model uses full attention only (no sliding window / hybrid attention).
- The 32B model introduces hybrid attention (LLLG pattern: three local-attention layers followed by one global-attention layer); support for it is a future extension.
Example:
>>> from megatron.bridge import AutoBridge
>>> bridge = AutoBridge.from_hf_pretrained(
...     "LGAI-EXAONE/EXAONE-4.0-1.2B",
...     trust_remote_code=True,
... )
>>> provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM)
Convert HuggingFace EXAONE 4.0 config to Megatron GPTModelProvider.
Maps HF config fields to Megatron TransformerConfig parameters and sets EXAONE-specific options including Post-LN, QK norm, and RoPE scaling.
- Parameters:
hf_pretrained – HuggingFace PreTrainedCausalLM containing the EXAONE config
- Returns:
GPTModelProvider configured for EXAONE 4.0 architecture
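The kind of field translation described above can be sketched as a simple lookup table. This is an illustration only, not the bridge's actual implementation: the table contents, the helper name hf_config_to_megatron_kwargs, and the sample values are assumptions, and RoPE scaling is omitted for brevity. The Megatron-side names follow common TransformerConfig / GPT provider fields.

```python
# Hypothetical HF -> Megatron field-name table (an assumption for illustration).
HF_TO_MEGATRON = {
    "num_hidden_layers": "num_layers",
    "hidden_size": "hidden_size",
    "intermediate_size": "ffn_hidden_size",
    "num_attention_heads": "num_attention_heads",
    "num_key_value_heads": "num_query_groups",
    "rms_norm_eps": "layernorm_epsilon",
    "rope_theta": "rotary_base",
    "vocab_size": "vocab_size",
    "tie_word_embeddings": "share_embeddings_and_output_weights",
}


def hf_config_to_megatron_kwargs(hf_config: dict) -> dict:
    """Translate HF config keys to Megatron-style keyword arguments."""
    kwargs = {
        megatron_key: hf_config[hf_key]
        for hf_key, megatron_key in HF_TO_MEGATRON.items()
        if hf_key in hf_config
    }
    # Architecture-level options that do not come from a single HF field.
    kwargs["normalization"] = "RMSNorm"
    kwargs["qk_layernorm"] = True        # QK RMSNorm
    kwargs["gated_linear_unit"] = True   # SwiGLU is a gated MLP
    return kwargs


# Made-up values for illustration; not read from a real checkpoint.
example_hf_config = {
    "num_hidden_layers": 4,
    "hidden_size": 256,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": True,
}
example_kwargs = hf_config_to_megatron_kwargs(example_hf_config)
```

Keeping the correspondence in one table makes the reverse direction (Megatron to HuggingFace) a matter of inverting the same dictionary, which helps keep the two conversions consistent.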
- classmethod megatron_to_hf_config(provider: megatron.bridge.models.gpt_provider.GPTModelProvider)
Convert Megatron GPTModelProvider config to HuggingFace config dict.
- Parameters:
provider – GPTModelProvider with EXAONE configuration
- Returns:
Dictionary of HuggingFace Exaone4Config parameters
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry
Return MegatronMappingRegistry containing parameter mappings.
EXAONE 4.0 weight mapping combines patterns from:
- Llama: Basic GPT structure (embed, QKV, GatedMLP, final_layernorm)
- Qwen3: QK layernorm (q_norm → q_layernorm, k_norm → k_layernorm)
- Gemma2: Post-LN (post_*_layernorm → *.post_layernorm.weight)
Key difference: No input_layernorm or pre_feedforward_layernorm mappings, because EXAONE uses pure Post-LN (not Pre-LN or sandwich norm).
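As a rough illustration of how such wildcard name mappings resolve, the sketch below translates a Megatron parameter name to its HuggingFace counterpart with a small pattern table. The exact parameter strings are assumptions modeled on the Qwen3 and Gemma2 conventions named above, and NAME_MAP and hf_name_for are hypothetical helpers; this does not use or reproduce the MegatronMappingRegistry API.

```python
import re

# Illustrative Megatron -> HF name patterns; "*" stands for the layer index.
# These strings are assumptions, not copied from the bridge source.
NAME_MAP = {
    "embedding.word_embeddings.weight": "model.embed_tokens.weight",
    "decoder.final_layernorm.weight": "model.norm.weight",
    "decoder.layers.*.self_attention.q_layernorm.weight":
        "model.layers.*.self_attn.q_norm.weight",
    "decoder.layers.*.self_attention.k_layernorm.weight":
        "model.layers.*.self_attn.k_norm.weight",
    "decoder.layers.*.self_attention.linear_proj.post_layernorm.weight":
        "model.layers.*.post_attention_layernorm.weight",
    "decoder.layers.*.mlp.linear_fc2.post_layernorm.weight":
        "model.layers.*.post_feedforward_layernorm.weight",
}


def hf_name_for(megatron_name: str):
    """Return the HF parameter name for a Megatron name, or None if unmapped."""
    for pattern, hf_pattern in NAME_MAP.items():
        # Turn the "*" wildcard into a capture group for the layer index.
        regex = "^" + re.escape(pattern).replace(r"\*", r"(\d+)") + "$"
        match = re.match(regex, megatron_name)
        if match:
            if match.groups():
                return hf_pattern.replace("*", match.group(1), 1)
            return hf_pattern
    return None
```

Note that the table has no entry for a fused input layernorm weight (for example a name such as decoder.layers.0.self_attention.linear_qkv.layer_norm_weight), so hf_name_for returns None for it, mirroring the absence of Pre-LN mappings described above.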