bridge.models.exaone.exaone4_provider

Model provider and custom layer specifications for EXAONE 4.0.

EXAONE 4.0 uses a pure Post-LayerNorm architecture:

    h = x + Attn(x)        # no pre-norm before attention
    h = PostAttnNorm(h)    # RMSNorm after residual add
    o = h + MLP(h)         # no pre-norm before MLP
    o = PostFFNNorm(o)     # RMSNorm after residual add
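The ordering above can be illustrated with a small pure-Python sketch. It follows the four steps exactly as written in this docstring and contrasts them with a Pre-LN block. `toy_attn`, `toy_mlp`, and the scale-free `rms_norm` are hypothetical stand-ins chosen only so the snippet runs on its own; they are not the EXAONE or Megatron implementations.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale: x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def toy_attn(x):
    # Hypothetical stand-in for self-attention: blend each value with the mean.
    mean = sum(x) / len(x)
    return [0.5 * v + 0.5 * mean for v in x]


def toy_mlp(x):
    # Hypothetical stand-in for the MLP: scaled ReLU.
    return [2.0 * max(v, 0.0) for v in x]


def add(a, b):
    return [u + v for u, v in zip(a, b)]


def post_ln_block(x):
    h = add(x, toy_attn(x))   # no pre-norm before attention
    h = rms_norm(h)           # RMSNorm after residual add
    o = add(h, toy_mlp(h))    # no pre-norm before MLP
    return rms_norm(o)        # RMSNorm after residual add


def pre_ln_block(x):
    # Pre-LN for comparison: each sublayer sees a normalized input,
    # and the residual stream itself is never normalized.
    h = add(x, toy_attn(rms_norm(x)))
    return add(h, toy_mlp(rms_norm(h)))


x = [1.0, -2.0, 3.0, 0.5]
out = post_ln_block(x)
pre_out = pre_ln_block(x)
```

In this toy setting the Post-LN output always has a mean square close to 1, because the last operation is a norm, while the Pre-LN output keeps the scale of the unnormalized residual stream.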

This requires a custom layer spec because the standard Megatron GPT spec assumes Pre-LN (fusing layernorm into the column-parallel linear via TELayerNormColumnParallelLinear). EXAONE instead needs:

  • Plain column-parallel linears for QKV and FC1 (no fused pre-norm)

  • Row-parallel linears with post-layernorm for output projection and FC2

The Post-LN implementation reuses the TERowParallelLinearLayerNorm pattern established by the Gemma2 bridge.
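As a rough illustration of that pattern, the sketch below assumes it amounts to subclassing a linear layer and applying a norm to its output. `ToyLinear` and `ToyLinearWithPostNorm` are hypothetical names invented for this example. Tensor parallelism, bias handling, and the placement of the residual add relative to the norm are not modeled, so this should not be read as the actual Transformer Engine or Megatron Bridge code.

```python
import math


class ToyLinear:
    """Hypothetical stand-in for a row-parallel linear: y = W x, no sharding."""

    def __init__(self, weight):
        self.weight = weight  # list of rows, one per output feature

    def forward(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class ToyLinearWithPostNorm(ToyLinear):
    """Linear followed by RMSNorm on its output (assumed shape of the pattern)."""

    def __init__(self, weight, eps=1e-6):
        super().__init__(weight)
        self.eps = eps
        # Learnable RMSNorm scale in a real layer; fixed at 1.0 in this sketch.
        self.scale = [1.0] * len(weight)

    def forward(self, x):
        y = super().forward(x)
        rms = math.sqrt(sum(v * v for v in y) / len(y) + self.eps)
        return [s * v / rms for s, v in zip(self.scale, y)]


layer = ToyLinearWithPostNorm([[1.0, 0.0], [0.0, 2.0]])
y = layer.forward([3.0, 4.0])
```

Bundling the norm into the linear wrapper means the layer spec only has to swap one submodule class to obtain a post-norm, rather than adding a separate norm module to the layer.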

Module Contents

Functions

exaone4_layer_spec

EXAONE 4.0 layer specification with pure Post-LayerNorm.

API

bridge.models.exaone.exaone4_provider.exaone4_layer_spec(
config: megatron.bridge.models.gpt_provider.GPTModelProvider,
) → megatron.core.transformer.ModuleSpec

EXAONE 4.0 layer specification with pure Post-LayerNorm.

Key differences from standard GPT layer spec:

  • linear_qkv: TEColumnParallelLinear (no fused pre-norm, since no input_layernorm)

  • linear_proj: TERowParallelLinearLayerNorm (post-attention norm)

  • linear_fc1: TEColumnParallelLinear (no fused pre-norm, since no pre_feedforward_layernorm)

  • linear_fc2: TERowParallelLinearLayerNorm (post-feedforward norm)

  • QK layernorm is handled by qk_layernorm=True in TransformerConfig
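The differences listed above can be summarized as a plain mapping from submodule name to class name. Class names are kept as strings so the snippet runs without Megatron installed. The EXAONE entries come from the list above; the `TERowParallelLinear` entries in the Pre-LN column are an assumption about the standard Transformer Engine GPT spec, and the dictionaries are illustrative rather than real ModuleSpec objects.

```python
# Standard Pre-LN GPT layer (assumed): the norm is fused into the
# column-parallel linears, and the row-parallel linears are plain.
STANDARD_PRE_LN = {
    "linear_qkv": "TELayerNormColumnParallelLinear",
    "linear_proj": "TERowParallelLinear",
    "linear_fc1": "TELayerNormColumnParallelLinear",
    "linear_fc2": "TERowParallelLinear",
}

# EXAONE 4.0 Post-LN layer, as described in the list above.
EXAONE4_POST_LN = {
    "linear_qkv": "TEColumnParallelLinear",
    "linear_proj": "TERowParallelLinearLayerNorm",
    "linear_fc1": "TEColumnParallelLinear",
    "linear_fc2": "TERowParallelLinearLayerNorm",
}

# Submodules whose class differs between the two layouts.
changed = sorted(
    name
    for name in STANDARD_PRE_LN
    if STANDARD_PRE_LN[name] != EXAONE4_POST_LN[name]
)

for name in changed:
    print(f"{name}: {STANDARD_PRE_LN[name]} -> {EXAONE4_POST_LN[name]}")
```

Under these assumptions all four linear submodules change: the norm moves from the input side of the column-parallel linears to the output side of the row-parallel linears.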

Parameters:

config – Reserved for future use (e.g., 32B hybrid attention with layer-wise branching between local and global attention).

Returns:

ModuleSpec for the EXAONE 4.0 transformer layer.