`bridge.models.exaone.exaone4_provider`

Model provider and custom layer specifications for EXAONE 4.0.

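EXAONE 4.0 uses a pure Post-LayerNorm architecture. There is no norm in front of attention or the MLP, and each sublayer's output passes through RMSNorm before it is added back to the residual stream:

    h = x + PostAttnNorm(Attn(x))   # no pre-norm before attention; RMSNorm on the attention output
    o = h + PostFFNNorm(MLP(h))     # no pre-norm before MLP; RMSNorm on the MLP output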
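Both PostAttnNorm and PostFFNNorm are RMSNorms over the hidden dimension. Assuming the standard formulation with a learned per-channel scale $g$ and a small stability constant $\epsilon$ (neither is spelled out on this page), each one maps a hidden vector $u \in \mathbb{R}^d$ to:

```math
\mathrm{RMSNorm}(u)_i = g_i \, \frac{u_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} u_j^2 + \epsilon}}, \qquad i = 1, \dots, d
```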

This requires a custom layer spec because the standard Megatron GPT spec assumes Pre-LN (fusing layernorm into the column-parallel linear via TELayerNormColumnParallelLinear). EXAONE instead needs:

- Plain column-parallel linears for QKV and FC1 (no fused pre-norm)
- Row-parallel linears with a fused post-layernorm for the output projection and FC2
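To make the column-parallel difference concrete, here is a minimal pure-Python sketch that simulates tensor parallelism in a single process by splitting the weight's output columns into `tp` shards. The function names are hypothetical, RMSNorm is assumed as the fused norm, and none of this is the Transformer Engine or Megatron API. It only shows that the fused Pre-LN variant normalizes the input before projecting, while the plain variant that EXAONE uses for QKV and FC1 projects the raw input.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """RMSNorm of one hidden vector with a learned per-channel scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, weight)]


def matvec(x, w):
    """Dense projection: x has shape [in], w has shape [in][out]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_output_columns(w, tp):
    """Shard a [in][out] weight into tp pieces along the output dimension."""
    n_out = len(w[0])
    assert n_out % tp == 0, "output size must divide evenly across ranks"
    step = n_out // tp
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(tp)]


def column_parallel_linear(x, w, tp):
    """Plain column-parallel linear (the EXAONE QKV/FC1 case): every simulated
    rank sees the full, un-normalized input and computes its slice of outputs."""
    out = []
    for shard in split_output_columns(w, tp):
        # Concatenating here only reproduces x @ w for comparison; in a real
        # tensor-parallel layer each slice typically stays on its own rank.
        out.extend(matvec(x, shard))
    return out


def norm_then_column_parallel_linear(x, norm_weight, w, tp):
    """Pre-LN variant: normalize the input, then project. This is the job the
    fused TELayerNormColumnParallelLinear does in the standard GPT spec."""
    return column_parallel_linear(rms_norm(x, norm_weight), w, tp)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    w = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
    print(column_parallel_linear(x, w, tp=2) == matvec(x, w))  # True
    print(norm_then_column_parallel_linear(x, [1.0] * 4, w, tp=2))
```

Sharding by output columns does not change the arithmetic of any single output element, which is why the sharded and unsharded results compare equal exactly.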

The Post-LN implementation reuses the TERowParallelLinearLayerNorm pattern established by the Gemma2 bridge.
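The row-parallel half of the pattern can be sketched the same way. In this hypothetical single-process stand-in (`RowParallelLinearWithPostNorm` is not a real class), each simulated rank owns a block of input features and the matching weight rows. The partial outputs are summed in place of the tensor-parallel all-reduce, and only then is RMSNorm applied, since the norm needs statistics over the full hidden vector. Returning `(output, bias)` mirrors the tuple convention of Megatron's linear layers.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """RMSNorm of one hidden vector with a learned per-channel scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, weight)]


def matvec(x, w):
    """Dense projection: x has shape [in], w has shape [in][out]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


class RowParallelLinearWithPostNorm:
    """Hypothetical stand-in for the TERowParallelLinearLayerNorm pattern:
    row-parallel projection, reduction across ranks, then RMSNorm."""

    def __init__(self, weight, norm_weight, tp, eps=1e-5):
        n_in = len(weight)
        assert n_in % tp == 0, "input size must divide evenly across ranks"
        self.step = n_in // tp
        # Rank r owns input rows [r * step, (r + 1) * step) of the weight.
        self.shards = [weight[r * self.step:(r + 1) * self.step] for r in range(tp)]
        self.norm_weight = norm_weight
        self.eps = eps

    def forward(self, x):
        # Each rank projects only its slice of the input features.
        partials = [
            matvec(x[r * self.step:(r + 1) * self.step], shard)
            for r, shard in enumerate(self.shards)
        ]
        # Element-wise sum of the partial outputs stands in for the all-reduce.
        reduced = [sum(vals) for vals in zip(*partials)]
        # The norm needs the full hidden vector, so it runs after the reduction.
        output = rms_norm(reduced, self.norm_weight, self.eps)
        return output, None  # (output, bias); this sketch has no bias


if __name__ == "__main__":
    w = [[0.1 * (i - j) + 0.05 for j in range(4)] for i in range(4)]
    layer = RowParallelLinearWithPostNorm(w, [1.0] * 4, tp=2)
    out, bias = layer.forward([1.0, -2.0, 3.0, 0.5])
    print(out, bias)
```

Normalizing each rank's partial output separately would use the wrong statistics, which is why the norm sits after the reduction rather than inside each shard.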

Module Contents

Functions

exaone4_layer_spec: EXAONE 4.0 layer specification with pure Post-LayerNorm.

API

bridge.models.exaone.exaone4_provider.exaone4_layer_spec(config: megatron.bridge.models.gpt_provider.GPTModelProvider) → megatron.core.transformer.ModuleSpec

EXAONE 4.0 layer specification with pure Post-LayerNorm.

Key differences from the standard GPT layer spec:

- linear_qkv: TEColumnParallelLinear (no fused pre-norm, since there is no input_layernorm)
- linear_proj: TERowParallelLinearLayerNorm (post-attention norm)
- linear_fc1: TEColumnParallelLinear (no fused pre-norm, since there is no pre_feedforward_layernorm)
- linear_fc2: TERowParallelLinearLayerNorm (post-feedforward norm)
- QK layernorm is enabled through qk_layernorm=True in TransformerConfig rather than in the layer spec
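Putting the submodules together, the sketch below traces one decoder layer end to end in pure Python. It is a simplified illustration under stated assumptions, not the Megatron forward pass: a single attention head, no positional encoding (RoPE omitted), no dropout or bias, QKV split into plain thirds, a SiLU-gated (SwiGLU) MLP with the gate and up projections concatenated in `linear_fc1`, and hypothetical parameter names. Because the post norms are fused into `linear_proj` and `linear_fc2`, they act here on each sublayer's output before the residual add, while `qk_layernorm=True` normalizes queries and keys.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """RMSNorm of one hidden vector with a learned per-channel scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, weight)]


def linear(x, w):
    """Dense projection without bias: x is [in], w is [in][out]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def silu(v):
    return v / (1.0 + math.exp(-v))


def post_ln_layer(xs, p, eps=1e-5):
    """One single-head Post-LN decoder layer over a sequence xs of [d] vectors.

    p uses hypothetical names: w_qkv [d][3d], q_norm [d], k_norm [d],
    w_proj [d][d], attn_norm [d], w_fc1 [d][2f], w_fc2 [f][d], ffn_norm [d].
    """
    d = len(xs[0])
    # linear_qkv: plain projection of the raw input (no fused input norm).
    qkv = [linear(x, p["w_qkv"]) for x in xs]
    # qk_layernorm: RMSNorm on queries and keys before the dot product.
    qs = [rms_norm(t[:d], p["q_norm"], eps) for t in qkv]
    ks = [rms_norm(t[d:2 * d], p["k_norm"], eps) for t in qkv]
    vs = [t[2 * d:] for t in qkv]

    hs = []
    for t, (x, q) in enumerate(zip(xs, qs)):
        # Causal softmax attention over positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        ctx = [sum(weights[s] * vs[s][j] for s in range(t + 1)) / total for j in range(d)]
        # linear_proj with its fused post-attention norm, then the residual add.
        attn_out = rms_norm(linear(ctx, p["w_proj"]), p["attn_norm"], eps)
        hs.append([a + b for a, b in zip(x, attn_out)])

    out = []
    for h in hs:
        # linear_fc1: plain projection of h (no fused pre-MLP norm).
        gate_up = linear(h, p["w_fc1"])
        f = len(gate_up) // 2
        act = [silu(g) * u for g, u in zip(gate_up[:f], gate_up[f:])]
        # linear_fc2 with its fused post-feedforward norm, then the residual add.
        mlp_out = rms_norm(linear(act, p["w_fc2"]), p["ffn_norm"], eps)
        out.append([a + b for a, b in zip(h, mlp_out)])
    return out
```

Setting the `attn_norm` and `ffn_norm` scales to zero makes both branches contribute nothing, so the layer reduces to the identity; this is a quick check that the residual path bypasses the norms entirely.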

Parameters: config – Reserved for future use (e.g., layer-wise branching between local and global attention for the 32B hybrid-attention model).
Returns: ModuleSpec for an EXAONE 4.0 transformer layer.
