bridge.models.ernie_vl.modeling_ernie45_vl.vision_layer_spec
Layer spec for the ERNIE 4.5 VL Megatron-native Vision Transformer (ViT).
Provides get_ernie_vit_layer_spec() which returns a ModuleSpec for a
single ViT transformer layer using Transformer Engine modules.
The spec is identical to the standard MCore ViT spec returned by
megatron.core.models.vision.vit_layer_specs.get_vit_layer_with_transformer_engine_spec(),
except that self_attention.module is replaced with ErnieVLSelfAttention,
which applies absolute 2D RoPE (non-interleaved, rotate_half style).
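The "non-interleaved rotate_half" convention can be illustrated with a small NumPy sketch. This is not the ErnieVLSelfAttention code. The split of the head dimension into a row-driven quarter and a column-driven quarter, the base theta of 10000, and the helper names (rotate_half, rope_2d_angles, apply_rope) are assumptions chosen for illustration.

```python
import numpy as np


def rotate_half(x: np.ndarray) -> np.ndarray:
    """Non-interleaved rotation: split the last dim into halves [x1, x2] -> [-x2, x1]."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def rope_2d_angles(rows, cols, head_dim: int, theta: float = 10000.0) -> np.ndarray:
    """Rotation angles of shape (num_patches, head_dim) for absolute (row, col) positions.

    Assumed layout (hypothetical): head_dim // 4 frequencies are driven by the row index,
    another head_dim // 4 by the column index; the resulting head_dim // 2 angles are
    duplicated (not interleaved) so that pair (i, i + head_dim // 2) shares one angle.
    """
    assert head_dim % 4 == 0, "this layout needs head_dim divisible by 4"
    quarter = head_dim // 4
    inv_freq = 1.0 / theta ** (np.arange(quarter) / quarter)
    row_angles = np.outer(rows, inv_freq)                        # (N, head_dim // 4)
    col_angles = np.outer(cols, inv_freq)                        # (N, head_dim // 4)
    angles = np.concatenate([row_angles, col_angles], axis=-1)   # (N, head_dim // 2)
    return np.concatenate([angles, angles], axis=-1)             # (N, head_dim)


def apply_rope(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate each (x[i], x[i + head_dim // 2]) pair by its angle; x: (N, head_dim)."""
    return x * np.cos(angles) + rotate_half(x) * np.sin(angles)


# Usage: a 2x3 patch grid in row-major order gives absolute (row, col) coordinates.
rows, cols = np.divmod(np.arange(6), 3)
q = np.random.default_rng(0).standard_normal((6, 16))
q_rot = apply_rope(q, rope_2d_angles(rows, cols, head_dim=16))
```

The positions fed in are absolute patch coordinates, yet after rotating both queries and keys their dot product depends only on the (row, col) offset between two patches, which is the property RoPE is designed to provide.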
Architecture details:
- Attention: TELayerNormColumnParallelLinear (fused LN + QKV) + TEDotProductAttention + TERowParallelLinear
- MLP: TELayerNormColumnParallelLinear (fused LN + fc1) + TERowParallelLinear
- Mask type: AttnMaskType.no_mask (bidirectional attention for ViT)
- pre_mlp_layernorm: IdentityOp (the LN is fused into the TE linear layers)
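To make the fused-LN layout concrete, here is a single-head NumPy sketch of the forward pass such a layer computes. It ignores tensor parallelism, dropout, biases, learned LayerNorm scale/shift, and RoPE; the tanh-approximated GELU and the helper names (layer_norm, ln_linear, vit_layer) are assumptions for illustration, not the Transformer Engine or MCore implementation.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Row-wise LayerNorm without learned scale/shift (simplification)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def ln_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stand-in for a fused LayerNorm + column-parallel linear (no TP, no bias)."""
    return layer_norm(x) @ w


def vit_layer(x, w_qkv, w_proj, w_fc1, w_fc2):
    """One pre-LN ViT layer; x: (num_patches, hidden). Single head for brevity."""
    hidden = x.shape[-1]
    q, k, v = np.split(ln_linear(x, w_qkv), 3, axis=-1)   # LN fused into the QKV projection
    scores = q @ k.T / np.sqrt(hidden)                     # no mask: all patches see all patches
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    x = x + (probs @ v) @ w_proj                            # residual add (no bias, no dropout)
    # pre_mlp_layernorm is an identity: the LN is applied inside ln_linear for fc1.
    h = ln_linear(x, w_fc1)
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h ** 3)))  # GELU (tanh approx.)
    return x + h @ w_fc2
```

With no mask and no positional signal, this layer is permutation-equivariant over patches, which is why positional information must be injected separately, for example by the 2D RoPE applied in ErnieVLSelfAttention.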
Module Contents
Functions
get_ernie_vit_layer_spec(): Return a TransformerLayer ModuleSpec for ERNIE ViT.
API
- bridge.models.ernie_vl.modeling_ernie45_vl.vision_layer_spec.get_ernie_vit_layer_spec()
Return a TransformerLayer ModuleSpec for ERNIE ViT.
This reuses the standard MCore ViT TE spec and overrides only the self-attention module with
ErnieVLSelfAttention, which applies absolute 2D RoPE embeddings instead of the standard relative RoPE.
- Returns:
Spec for one ERNIE ViT transformer layer.
- Return type:
ModuleSpec
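The "reuse the base spec, override one submodule" pattern can be sketched without Megatron installed. ModuleSpec below is a minimal stand-in that only mirrors the module/params/submodules fields of MCore's ModuleSpec; LayerSubmodules, base_vit_layer_spec, ernie_vit_layer_spec, and the placeholder classes are hypothetical, and the real get_ernie_vit_layer_spec() may be written differently.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ModuleSpec:
    """Minimal stand-in mirroring the fields of MCore's ModuleSpec."""
    module: Any
    params: dict = field(default_factory=dict)
    submodules: Optional[Any] = None


@dataclass
class LayerSubmodules:
    """Hypothetical, trimmed-down stand-in for TransformerLayerSubmodules."""
    self_attention: Any = None
    pre_mlp_layernorm: Any = None
    mlp: Any = None


# Placeholder classes standing in for the real MCore / Transformer Engine modules.
class TransformerLayer: ...
class SelfAttention: ...
class ErnieVLSelfAttention(SelfAttention): ...
class IdentityOp: ...
class MLP: ...


def base_vit_layer_spec() -> ModuleSpec:
    """Hypothetical stand-in for the standard ViT TE layer spec."""
    return ModuleSpec(
        module=TransformerLayer,
        submodules=LayerSubmodules(
            # String stand-in for AttnMaskType.no_mask.
            self_attention=ModuleSpec(module=SelfAttention, params={"attn_mask_type": "no_mask"}),
            pre_mlp_layernorm=IdentityOp,
            mlp=ModuleSpec(module=MLP),
        ),
    )


def ernie_vit_layer_spec() -> ModuleSpec:
    # Defensive copy in case the base spec is a shared or cached object.
    spec = copy.deepcopy(base_vit_layer_spec())
    # Swap only the attention class; its params and submodules are kept as-is.
    spec.submodules.self_attention.module = ErnieVLSelfAttention
    return spec
```

Overriding a single field keeps the ERNIE spec in lockstep with the base spec: any change to the base layout (linear layers, mask type, MLP) is inherited automatically.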