bridge.models.ernie_vl.modeling_ernie45_vl.ernie_decoder_layer_spec#
Decoder layer spec for ERNIE 4.5 VL MoE.
Creates heterogeneous transformer block specs where:
Layer 0: dense MLP
Layers 1+: ErnieMultiTypeMoE (dual-pool MoE with text + vision expert pools)
The text and vision MoE pools each use the standard Megatron-Core MoELayer with SequentialMLP experts, so both pools stay fully compatible with tensor parallelism (TP) and expert parallelism (EP) through the existing Megatron-Core infrastructure.
Module Contents#
Functions#
_get_linear_modules | Get appropriate linear module classes based on TE availability.
_get_mlp_module_spec | Get MLP module spec for dense or dual-pool MoE layers.
_get_ernie_decoder_layer_spec | Get a single transformer layer spec.
get_ernie45_vl_decoder_block_spec | Get the full decoder block spec for ERNIE 4.5 VL MoE.
API#
- bridge.models.ernie_vl.modeling_ernie45_vl.ernie_decoder_layer_spec._get_linear_modules()
Get appropriate linear module classes based on Transformer Engine (TE) availability.
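A minimal sketch of the kind of availability check this description implies, assuming Transformer Engine is detected by looking up the `transformer_engine` package. The names `te_available` and `pick_linear_backend` are hypothetical and not part of this module; the real helper returns linear module classes, whereas this sketch only returns a label for the backend that would be chosen.

```python
import importlib.util


def te_available() -> bool:
    """Return True if the transformer_engine package can be located."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("transformer_engine") is not None


def pick_linear_backend() -> str:
    """Hypothetical stand-in: name the backend instead of returning classes."""
    return "transformer_engine" if te_available() else "local"


print(pick_linear_backend())
```

Checking with `find_spec` avoids importing the package just to test for its presence; whether the actual helper does this or uses a try/except import is not stated here.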
- bridge.models.ernie_vl.modeling_ernie45_vl.ernie_decoder_layer_spec._get_mlp_module_spec(num_experts: Optional[int] = None, moe_grouped_gemm: bool = False)
Get MLP module spec for dense or dual-pool MoE layers.
- Parameters:
num_experts – Number of experts per pool. None for dense MLP.
moe_grouped_gemm – Whether to use grouped GEMM for experts.
- Returns:
ModuleSpec for either dense MLP or ErnieMultiTypeMoE.
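The dense-versus-MoE choice described above can be sketched as a simple dispatch on `num_experts`. This is an illustration only: `SpecSketch` is a hypothetical stand-in for Megatron-Core's `ModuleSpec`, the module names are plain strings, and how the real function wires submodules is not shown in this reference.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for a ModuleSpec: a module name plus parameters."""
    module: str
    params: dict = field(default_factory=dict)


def mlp_spec_sketch(
    num_experts: Optional[int] = None,
    moe_grouped_gemm: bool = False,
) -> SpecSketch:
    """Pick a dense MLP when num_experts is None, otherwise the dual-pool MoE."""
    if num_experts is None:
        # Dense layer: no expert-related settings apply.
        return SpecSketch(module="MLP")
    # MoE layer: num_experts is the expert count per pool (text and vision).
    return SpecSketch(
        module="ErnieMultiTypeMoE",
        params={"num_experts": num_experts, "moe_grouped_gemm": moe_grouped_gemm},
    )


print(mlp_spec_sketch().module)                # MLP
print(mlp_spec_sketch(num_experts=8).module)   # ErnieMultiTypeMoE
```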
- bridge.models.ernie_vl.modeling_ernie45_vl.ernie_decoder_layer_spec._get_ernie_decoder_layer_spec(num_experts: Optional[int] = None, moe_grouped_gemm: bool = False)
Get a single transformer layer spec.
- Parameters:
num_experts – Number of experts per pool. None for dense layer.
moe_grouped_gemm – Whether to use grouped GEMM.
- Returns:
ModuleSpec for a TransformerLayer.
- bridge.models.ernie_vl.modeling_ernie45_vl.ernie_decoder_layer_spec.get_ernie45_vl_decoder_block_spec(config, use_transformer_engine: bool = True)
Get the full decoder block spec for ERNIE 4.5 VL MoE.
Creates a heterogeneous block where layer types are determined by config.moe_layer_freq (list of 0/1 per layer):
0: dense MLP layer
1: ErnieMultiTypeMoE layer (dual-pool MoE)
- Parameters:
config – TransformerConfig with moe_layer_freq, num_moe_experts, etc.
use_transformer_engine – Whether to use Transformer Engine (TE) modules.
- Returns:
TransformerBlockSubmodules with heterogeneous layer specs.
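To make the `moe_layer_freq` convention concrete, the following sketch maps a 0/1 list to per-layer kinds. The helper `layer_kinds` is hypothetical and uses plain strings in place of real layer specs; the example pattern (one dense layer followed by MoE layers) follows the module description above, with a layer count of four chosen only for illustration.

```python
from typing import List, Sequence


def layer_kinds(moe_layer_freq: Sequence[int]) -> List[str]:
    """Translate a 0/1 pattern into 'dense' or 'moe' labels, one per layer."""
    kinds = []
    for idx, flag in enumerate(moe_layer_freq):
        if flag not in (0, 1):
            raise ValueError(f"moe_layer_freq[{idx}] must be 0 or 1, got {flag!r}")
        # 0 selects a dense MLP layer, 1 selects an ErnieMultiTypeMoE layer.
        kinds.append("moe" if flag == 1 else "dense")
    return kinds


# Layer 0 dense, layers 1+ MoE, for an illustrative 4-layer model.
pattern = [0] + [1] * 3
print(layer_kinds(pattern))  # ['dense', 'moe', 'moe', 'moe']
```

In the real function, each label would presumably correspond to a layer spec built with or without experts, collected into the returned TransformerBlockSubmodules.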