bridge.models.ernie_vl.ernie45_vl_provider
Provider for ERNIE 4.5 VL MoE model.
Maps HuggingFace Ernie4_5_VLMoeConfig to Megatron-Core TransformerConfig and provides model instantiation logic for the dual-pool MoE architecture.
The language model uses a custom ErnieMultiTypeMoE layer that contains text_moe_layer and vision_moe_layer as two separate MoELayer instances, each with its own router, experts, and expert-parallel (EP) support.
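To make the dual-pool idea concrete, here is a toy sketch in plain Python. It is not the ErnieMultiTypeMoE implementation: the class name `ToyDualPoolMoE`, the `is_vision` mask, and the per-token dispatch rule are all assumptions made for illustration, since this page does not say how tokens are assigned to a pool. The only detail taken from the text is that two independent MoE sub-layers, `text_moe_layer` and `vision_moe_layer`, live side by side in one layer.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class ToyDualPoolMoE:
    """Hypothetical stand-in for a layer that owns two independent expert pools."""

    def __init__(
        self,
        text_moe_layer: Callable[[Vector], Vector],
        vision_moe_layer: Callable[[Vector], Vector],
    ) -> None:
        # Each pool is a separate callable; in the real model each would be a
        # full MoE layer with its own router and experts.
        self.text_moe_layer = text_moe_layer
        self.vision_moe_layer = vision_moe_layer

    def forward(
        self, hidden: Sequence[Vector], is_vision: Sequence[bool]
    ) -> List[Vector]:
        # ASSUMPTION: a boolean mask marks which tokens go to the vision pool.
        if len(hidden) != len(is_vision):
            raise ValueError("hidden and is_vision must have the same length")
        out: List[Vector] = []
        for vec, vision_token in zip(hidden, is_vision):
            pool = self.vision_moe_layer if vision_token else self.text_moe_layer
            out.append(pool(vec))
        return out


# Two trivially different "pools" so the dispatch is visible in the output.
layer = ToyDualPoolMoE(
    text_moe_layer=lambda v: [x + 1.0 for x in v],
    vision_moe_layer=lambda v: [x * 10.0 for x in v],
)
mixed = layer.forward([[1.0, 2.0], [3.0, 4.0]], is_vision=[False, True])
print(mixed)  # [[2.0, 3.0], [30.0, 40.0]]
```

Token order is preserved while each token only ever touches one pool, which is the property that lets the two pools keep separate routers and expert sizes.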
Module Contents
Classes
Ernie45VLModelProvider | Model provider for ERNIE 4.5 VL MoE.
API
- class bridge.models.ernie_vl.ernie45_vl_provider.Ernie45VLModelProvider
Bases: megatron.bridge.models.gpt_provider.GPTModelProvider
Model provider for ERNIE 4.5 VL MoE.
This provider extends GPTModelProvider with ERNIE 4.5 VL-specific fields:
Vision configuration for the ViT encoder and resampler
3D M-RoPE parameters (mrope_section)
Dual-pool MoE configuration (moe_intermediate_size as tuple)
Custom decoder layer spec with ErnieMultiTypeMoE
Token IDs for image/video placeholder tokens
Freeze options for vision/language components
- scatter_embedding_sequence_parallel: bool
False
- position_embedding_type: str
‘mrope’
- mrope_section: List[int]
‘field(…)’
- vision_config: Any
‘field(…)’
- hf_config: Any
None
- moe_intermediate_size: Tuple[int, int]
(1536, 512)
- image_start_token_id: int
101304
- image_end_token_id: int
101305
- image_token_id: int
100295
- video_start_token_id: int
101306
- video_end_token_id: int
101307
- video_token_id: int
103367
- freeze_language_model: bool
False
- freeze_vision_model: bool
False
- freeze_vision_projection: bool
False
- use_mg_vit: bool
False
- transformer_layer_spec: Union[megatron.core.transformer.spec_utils.ModuleSpec, Callable[[megatron.bridge.models.gpt_provider.GPTModelProvider], megatron.core.transformer.spec_utils.ModuleSpec]]
None
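As an illustration of what the placeholder token IDs are for, the sketch below scans a token sequence and records where image and video placeholders occur. The helper `placeholder_positions` is hypothetical and not part of this module, and the example sequence layout (start marker, placeholders, end marker) is an assumption. Only the numeric defaults are taken from the attribute list above.

```python
from typing import Dict, List, Sequence

# Default values copied from the documented attributes.
IMAGE_TOKEN_ID = 100295
VIDEO_TOKEN_ID = 103367


def placeholder_positions(
    input_ids: Sequence[int],
    image_token_id: int = IMAGE_TOKEN_ID,
    video_token_id: int = VIDEO_TOKEN_ID,
) -> Dict[str, List[int]]:
    """Hypothetical helper: indices of image/video placeholder tokens."""
    positions: Dict[str, List[int]] = {"image": [], "video": []}
    for index, token in enumerate(input_ids):
        if token == image_token_id:
            positions["image"].append(index)
        elif token == video_token_id:
            positions["video"].append(index)
    return positions


# ASSUMED layout: <image_start> img img <image_end> text <video_start> vid <video_end>
ids = [1, 101304, 100295, 100295, 101305, 7, 101306, 103367, 101307]
found = placeholder_positions(ids)
print(found)  # {'image': [2, 3], 'video': [7]}
```

In a real pipeline these positions would typically be where vision features get written into the embedding sequence, but that step is outside what this page documents.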
- provide(pre_process=None, post_process=None, vp_stage=None)
Build the composite VLM model (vision + resampler + language model).
- Parameters:
pre_process – Whether to include pre-processing (embedding + vision). Defaults to first PP stage.
post_process – Whether to include post-processing (output layer). Defaults to last PP stage.
vp_stage – Virtual pipeline stage index.
- Returns:
Configured ERNIE 4.5 VL MoE model instance.
- Return type:
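The defaulting rule for `pre_process` and `post_process` can be sketched as follows. This is a minimal illustration, not the provider's code: `resolve_stage_flags`, `pp_rank`, and `pp_size` are hypothetical names, and the sketch ignores virtual pipeline stages (`vp_stage`), which the real implementation may also take into account.

```python
from typing import Optional, Tuple


def resolve_stage_flags(
    pre_process: Optional[bool],
    post_process: Optional[bool],
    pp_rank: int,
    pp_size: int,
) -> Tuple[bool, bool]:
    """Hypothetical helper: fill in None flags from the pipeline position."""
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must satisfy 0 <= pp_rank < pp_size")
    if pre_process is None:
        # The first pipeline stage owns the embedding and vision front end.
        pre_process = pp_rank == 0
    if post_process is None:
        # The last pipeline stage owns the output layer.
        post_process = pp_rank == pp_size - 1
    return pre_process, post_process


print(resolve_stage_flags(None, None, pp_rank=0, pp_size=4))  # (True, False)
print(resolve_stage_flags(None, None, pp_rank=3, pp_size=4))  # (False, True)
print(resolve_stage_flags(None, None, pp_rank=0, pp_size=1))  # (True, True)
```

An explicitly passed `True` or `False` is left untouched, so callers can still force a stage to include or skip either end of the model.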
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)
Build only the language model (MCoreGPTModel) for weight conversion.
This calls GPTModelProvider.provide(), which builds a standard MCoreGPTModel using the custom ErnieMultiTypeMoE layer spec supplied via transformer_layer_spec. As a result, each MoE transformer layer in the returned model exposes text_moe_layer and vision_moe_layer as regular submodules.
- Parameters:
pre_process – Whether to include pre-processing.
post_process – Whether to include post-processing.
vp_stage – Virtual pipeline stage index.
- Returns:
Configured Megatron-Core GPT model instance with dual-pool MoE.
- Return type:
MCoreGPTModel
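Because both pools appear as named submodules, a weight-conversion script could partition parameter names by pool before mapping them. The sketch below shows one way to do that on plain strings. The helper `group_keys_by_pool` and the example parameter names are hypothetical; only the submodule names `text_moe_layer` and `vision_moe_layer` come from the description above, and the real state-dict keys may be laid out differently.

```python
from typing import Dict, Iterable, List


def group_keys_by_pool(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: split parameter names into text, vision, and shared."""
    groups: Dict[str, List[str]] = {"text": [], "vision": [], "shared": []}
    for key in keys:
        if ".text_moe_layer." in key:
            groups["text"].append(key)
        elif ".vision_moe_layer." in key:
            groups["vision"].append(key)
        else:
            # Attention, norms, embeddings, and anything else outside the pools.
            groups["shared"].append(key)
    return groups


# ASSUMED key names, for illustration only.
example_keys = [
    "decoder.layers.1.mlp.text_moe_layer.router.weight",
    "decoder.layers.1.mlp.vision_moe_layer.router.weight",
    "decoder.layers.1.self_attention.linear_qkv.weight",
]
grouped = group_keys_by_pool(example_keys)
print(len(grouped["text"]), len(grouped["vision"]), len(grouped["shared"]))  # 1 1 1
```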