bridge.models.mla_provider#
MLA (Multi-Latent Attention) Model Provider.
This module provides a minimal provider for models using Multi-Latent Attention, such as DeepSeek V2/V3 and Kimi K2.
Module Contents#
Classes#
MLAModelProvider: Provider for models using Multi-Latent Attention (MLA).
API#
- class bridge.models.mla_provider.MLAModelProvider#
Bases:
megatron.bridge.models.transformer_config.MLATransformerConfig, megatron.bridge.models.gpt_provider.GPTModelProvider
Provider for models using Multi-Latent Attention (MLA).
This class combines MLATransformerConfig (which provides MLA-specific fields like q_lora_rank, kv_lora_rank, qk_head_dim, v_head_dim) with GPTModelProvider (which provides the model instantiation logic).
Model-specific defaults (normalization, activation, fusions, etc.) should be configured via MEGATRON_DEFAULTS in the respective bridge classes.
Used by:
- DeepSeek V2/V3
- Kimi K2
- Other MLA-based models
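The combination described above follows a standard dataclass mixin pattern: one base contributes MLA-specific configuration fields, the other contributes the instantiation logic. The sketch below illustrates that pattern with simplified stand-in classes; the `*Sketch` names, field defaults, and `provide` method are illustrative assumptions, not the real `megatron.bridge` API.

```python
from dataclasses import dataclass


@dataclass
class MLATransformerConfigSketch:
    # MLA-specific low-rank projection sizes; field names mirror
    # those documented on MLATransformerConfig, defaults are illustrative.
    q_lora_rank: int = 1536
    kv_lora_rank: int = 512
    qk_head_dim: int = 128
    v_head_dim: int = 128


@dataclass
class GPTModelProviderSketch:
    # A tiny stand-in for the GPT provider's fields and build logic.
    num_layers: int = 2
    hidden_size: int = 1024

    def provide(self) -> dict:
        # Stand-in for model instantiation: return the resolved config
        # so the combined field set is visible.
        return dict(vars(self))


@dataclass
class MLAModelProviderSketch(MLATransformerConfigSketch, GPTModelProviderSketch):
    """Combines MLA config fields with the provider's instantiation logic."""


provider = MLAModelProviderSketch(num_layers=4)
model_cfg = provider.provide()
# model_cfg now contains both GPT fields (num_layers, hidden_size)
# and MLA fields (q_lora_rank, kv_lora_rank, qk_head_dim, v_head_dim)
```

With the real classes, an `MLAModelProvider` instance would expose both field sets the same way, and model-specific defaults (normalization, activation, fusions) would come from `MEGATRON_DEFAULTS` in the corresponding bridge class rather than being set by hand.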