bridge.models.gemma.gemma_provider
Module Contents
Classes
| GemmaModelProvider | Configuration class for Gemma models. |
API
- class bridge.models.gemma.gemma_provider.GemmaModelProvider
Bases: megatron.bridge.models.gpt_provider.GPTModelProvider
Configuration class for Gemma models.
- normalization: str
'RMSNorm'
- activation_func: Callable
None
- gated_linear_unit: bool
True
- position_embedding_type: str
'rope'
- add_bias_linear: bool
False
- seq_length: int
8192
- kv_channels: int
256
- attention_dropout: float
0.0
0.0
True
- layernorm_zero_centered_gamma: bool
True
- attention_backend: megatron.core.transformer.enums.AttnBackend
None
- layernorm_epsilon: float
1e-06
- vocab_size: int
256000
- bf16: bool
True
- params_dtype: torch.dtype
None
- autocast_dtype: torch.dtype
None
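The `normalization`, `layernorm_zero_centered_gamma`, and `layernorm_epsilon` defaults listed for this class describe an RMSNorm whose learnable scale is stored as an offset from 1. The following is a minimal plain-Python sketch of that idea, not the Megatron Core implementation; the function name `rms_norm_zero_centered` is hypothetical and the input is a single 1-D vector for simplicity.

```python
import math


def rms_norm_zero_centered(x, gamma, eps=1e-6):
    """Apply RMSNorm to a 1-D list, using a zero-centered scale.

    Hypothetical helper for illustration only. With a zero-centered
    gamma, the effective scale is (1 + gamma), so gamma == 0 leaves
    the normalized values unscaled.
    """
    # Mean of squares over the feature dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # eps mirrors the documented layernorm_epsilon default of 1e-06.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + g) for v, g in zip(x, gamma)]


x = [3.0, 4.0]
out = rms_norm_zero_centered(x, gamma=[0.0, 0.0])
# With gamma at zero, the output has a root-mean-square close to 1.
out_rms = math.sqrt(sum(v * v for v in out) / len(out))
```

Storing the scale as an offset means a freshly initialized (all-zero) `gamma` behaves like an identity scale, which is the usual motivation for this parameterization.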
- provide(pre_process=None, post_process=None, vp_stage=None)
Configure and instantiate a Megatron Core Gemma model.
Extends the base configuration with Gemma-specific embedding scaling.
- Parameters:
pre_process – Whether to include pre-processing in the model
post_process – Whether to include post-processing in the model
vp_stage – Virtual pipeline stage
tokenizer – Tokenizer used with the model
- Returns:
Configured Megatron Core GPT model instance
- Return type:
MCoreGPTModel
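The description of `provide` mentions Gemma-specific embedding scaling without spelling it out. As a rough illustration, and assuming the conventional Gemma factor of `sqrt(hidden_size)` (an assumption here, since this page does not state the factor), the scaling can be sketched in plain Python. The helper `scale_embeddings` is hypothetical and is not part of the `GemmaModelProvider` API.

```python
import math


def scale_embeddings(embeddings, hidden_size):
    """Multiply each embedding vector by sqrt(hidden_size).

    Hypothetical helper for illustration; the scaling factor is an
    assumption based on the conventional Gemma formulation.
    """
    scale = math.sqrt(hidden_size)
    return [[v * scale for v in row] for row in embeddings]


hidden_size = 4  # toy size chosen so that sqrt(hidden_size) == 2.0
token_embeddings = [
    [0.5, -0.5, 1.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
scaled = scale_embeddings(token_embeddings, hidden_size)
```

In a real model the same multiplication would be applied to the output of the embedding layer before the first transformer block; the sketch above only shows the arithmetic.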