`bridge.models.gemma.gemma_provider`

Module Contents

Classes

GemmaModelProvider

Configuration class for Gemma models.

API

class bridge.models.gemma.gemma_provider.GemmaModelProvider

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Configuration class for Gemma models.

normalization: str = 'RMSNorm'

activation_func: Callable = None

gated_linear_unit: bool = True

position_embedding_type: str = 'rope'

add_bias_linear: bool = False

seq_length: int = 8192

kv_channels: int = 256

attention_dropout: float = 0.0

hidden_dropout: float = 0.0

share_embeddings_and_output_weights: bool = True

layernorm_zero_centered_gamma: bool = True

attention_backend: megatron.core.transformer.enums.AttnBackend = None

layernorm_epsilon: float = 1e-06

vocab_size: int = 256000

bf16: bool = True

params_dtype: torch.dtype = None

autocast_dtype: torch.dtype = None
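The combination of `normalization = 'RMSNorm'`, `layernorm_zero_centered_gamma = True`, and `layernorm_epsilon = 1e-06` can be illustrated with a small pure-Python sketch. It assumes the usual meaning of a zero-centered gamma, namely that the learned weight is stored as an offset from 1 and the normalized input is scaled by `(1 + gamma)`. The function name `rms_norm_zero_centered` is hypothetical and is not part of this module; the real computation happens inside Megatron Core.

```python
import math


def rms_norm_zero_centered(x, gamma, eps=1e-6):
    """Hypothetical illustration: RMS-normalize x, then scale by (1 + gamma).

    With a zero-centered gamma, gamma == 0 leaves the normalized values
    unchanged, so a zero-initialized weight acts as the identity scale.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + g) for v, g in zip(x, gamma)]


x = [3.0, 4.0]
y_identity = rms_norm_zero_centered(x, gamma=[0.0, 0.0])  # scale of 1 on each channel
y_doubled = rms_norm_zero_centered(x, gamma=[1.0, 1.0])   # scale of 2 on each channel
```

With `gamma` at zero the output has a root-mean-square of approximately 1, which is why a zero-centered parameterization pairs naturally with zero initialization.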

provide(pre_process=None, post_process=None, vp_stage=None) → megatron.core.models.gpt.GPTModel

Configure and instantiate a Megatron Core Gemma model.

Extends the base configuration with Gemma-specific embedding scaling.

Parameters:

pre_process – Whether to include pre-processing in the model
post_process – Whether to include post-processing in the model
vp_stage – Virtual pipeline stage
tokenizer – Tokenizer used with the model (listed in the docstring, but not present in the signature shown above)

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel (megatron.core.models.gpt.GPTModel, per the signature above)
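The "Gemma-specific embedding scaling" mentioned for `provide()` is not spelled out on this page. Gemma models conventionally multiply token embeddings by the square root of the hidden size before the first transformer layer, and the sketch below illustrates only that arithmetic. The helper `scale_embeddings` is hypothetical; how `provide()` actually attaches the scaling to the Megatron Core model is not shown in the source.

```python
import math


def scale_embeddings(embeddings, hidden_size):
    """Hypothetical illustration: multiply each embedding value by sqrt(hidden_size)."""
    scale = math.sqrt(hidden_size)
    return [[v * scale for v in row] for row in embeddings]


# One token with a toy hidden size of 4, so the scale factor is 2.0.
scaled = scale_embeddings([[0.5, -1.0, 0.25, 2.0]], hidden_size=4)
```

Here `scaled` is `[[1.0, -2.0, 0.5, 4.0]]`. In a real model the same factor would be applied to the embedding layer's output tensor rather than to nested lists.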