bridge.models.gemma.gemma_provider

Module Contents

Classes

GemmaModelProvider

Configuration class for Gemma models.

API

class bridge.models.gemma.gemma_provider.GemmaModelProvider

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Configuration class for Gemma models.

normalization: str

'RMSNorm'

activation_func: Callable

None

gated_linear_unit: bool

True

position_embedding_type: str

'rope'

add_bias_linear: bool

False

seq_length: int

8192

kv_channels: int

256

attention_dropout: float

0.0

hidden_dropout: float

0.0

share_embeddings_and_output_weights: bool

True

layernorm_zero_centered_gamma: bool

True

attention_backend: megatron.core.transformer.enums.AttnBackend

None

layernorm_epsilon: float

1e-06

vocab_size: int

256000

bf16: bool

True

params_dtype: torch.dtype

None

autocast_dtype: torch.dtype

None
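The normalization defaults above (normalization, layernorm_zero_centered_gamma, layernorm_epsilon) can be illustrated with a small pure-Python sketch. It assumes the zero-centered convention in which the stored weight gamma is applied as 1 + gamma, so a zero-initialized weight leaves the normalized activations unscaled. The function name zero_centered_rms_norm is hypothetical and is not part of this module.

```python
import math


def zero_centered_rms_norm(x, gamma, eps=1e-06):
    """RMS-normalize x, then scale each element by (1 + gamma).

    Illustrative only: assumes the zero-centered convention where the
    learned weight is stored as an offset from 1.0.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + g) for v, g in zip(x, gamma)]


hidden = [3.0, -4.0, 0.0, 0.0]
gamma = [0.0, 0.0, 0.0, 0.0]  # zero-initialized, so the effective scale is 1.0
normed = zero_centered_rms_norm(hidden, gamma)
```

Under this convention a freshly initialized layer behaves like a plain RMS normalization, and training only has to learn small deviations of gamma around zero.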

provide(pre_process=None, post_process=None, vp_stage=None) → megatron.core.models.gpt.GPTModel

Configure and instantiate a Megatron Core Gemma model.

Extends the base configuration with Gemma-specific embedding scaling.
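As a rough illustration of embedding scaling, the sketch below multiplies each token embedding by the square root of the hidden size, which is the convention Gemma models are generally described as using. This page does not state the exact factor, so treat the sqrt(hidden_size) choice as an assumption; scale_embeddings is a hypothetical helper and not a function of this module.

```python
import math


def scale_embeddings(embeddings, hidden_size):
    """Multiply every embedding value by sqrt(hidden_size).

    Illustrative only: the scaling factor is assumed, not taken from
    the documentation above.
    """
    scale = math.sqrt(hidden_size)
    return [[v * scale for v in row] for row in embeddings]


hidden_size = 4
token_embeddings = [
    [0.5, -1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
scaled = scale_embeddings(token_embeddings, hidden_size)
```

In a pipeline-parallel setup this kind of scaling would only matter on the stage that owns the embedding layer, which is consistent with the pre_process flag controlling whether that layer is built.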

Parameters:
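  • pre_process – Whether to include pre-processing (the input embedding layer) in the model

  • post_process – Whether to include post-processing (the output layer) in the model

  • vp_stage – Virtual pipeline stage

  • tokenizer – Tokenizer used with the model (listed in the docstring, but not accepted by the signature shown above)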

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel