bridge.models.gemma.gemma_provider#

Module Contents#

Classes#

GemmaModelProvider

Configuration class for Gemma models.

GemmaModelProvider2B

Configuration for a 2B parameter Gemma model.

GemmaModelProvider7B

Configuration for a 7B parameter Gemma model.

CodeGemmaModelProvider2B

Configuration for a 2B parameter Code Gemma model.

CodeGemmaModelProvider7B

Configuration for a 7B parameter Code Gemma model.

API#

class bridge.models.gemma.gemma_provider.GemmaModelProvider#

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Configuration class for Gemma models.

normalization: str#

‘RMSNorm’

activation_func: Callable#

None

gated_linear_unit: bool#

True

position_embedding_type: str#

‘rope’

add_bias_linear: bool#

False

seq_length: int#

8192

kv_channels: int#

256

attention_dropout: float#

0.0

hidden_dropout: float#

0.0

share_embeddings_and_output_weights: bool#

True

layernorm_zero_centered_gamma: bool#

True

attention_backend: megatron.core.transformer.enums.AttnBackend#

None

layernorm_epsilon: float#

1e-06

vocab_size: int#

256000

bf16: bool#

True

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None
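
The provider behaves like a dataclass-style configuration object: the Gemma-specific defaults listed above come pre-filled, and any field can be overridden at construction time. A minimal sketch, assuming the module path shown on this page and that the base GPTModelProvider expects the usual Megatron transformer sizes (num_layers, hidden_size, num_attention_heads; the values below are illustrative):

```python
from megatron.bridge.models.gemma.gemma_provider import GemmaModelProvider

# Illustrative sizes; the 2B/7B presets below fill these in for you.
provider = GemmaModelProvider(
    num_layers=18,
    hidden_size=2048,
    num_attention_heads=8,
    seq_length=4096,  # override the 8192 default, e.g. to reduce memory
)

# Gemma-specific defaults from the attribute list above are already set.
assert provider.normalization == "RMSNorm"
assert provider.position_embedding_type == "rope"
assert provider.share_embeddings_and_output_weights is True
```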

provide(pre_process=None, post_process=None, vp_stage=None) → megatron.core.models.gpt.GPTModel#

Configure and instantiate a Megatron Core Gemma model.

Extends the base configuration with Gemma-specific embedding scaling.

Parameters:
  • pre_process – Whether to include pre-processing in the model

  • post_process – Whether to include post-processing in the model

  • vp_stage – Virtual pipeline stage

  • tokenizer – Tokenizer used with the model

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel
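
A usage sketch, assuming the provider constructed above and that Megatron's distributed/model-parallel state has already been initialized before provide() is called:

```python
# Build the Megatron Core GPT model from the configuration. pre_process /
# post_process control whether the embedding and output layers are built,
# which matters when the model is split across pipeline stages.
model = provider.provide(pre_process=True, post_process=True)
```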

class bridge.models.gemma.gemma_provider.GemmaModelProvider2B#

Bases: bridge.models.gemma.gemma_provider.GemmaModelProvider

Configuration for a 2B parameter Gemma model.

Specific configuration for the 2B Gemma model with 18 layers, 2048 hidden size, and 8 attention heads.

num_layers: int#

18

hidden_size: int#

2048

num_attention_heads: int#

8

num_query_groups: int#

1

ffn_hidden_size: int#

16384
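
A minimal sketch of the preset, assuming the import path shown above; only the architecture fields listed here differ from the base class:

```python
from megatron.bridge.models.gemma.gemma_provider import GemmaModelProvider2B

provider_2b = GemmaModelProvider2B()
assert provider_2b.num_layers == 18
assert provider_2b.ffn_hidden_size == 16384
# Everything else (seq_length=8192, RMSNorm, rope, ...) is inherited
# from GemmaModelProvider.
```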

class bridge.models.gemma.gemma_provider.GemmaModelProvider7B#

Bases: bridge.models.gemma.gemma_provider.GemmaModelProvider

Configuration for a 7B parameter Gemma model.

Specific configuration for the 7B Gemma model with 28 layers, 3072 hidden size, and 16 attention heads.

num_layers: int#

28

hidden_size: int#

3072

num_attention_heads: int#

16

num_query_groups: int#

16

ffn_hidden_size: int#

24576
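
As with the 2B preset, individual fields can still be overridden per run; a hedged sketch with illustrative override values:

```python
from megatron.bridge.models.gemma.gemma_provider import GemmaModelProvider7B

provider_7b = GemmaModelProvider7B(
    seq_length=4096,        # illustrative override of the 8192 default
    attention_dropout=0.1,  # illustrative; the default is 0.0
)
assert provider_7b.hidden_size == 3072
assert provider_7b.num_query_groups == 16
```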

class bridge.models.gemma.gemma_provider.CodeGemmaModelProvider2B#

Bases: bridge.models.gemma.gemma_provider.GemmaModelProvider2B

Configuration for a 2B parameter Code Gemma model.

Extends GemmaModelProvider2B for code generation. This model has an identical configuration to GemmaModelProvider2B.

class bridge.models.gemma.gemma_provider.CodeGemmaModelProvider7B#

Bases: bridge.models.gemma.gemma_provider.GemmaModelProvider7B

Configuration for a 7B parameter Code Gemma model.

Extends GemmaModelProvider7B for code generation. This model has an identical configuration to GemmaModelProvider7B.