bridge.models.gemma.gemma_provider#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| GemmaModelProvider | Configuration class for Gemma models. |
| GemmaModelProvider2B | Configuration for a 2B parameter Gemma model. |
| GemmaModelProvider7B | Configuration for a 7B parameter Gemma model. |
| CodeGemmaModelProvider2B | Configuration for a 2B parameter Code Gemma model. |
| CodeGemmaModelProvider7B | Configuration for a 7B parameter Code Gemma model. |
API#
- class bridge.models.gemma.gemma_provider.GemmaModelProvider#
Bases:
megatron.bridge.models.gpt_provider.GPTModelProvider
Configuration class for Gemma models.
- normalization: str#
‘RMSNorm’
- activation_func: Callable#
None
- gated_linear_unit: bool#
True
- position_embedding_type: str#
‘rope’
- add_bias_linear: bool#
False
- seq_length: int#
8192
- kv_channels: int#
256
- attention_dropout: float#
0.0
- hidden_dropout: float#
0.0
- share_embeddings_and_output_weights: bool#
True
- layernorm_zero_centered_gamma: bool#
True
- attention_backend: megatron.core.transformer.enums.AttnBackend#
None
- layernorm_epsilon: float#
1e-06
- vocab_size: int#
256000
- bf16: bool#
True
- params_dtype: torch.dtype#
None
- autocast_dtype: torch.dtype#
None
- provide(pre_process=None, post_process=None, vp_stage=None, tokenizer=None)#
Configure and instantiate a Megatron Core Gemma model.
Extends the base configuration with Gemma-specific embedding scaling.
- Parameters:
pre_process – Whether to include pre-processing in the model
post_process – Whether to include post-processing in the model
vp_stage – Virtual pipeline stage
tokenizer – Tokenizer used with the model
- Returns:
Configured Megatron Core GPT model instance
- Return type:
MCoreGPTModel
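The snippet below is a minimal usage sketch, not taken verbatim from the library. It assumes the full import path is `megatron.bridge.models.gemma.gemma_provider` (inferred from the base-class path above), that the provider dataclasses can be constructed with their documented defaults, and that Megatron's parallel state has been initialized before `provide` is called.

```python
# Hedged sketch: construct a concrete Gemma provider and build the model.
# Assumes Megatron parallel state is already initialized before provide()
# is called; constructing the provider alone does not require it.
from megatron.bridge.models.gemma.gemma_provider import GemmaModelProvider2B

provider = GemmaModelProvider2B()

# Documented defaults inherited from GemmaModelProvider.
print(provider.seq_length)   # 8192
print(provider.vocab_size)   # 256000

# Instantiate the Megatron Core GPT model for a single-stage pipeline.
model = provider.provide(pre_process=True, post_process=True)
```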
- class bridge.models.gemma.gemma_provider.GemmaModelProvider2B#
Bases:
bridge.models.gemma.gemma_provider.GemmaModelProvider
Configuration for a 2B parameter Gemma model.
Specific configuration for the 2B Gemma model with 18 layers, 2048 hidden size, and 8 attention heads.
- num_layers: int#
18
- hidden_size: int#
2048
- num_attention_heads: int#
8
- num_query_groups: int#
1
- ffn_hidden_size: int#
16384
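Because the providers are dataclasses, the documented fields can be overridden when a non-default configuration is needed. The sketch below is illustrative only; field names are taken from this page, and keyword construction is assumed to behave as for standard dataclasses.

```python
# Sketch: override documented dataclass fields at construction time.
import torch

from megatron.bridge.models.gemma.gemma_provider import GemmaModelProvider2B

provider = GemmaModelProvider2B(
    seq_length=4096,              # shorter context than the 8192 default
    params_dtype=torch.bfloat16,  # make the bf16=True default explicit
)
```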
- class bridge.models.gemma.gemma_provider.GemmaModelProvider7B#
Bases:
bridge.models.gemma.gemma_provider.GemmaModelProvider
Configuration for a 7B parameter Gemma model.
Specific configuration for the 7B Gemma model with 28 layers, 3072 hidden size, and 16 attention heads.
- num_layers: int#
28
- hidden_size: int#
3072
- num_attention_heads: int#
16
- num_query_groups: int#
16
- ffn_hidden_size: int#
24576
- class bridge.models.gemma.gemma_provider.CodeGemmaModelProvider2B#
Bases:
bridge.models.gemma.gemma_provider.GemmaModelProvider2B
Configuration for a 2B parameter Code Gemma model.
Extends GemmaModelProvider with specific settings for code generation. This model has an identical configuration to GemmaModelProvider2B.
- class bridge.models.gemma.gemma_provider.CodeGemmaModelProvider7B#
Bases:
bridge.models.gemma.gemma_provider.GemmaModelProvider7B
Configuration for a 7B parameter Code Gemma model.
Extends GemmaModelProvider with specific settings for code generation. This model has an identical configuration to GemmaModelProvider7B.
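As noted above, the Code Gemma providers reuse the base Gemma architectures unchanged; they differ only in the checkpoint family they are intended to load. A small illustrative check, assuming default construction works as documented:

```python
# Sketch: CodeGemmaModelProvider2B shares the architecture of
# GemmaModelProvider2B; only the intended (code-generation) checkpoints differ.
from megatron.bridge.models.gemma.gemma_provider import (
    CodeGemmaModelProvider2B,
    GemmaModelProvider2B,
)

code, base = CodeGemmaModelProvider2B(), GemmaModelProvider2B()
assert code.num_layers == base.num_layers == 18
assert code.hidden_size == base.hidden_size == 2048
assert code.num_query_groups == base.num_query_groups == 1
```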