Gemma 2#
Google’s Gemma 2 is a family of lightweight, open models built on the same research and technology used to create Gemini models. The Gemma 2 architecture builds on the transformer decoder framework with enhancements including pre-normalization with RMSNorm, GeGLU activations, Rotary Positional Embeddings (RoPE), attention logit softcapping, and sliding window attention.
Gemma 2 models are designed for a wide range of text generation tasks and are available in multiple sizes to suit different computational budgets.
Gemma family models are supported via the Bridge system with auto-detected configuration and weight mapping.
Available Models#
Text-Only Models#
Gemma 2 2B (google/gemma-2-2b): Compact 2B parameter model optimized for efficiency
- 26 layers, 2304 hidden size
- 8 attention heads, 4 query groups (GQA)
- Sequence length: 8,192 tokens
- Ideal for single-GPU deployment
Gemma 2 9B (google/gemma-2-9b): Medium-scale 9B parameter model balancing performance and efficiency
- 42 layers, 3584 hidden size
- 16 attention heads, 8 query groups (GQA)
- Sequence length: 8,192 tokens
- Recommended: 4-8 GPUs with tensor parallelism
Gemma 2 27B (google/gemma-2-27b): Large-scale 27B parameter model for maximum performance
- 46 layers, 4608 hidden size
- 32 attention heads, 16 query groups (GQA)
- Sequence length: 8,192 tokens
- Recommended: 8-16 GPUs with tensor and pipeline parallelism
All models support a sequence length of 8,192 tokens and use sliding window attention for efficient context processing.
Model Architecture Features#
Gemma 2 introduces several architectural innovations:
- Sliding Window Attention: Alternating layers of local attention (4,096-token sliding window) and global attention for efficient long-context processing
- Attention Logit Softcapping: Caps attention logits at 50.0 to prevent extreme values (see the sketch after this list)
- Final Logit Softcapping: Caps final output logits at 30.0 for stable generation
- GeGLU Activation: Uses gated linear units with GELU activation for improved performance
- RMSNorm: Layer normalization without mean centering for faster computation
- Rotary Embeddings: RoPE with base frequency 10,000
- Query Pre-Attention Scaling: Scales queries by the inverse square root of a configurable scalar (224 by default) instead of 1/√head_dim before the attention computation
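Both softcapping values enter the model as a tanh-based squashing of the logits. A minimal PyTorch sketch of that operation (illustrative only, not the Bridge implementation):
import torch

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly squash logits into (-cap, cap): cap * tanh(logits / cap).
    return cap * torch.tanh(logits / cap)

attn_scores = softcap(torch.randn(8, 8) * 100.0, cap=50.0)    # attention logits, cap 50.0
final_logits = softcap(torch.randn(4, 16) * 100.0, cap=30.0)  # final output logits, cap 30.0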
Conversion with 🤗 Hugging Face#
Load HF → Megatron#
from megatron.bridge import AutoBridge
# Example: Gemma 2 9B
bridge = AutoBridge.from_hf_pretrained("google/gemma-2-9b")
provider = bridge.to_megatron_provider()
# Configure parallelism before instantiating the model
provider.tensor_model_parallel_size = 8
provider.pipeline_model_parallel_size = 1
provider.finalize()
model = provider.provide_distributed_model(wrap_with_ddp=False)
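When choosing the parallel sizes above, the product of tensor and pipeline parallel sizes must divide the number of launched ranks. A minimal sanity check, assuming the script is launched with torchrun so that WORLD_SIZE is set (the check itself is illustrative):
import os

# TP * PP must divide the number of ranks; the remainder becomes data parallelism.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
tp = provider.tensor_model_parallel_size
pp = provider.pipeline_model_parallel_size
assert world_size % (tp * pp) == 0, f"TP*PP={tp * pp} must divide WORLD_SIZE={world_size}"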
Import HF → Megatron#
To import the HF model to your desired Megatron path:
# Gemma 2 2B
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-2-2b \
--megatron-path /models/gemma-2-2b
# Gemma 2 9B
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-2-9b \
--megatron-path /models/gemma-2-9b
# Gemma 2 27B
python examples/conversion/convert_checkpoints.py import \
--hf-model google/gemma-2-27b \
--megatron-path /models/gemma-2-27b
Export Megatron → HF#
# Gemma 2 9B example
python examples/conversion/convert_checkpoints.py export \
--hf-model google/gemma-2-9b \
--megatron-path /results/gemma2_9b/checkpoints/iter_00001000 \
--hf-path ./gemma2-9b-hf-export
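After exporting, the resulting directory can be loaded with the standard Hugging Face APIs as a quick sanity check (illustrative; assumes transformers is installed and uses the export path from the command above):
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the exported checkpoint with vanilla Hugging Face APIs and generate a few tokens.
tokenizer = AutoTokenizer.from_pretrained("./gemma2-9b-hf-export")
model = AutoModelForCausalLM.from_pretrained("./gemma2-9b-hf-export")
inputs = tokenizer("What is artificial intelligence?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))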
Run Inference on Converted Checkpoint#
python examples/conversion/hf_to_megatron_generate_text.py \
--hf_model_path google/gemma-2-9b \
--megatron_model_path /models/gemma-2-9b \
--prompt "What is artificial intelligence?" \
--max_new_tokens 100
Note:
--megatron_model_path is optional. If it is not specified, the script converts the model on the fly and then runs the forward pass.
Pretrain and Finetune Recipes#
See: bridge.recipes.gemma
Available recipes:
Pretraining:
- gemma2_2b_pretrain_config: Pre-training configuration for Gemma 2 2B
- gemma2_9b_pretrain_config: Pre-training configuration for Gemma 2 9B
- gemma2_27b_pretrain_config: Pre-training configuration for Gemma 2 27B
Finetuning:
- gemma2_2b_finetune_config: Finetuning configuration for Gemma 2 2B with PEFT support (LoRA, DoRA)
- gemma2_9b_finetune_config: Finetuning configuration for Gemma 2 9B with PEFT support (LoRA, DoRA)
- gemma2_27b_finetune_config: Finetuning configuration for Gemma 2 27B with PEFT support (LoRA, DoRA)
Before training, ensure the following environment variables are set:
- SAVE_DIR: checkpoint and log saving directory
- HF_TOKEN: to download models from the HF Hub (if required)
- HF_HOME: (optional) to avoid re-downloading models and datasets
- WANDB_API_KEY: (optional) to enable WandB logging
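A small pre-flight check along these lines can catch a missing variable before a long job is launched (variable names as listed above; the check itself is only a sketch):
import os

# Fail fast if a required variable is missing; warn for the optional ones.
required = ["SAVE_DIR"]
optional = ["HF_TOKEN", "HF_HOME", "WANDB_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")
for name in optional:
    if not os.environ.get(name):
        print(f"Note: {name} is not set; the corresponding feature is disabled.")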
Pretraining#
Gemma 2 2B#
from megatron.bridge.recipes.gemma import gemma2_2b_pretrain_config
# Create a pretraining configuration
config = gemma2_2b_pretrain_config(
name="my_gemma2_2b_pretrain",
data_paths=["path/to/data"],
train_iters=100000,
global_batch_size=32,
)
Gemma 2 9B#
from megatron.bridge.recipes.gemma import gemma2_9b_pretrain_config
config = gemma2_9b_pretrain_config(
name="my_gemma2_9b_pretrain",
data_paths=["path/to/data"],
train_iters=100000,
global_batch_size=32,
)
Gemma 2 27B#
from megatron.bridge.recipes.gemma import gemma2_27b_pretrain_config
config = gemma2_27b_pretrain_config(
name="my_gemma2_27b_pretrain",
data_paths=["path/to/data"],
train_iters=100000,
global_batch_size=32,
)
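The returned configuration can be adjusted further before training. The attribute names below mirror the train.* and checkpoint.* overrides used in the torchrun commands later on this page; the exact structure is an assumption and should be checked against the recipe module:
import os

# Assumed attribute layout, mirroring the CLI overrides (train.*, checkpoint.*) shown below.
config.train.train_iters = 50000
config.checkpoint.save = os.path.join(os.environ["SAVE_DIR"], "gemma2_27b_pretrain")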
Full Finetuning#
Gemma 2 2B#
torchrun --nproc-per-node=8 run/run_recipe.py \
--pretrained-checkpoint /models/gemma-2-2b \
--recipe gemma2_2b_finetune_config \
--peft none \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma2_2b_finetune
Or programmatically:
from megatron.bridge.recipes.gemma import gemma2_2b_finetune_config
config = gemma2_2b_finetune_config(
name="gemma2_2b_full_finetune",
pretrained_checkpoint="/models/gemma-2-2b",
peft="none",
train_iters=1000,
global_batch_size=64,
)
Gemma 2 9B#
torchrun --nproc-per-node=8 run/run_recipe.py \
--pretrained-checkpoint /models/gemma-2-9b \
--recipe gemma2_9b_finetune_config \
--peft none \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma2_9b_finetune
Gemma 2 27B#
torchrun --nproc-per-node=16 run/run_recipe.py \
--pretrained-checkpoint /models/gemma-2-27b \
--recipe gemma2_27b_finetune_config \
--peft none \
train.global_batch_size=64 \
train.train_iters=1000 \
checkpoint.save=$SAVE_DIR/gemma2_27b_finetune
Parameter-Efficient Finetuning (PEFT) with LoRA#
Gemma 2 2B#
torchrun --nproc-per-node=8 run/run_recipe.py \
--pretrained-checkpoint /models/gemma-2-2b \
--recipe gemma2_2b_finetune_config \
--peft lora \
train.global_batch_size=128 \
checkpoint.save=$SAVE_DIR/gemma2_2b_lora
PEFT options:
--peft: Set to lora for LoRA or dora for DoRA. Set to none for full finetuning.
Or programmatically:
from megatron.bridge.recipes.gemma import gemma2_2b_finetune_config
# LoRA finetuning
config = gemma2_2b_finetune_config(
name="gemma2_2b_lora_finetune",
pretrained_checkpoint="/models/gemma-2-2b",
peft="lora", # or "dora"
train_iters=1000,
global_batch_size=128,
)
Gemma 2 9B LoRA#
from megatron.bridge.recipes.gemma import gemma2_9b_finetune_config
config = gemma2_9b_finetune_config(
name="gemma2_9b_lora_finetune",
pretrained_checkpoint="/models/gemma-2-9b",
peft="lora",
train_iters=1000,
global_batch_size=128,
)
Gemma 2 27B LoRA#
from megatron.bridge.recipes.gemma import gemma2_27b_finetune_config
config = gemma2_27b_finetune_config(
name="gemma2_27b_lora_finetune",
pretrained_checkpoint="/models/gemma-2-27b",
peft="lora",
train_iters=1000,
global_batch_size=128,
)
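For reference, LoRA and DoRA keep the pretrained weights frozen and train a small low-rank update on top of them. A generic PyTorch sketch of the LoRA update (not the Bridge implementation; the rank and alpha values are illustrative):
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = base(x) + (alpha / r) * B(A(x)), with the base weights frozen.
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                 # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)    # low-rank down-projection
        self.lora_b = nn.Linear(r, base.out_features, bias=False)   # low-rank up-projection
        nn.init.zeros_(self.lora_b.weight)                          # start as a zero update
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))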
Recommended Configurations#
| Model | Mode | TP | PP | Global Batch Size | Learning Rate |
|---|---|---|---|---|---|
| Gemma 2 2B | Full SFT | 1 | 1 | 64-128 | 5e-6 |
| Gemma 2 2B | LoRA/DoRA | 1 | 1 | 128-256 | 1e-4 |
| Gemma 2 9B | Full SFT | 4 | 1 | 64-128 | 5e-6 |
| Gemma 2 9B | LoRA/DoRA | 1 | 1 | 128-256 | 1e-4 |
| Gemma 2 27B | Full SFT | 8 | 2 | 64-128 | 5e-6 |
| Gemma 2 27B | LoRA/DoRA | 4 | 1 | 128-256 | 1e-4 |
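A quick way to read the table: the number of GPUs per model replica is TP × PP, and the total GPU count is that product multiplied by the data-parallel size. For example:
# GPUs per model replica = TP * PP; total GPUs = TP * PP * data-parallel size.
tp, pp, dp = 8, 2, 1   # Gemma 2 27B full SFT row, with a single data-parallel replica
print(tp * pp * dp)    # 16 GPUs, consistent with the 8-16 GPU recommendation for the 27B model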
Examples#
Checkpoint import/export: examples/conversion/convert_checkpoints.py
Generate text (HF→Megatron): examples/conversion/hf_to_megatron_generate_text.py
Hugging Face Model Cards#
Gemma 2 2B: https://huggingface.co/google/gemma-2-2b
Gemma 2 9B: https://huggingface.co/google/gemma-2-9b
Gemma 2 27B: https://huggingface.co/google/gemma-2-27b
Gemma 2 Collection: https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315