Gemma 2#

Google’s Gemma 2 is a family of lightweight, open models built on the same research and technology used to create Gemini models. The Gemma 2 architecture builds on the transformer decoder framework with enhancements including pre-normalization with RMSNorm, GeGLU activations, Rotary Positional Embeddings (RoPE), attention logit softcapping, and sliding window attention.

Gemma 2 models are designed for a wide range of text generation tasks and are available in multiple sizes to suit different computational budgets.

Gemma family models are supported via the Bridge system with auto-detected configuration and weight mapping.

Available Models#

Text-Only Models#

  • Gemma 2 2B (google/gemma-2-2b): Compact 2B parameter model optimized for efficiency

    • 26 layers, 2304 hidden size

    • 8 attention heads, 4 query groups (GQA)

    • Sequence length: 8,192 tokens

    • Ideal for single-GPU deployment

  • Gemma 2 9B (google/gemma-2-9b): Medium-scale 9B parameter model balancing performance and efficiency

    • 42 layers, 3584 hidden size

    • 16 attention heads, 8 query groups (GQA)

    • Sequence length: 8,192 tokens

    • Recommended: 4-8 GPUs with tensor parallelism

  • Gemma 2 27B (google/gemma-2-27b): Large-scale 27B parameter model for maximum performance

    • 46 layers, 4608 hidden size

    • 32 attention heads, 16 query groups (GQA)

    • Sequence length: 8,192 tokens

    • Recommended: 8-16 GPUs with tensor and pipeline parallelism

All models support a sequence length of 8,192 tokens and use sliding window attention for efficient context processing.

Model Architecture Features#

Gemma 2 introduces several architectural innovations:

  • Sliding Window Attention: Local attention with a 4,096-token window, alternated with global attention layers, for efficient long-context processing

  • Attention Logit Softcapping: Caps attention logits at 50.0 to prevent extreme values

  • Final Logit Softcapping: Caps final output logits at 30.0 for stable generation

  • GeGLU Activation: Uses gated linear units with GELU activation for improved performance

  • RMSNorm: Layer normalization without mean centering for faster computation

  • Rotary Embeddings: RoPE with base frequency 10,000

  • Query Pre-Attention Scaling: Scales queries by 1/√query_pre_attn_scalar instead of 1/√head_dim before attention (e.g., 224 for the 9B model); see the sketch after this list
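Softcapping and query scaling are both simple elementwise operations. The following is a minimal, illustrative PyTorch sketch of the two techniques, not the Bridge implementation itself:

import torch

def softcap(x: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound values to (-cap, cap) via cap * tanh(x / cap).
    return cap * torch.tanh(x / cap)

# Attention logits are capped at 50.0; final output logits at 30.0.
attn_logits = softcap(torch.randn(2, 16, 128, 128), cap=50.0)

# Query pre-attention scaling: queries are multiplied by
# query_pre_attn_scalar ** -0.5 instead of head_dim ** -0.5
# (query_pre_attn_scalar is 224 for the 9B model).
q = torch.randn(2, 16, 128, 256)  # [batch, heads, seq, head_dim]
q = q * (224 ** -0.5)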

Conversion with 🤗 Hugging Face#

Load HF → Megatron#

from megatron.bridge import AutoBridge

# Example: Gemma 2 9B
bridge = AutoBridge.from_hf_pretrained("google/gemma-2-9b")
provider = bridge.to_megatron_provider()

# Configure parallelism before instantiating the model
provider.tensor_model_parallel_size = 8
provider.pipeline_model_parallel_size = 1

provider.finalize()
model = provider.provide_distributed_model(wrap_with_ddp=False)

Import HF → Megatron#

To import the HF model to your desired Megatron path:

# Gemma 2 2B
python examples/conversion/convert_checkpoints.py import \
    --hf-model google/gemma-2-2b \
    --megatron-path /models/gemma-2-2b

# Gemma 2 9B
python examples/conversion/convert_checkpoints.py import \
    --hf-model google/gemma-2-9b \
    --megatron-path /models/gemma-2-9b

# Gemma 2 27B
python examples/conversion/convert_checkpoints.py import \
    --hf-model google/gemma-2-27b \
    --megatron-path /models/gemma-2-27b

Export Megatron → HF#

# Gemma 2 9B example
python examples/conversion/convert_checkpoints.py export \
    --hf-model google/gemma-2-9b \
    --megatron-path /results/gemma2_9b/checkpoints/iter_00001000 \
    --hf-path ./gemma2-9b-hf-export
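The export can also be done programmatically from an in-memory model. A minimal sketch, assuming the save_hf_pretrained method on AutoBridge shown in the Bridge examples (verify against your installed version):

from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("google/gemma-2-9b")
provider = bridge.to_megatron_provider()
model = provider.provide_distributed_model(wrap_with_ddp=False)

# Write the Megatron weights back out in Hugging Face format.
# NOTE: method name per the Bridge examples; confirm it exists in your version.
bridge.save_hf_pretrained(model, "./gemma2-9b-hf-export")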

Run Inference on Converted Checkpoint#

python examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path google/gemma-2-9b \
    --megatron_model_path /models/gemma-2-9b \
    --prompt "What is artificial intelligence?" \
    --max_new_tokens 100

Note:

  • --megatron_model_path is optional. If it is not specified, the script converts the model on the fly and then runs the forward pass.

Pretrain and Finetune Recipes#

  • See: bridge.recipes.gemma

  • Available recipes:

    • Pretraining:

      • gemma2_2b_pretrain_config: Pre-training configuration for Gemma 2 2B

      • gemma2_9b_pretrain_config: Pre-training configuration for Gemma 2 9B

      • gemma2_27b_pretrain_config: Pre-training configuration for Gemma 2 27B

    • Finetuning:

      • gemma2_2b_finetune_config: Finetuning configuration for Gemma 2 2B with PEFT support (LoRA, DoRA)

      • gemma2_9b_finetune_config: Finetuning configuration for Gemma 2 9B with PEFT support (LoRA, DoRA)

      • gemma2_27b_finetune_config: Finetuning configuration for Gemma 2 27B with PEFT support (LoRA, DoRA)

Before training, ensure the following environment variables are set:

  1. SAVE_DIR: checkpoint and log saving directory

  2. HF_TOKEN: to download models from HF Hub (if required)

  3. HF_HOME: (optional) to avoid re-downloading models and datasets

  4. WANDB_API_KEY: (optional) to enable WandB logging
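A short preflight check can catch missing variables before a long job launches. This is an illustrative snippet, not part of the recipes:

import os

required = ["SAVE_DIR", "HF_TOKEN"]
optional = ["HF_HOME", "WANDB_API_KEY"]

# Fail fast if a required variable is unset.
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Set these environment variables before training: {missing}")

for name in optional:
    if not os.environ.get(name):
        print(f"Note: optional variable {name} is unset.")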

Pretraining#

Gemma 2 2B#

from megatron.bridge.recipes.gemma import gemma2_2b_pretrain_config

# Create a pretraining configuration
config = gemma2_2b_pretrain_config(
    name="my_gemma2_2b_pretrain",
    data_paths=["path/to/data"],
    train_iters=100000,
    global_batch_size=32,
)
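To launch training, pass the config to the training entry point together with a forward-step function. A minimal sketch, assuming the pretrain entry point and GPT forward step live at the paths used in the Bridge examples (verify against your installed version); the same pattern applies to the 9B and 27B configs below:

# ASSUMPTION: module paths per the Bridge examples; confirm in your install.
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

pretrain(config, forward_step)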

Gemma 2 9B#

from megatron.bridge.recipes.gemma import gemma2_9b_pretrain_config

config = gemma2_9b_pretrain_config(
    name="my_gemma2_9b_pretrain",
    data_paths=["path/to/data"],
    train_iters=100000,
    global_batch_size=32,
)

Gemma 2 27B#

from megatron.bridge.recipes.gemma import gemma2_27b_pretrain_config

config = gemma2_27b_pretrain_config(
    name="my_gemma2_27b_pretrain",
    data_paths=["path/to/data"],
    train_iters=100000,
    global_batch_size=32,
)

Full Finetuning#

Gemma 2 2B#

torchrun --nproc-per-node=8 run/run_recipe.py \
    --pretrained-checkpoint /models/gemma-2-2b \
    --recipe gemma2_2b_finetune_config \
    --peft none \
    train.global_batch_size=64 \
    train.train_iters=1000 \
    checkpoint.save=$SAVE_DIR/gemma2_2b_finetune

Or programmatically:

from megatron.bridge.recipes.gemma import gemma2_2b_finetune_config

config = gemma2_2b_finetune_config(
    name="gemma2_2b_full_finetune",
    pretrained_checkpoint="/models/gemma-2-2b",
    peft="none",
    train_iters=1000,
    global_batch_size=64,
)
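If your installation exposes a finetune entry point alongside pretrain (an assumption; check megatron.bridge.training in your installed version), the programmatic launch mirrors the pretraining flow:

# ASSUMPTION: a finetune entry point analogous to pretrain; verify the
# module path and function name against your installed version.
from megatron.bridge.training.finetune import finetune
from megatron.bridge.training.gpt_step import forward_step

finetune(config, forward_step)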

Gemma 2 9B#

torchrun --nproc-per-node=8 run/run_recipe.py \
    --pretrained-checkpoint /models/gemma-2-9b \
    --recipe gemma2_9b_finetune_config \
    --peft none \
    train.global_batch_size=64 \
    train.train_iters=1000 \
    checkpoint.save=$SAVE_DIR/gemma2_9b_finetune

Gemma 2 27B#

torchrun --nproc-per-node=16 run/run_recipe.py \
    --pretrained-checkpoint /models/gemma-2-27b \
    --recipe gemma2_27b_finetune_config \
    --peft none \
    train.global_batch_size=64 \
    train.train_iters=1000 \
    checkpoint.save=$SAVE_DIR/gemma2_27b_finetune

Parameter-Efficient Finetuning (PEFT) with LoRA#

Gemma 2 2B#

torchrun --nproc-per-node=8 run/run_recipe.py \
    --pretrained-checkpoint /models/gemma-2-2b \
    --recipe gemma2_2b_finetune_config \
    --peft lora \
    train.global_batch_size=128 \
    checkpoint.save=$SAVE_DIR/gemma2_2b_lora

PEFT options:

  • --peft: Set to lora for LoRA or dora for DoRA. Set to none for full finetuning.

Or programmatically:

from megatron.bridge.recipes.gemma import gemma2_2b_finetune_config

# LoRA finetuning
config = gemma2_2b_finetune_config(
    name="gemma2_2b_lora_finetune",
    pretrained_checkpoint="/models/gemma-2-2b",
    peft="lora",  # or "dora"
    train_iters=1000,
    global_batch_size=128,
)

Gemma 2 9B LoRA#

from megatron.bridge.recipes.gemma import gemma2_9b_finetune_config

config = gemma2_9b_finetune_config(
    name="gemma2_9b_lora_finetune",
    pretrained_checkpoint="/models/gemma-2-9b",
    peft="lora",
    train_iters=1000,
    global_batch_size=128,
)

Gemma 2 27B LoRA#

from megatron.bridge.recipes.gemma import gemma2_27b_finetune_config

config = gemma2_27b_finetune_config(
    name="gemma2_27b_lora_finetune",
    pretrained_checkpoint="/models/gemma-2-27b",
    peft="lora",
    train_iters=1000,
    global_batch_size=128,
)

Hugging Face Model Cards#

  • Gemma 2 2B: https://huggingface.co/google/gemma-2-2b

  • Gemma 2 9B: https://huggingface.co/google/gemma-2-9b

  • Gemma 2 27B: https://huggingface.co/google/gemma-2-27b

  • Gemma 2 Collection: https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315