Gemma#

Gemma is Google’s original lightweight open model family. Megatron Bridge supports Gemma causal language models through the GemmaBridge implementation for the Hugging Face GemmaForCausalLM architecture.

Supported Variants#

Megatron Bridge supports Hugging Face Gemma checkpoints that use the gemma model type, including:

  • Gemma 2B: https://huggingface.co/google/gemma-2b

  • Gemma 7B: https://huggingface.co/google/gemma-7b

  • Gemma release collection: https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b

Architecture Notes#

  • RMSNorm with zero-centered gamma.

  • GeGLU-style gated MLPs.

  • RoPE positional embeddings and flash attention backend.

  • Shared input/output embedding weights.

Examples#

Gemma uses the common conversion and generation entry points:

uv run python examples/conversion/convert_checkpoints.py import \
  --hf-model google/gemma-2b \
  --megatron-path /checkpoints/gemma_2b_megatron
uv run python examples/conversion/hf_to_megatron_generate_text.py \
  --hf_model_path google/gemma-2b \
  --megatron_model_path /checkpoints/gemma_2b_megatron \
  --prompt "What is artificial intelligence?"