Gemma Models#

This page provides detailed technical specifications for the Gemma model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.

Before You Start#

Ensure that `hfTargetDownload` is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.


Gemma 2B Instruct#

| Property | Value |
| --- | --- |
| Creator | Google |
| Architecture | Decoder transformer (causal LM) |
| Description | Gemma 2B Instruct is a compact instruction-tuned model suitable for efficient customization and deployment. |
| Max I/O Tokens | Not specified |
| Parameters | 2.51B |
| Training Data | Not specified |
| Native Data Type | BF16 |
| Recommended GPU Count for Customization | 1 |
| Default Name | `google/gemma-2b-it` |
| HF Model URI | `hf://google/gemma-2b-it` |

Training Options (2B IT)#

  • SFT (LoRA)

  • Sequence Packing: Not supported

Default training maximum sequence length: 1,024 tokens (platform default).

Prompt Template#

Gemma instruct models require a chat template. NeMo Customizer does not auto-apply a default mapping for Gemma, so provide a prompt template explicitly based on your dataset format.

Important

Gemma 2B Chat Template Limitation: The base model's chat template does not support system messages. For technical details about the chat template implementation, refer to the tokenizer configuration.

Inference Notes#

  • Preferred data type: BF16 (the native weights data type)

  • Quantization: 8-bit and 4-bit are supported via bitsandbytes

  • Optional optimization: enable Flash Attention 2 if the flash-attn package is installed

Note

Sequence packing is not supported for Gemma models in NeMo Customizer. If enabled, the platform disables it automatically.