Gemma Models#

This page provides detailed technical specifications for the Gemma model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.

Before You Start#

Ensure that `hfTargetDownload` is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.


Gemma 2B Instruct#

| Property | Value |
| --- | --- |
| Creator | Google |
| Architecture | Decoder transformer (causal LM) |
| Description | Gemma 2B Instruct is a compact instruction-tuned model suitable for efficient customization and deployment. |
| Max I/O Tokens | Not specified |
| Parameters | 2.51B |
| Training Data | Not specified |
| Native Data Type | BF16 |
| Recommended GPU Count for Customization | 1 |
| Default Name | `google/gemma-2b-it` |
| HF Model URI | `hf://google/gemma-2b-it` |

Training Options (2B IT)#

  • SFT (LoRA)

  • Sequence Packing: Not supported

Default training maximum sequence length: 1,024 tokens (platform default).

Prompt Template#

Gemma instruct models require a chat template. NeMo Customizer does not auto-apply a default mapping for Gemma, so provide a prompt template explicitly based on your dataset format.

Important

Gemma 2B Chat Template Limitation: The base model's chat template does not support system messages. For technical details about the chat template implementation, refer to the tokenizer configuration.

Inference Notes#

  • Preferred data type: BF16 (the native weights data type)

  • Quantization: 8-bit and 4-bit are supported via bitsandbytes

  • Optional optimization: enable Flash Attention 2 if the flash-attn package is installed

Note

Sequence packing is not supported for Gemma models in NeMo Customizer. If enabled, the platform disables it automatically.