Gemma Models#
This page provides detailed technical specifications for the Gemma model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.
Before You Start#
Ensure that hfTargetDownload is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.
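Because the Gemma repositories on Hugging Face are gated, it can help to verify that your API key actually has access to the model before wiring it into the cluster secret. The following is a minimal sketch using the huggingface_hub library; the HF_TOKEN environment variable name is illustrative and should match wherever you store the key:

```python
import os

from huggingface_hub import HfApi

# Token read from an environment variable; substitute however you store it.
token = os.environ["HF_TOKEN"]

api = HfApi(token=token)

# Raises an error (e.g., GatedRepoError) if the key lacks access to the gated repo.
info = api.model_info("google/gemma-2b-it")
print(info.id)
```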
Gemma 2B Instruct#
| Property | Value |
|---|---|
| Creator | Google |
| Architecture | Decoder-only transformer (causal LM) |
| Description | Gemma 2B Instruct is a compact instruction-tuned model suitable for efficient customization and deployment. |
| Max I/O Tokens | Not specified |
| Parameters | 2.51B |
| Training Data | Not specified |
| Native Data Type | BF16 |
| Recommended GPU Count for Customization | 1 |
| Default Name | google/gemma-2b-it |
| HF Model URI | https://huggingface.co/google/gemma-2b-it |
Training Options (2B IT)#
- SFT (LoRA)
- Sequence packing: not supported
The maximum training sequence length defaults to the platform value of 1,024 tokens.
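For context, a customization job targeting this model might be created as sketched below. This is only an illustration: the base URL, endpoint path, and field names reflect typical NeMo Customizer deployments but may differ in your version, so treat them as placeholders and consult your API reference rather than copying them verbatim.

```python
import requests

# Base URL is an illustrative placeholder for your deployment.
CUSTOMIZER_URL = "http://nemo-customizer:8000"

job = {
    "config": "google/gemma-2b-it",           # target model, per the table above
    "dataset": {"name": "my-training-dataset"},  # placeholder dataset name
    "hyperparameters": {
        "training_type": "sft",               # SFT with LoRA, per Training Options
        "finetuning_type": "lora",
        "epochs": 3,
        "batch_size": 8,
        "learning_rate": 1e-4,
        # Sequence length falls back to the 1,024-token platform default.
    },
}

resp = requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs", json=job)
resp.raise_for_status()
print(resp.json())
```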
Prompt Template#
Gemma instruct models require a chat template. NeMo Customizer does not auto-apply a default mapping for Gemma, so provide a prompt template explicitly based on your dataset format.
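To see the exact turn markers Gemma expects, you can render a sample conversation with the model's own tokenizer. This sketch uses the Hugging Face transformers library; the message content is placeholder text:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# A one-message sample conversation; the content is illustrative.
messages = [
    {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
]

# Render the conversation with Gemma's chat template, appending the
# marker that cues the model to respond.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Output resembles:
# <bos><start_of_turn>user
# Summarize the plot of Hamlet in one sentence.<end_of_turn>
# <start_of_turn>model
```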
Important
Gemma 2B Chat Template Limitation: The base model’s chat template does not support system messages. For technical details about the chat template implementation, refer to the tokenizer configuration.
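The sketch below illustrates the limitation and one common convention for working around it: folding the system text into the first user turn. The workaround is a general practice, not a NeMo Customizer feature, and the message content is placeholder text:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "What is BF16?"},
]

try:
    tokenizer.apply_chat_template(messages, tokenize=False)
except Exception as err:
    # The template rejects the system role outright.
    print(err)

# Common workaround: merge the system text into the first user turn.
merged = [
    {
        "role": "user",
        "content": messages[0]["content"] + "\n\n" + messages[1]["content"],
    }
]
prompt = tokenizer.apply_chat_template(
    merged, tokenize=False, add_generation_prompt=True
)
```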
Inference Notes#
- Preferred data type: BF16 (the model's native weights data type)
- Quantization: 8-bit and 4-bit supported via bitsandbytes (see the sketch after this list)
- Optional optimization: use Flash Attention 2 when flash-attn is installed
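As a concrete illustration of those options, here is a sketch of loading the model with 4-bit quantization and Flash Attention 2 via the transformers library. It assumes the bitsandbytes and flash-attn packages are installed and a CUDA GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b-it"

# 4-bit quantization via bitsandbytes; compute in BF16 to match the
# model's native weights data type.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires flash-attn
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```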
Note
Sequence packing is not supported for Gemma models in NeMo Customizer. If you enable it, the platform disables it automatically.