# Gemma Models
This page provides detailed technical specifications for the Gemma model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.
## Before You Start
Ensure that `hfTargetDownload` is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.
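Gemma checkpoints are gated on Hugging Face, so the token behind your API key secret must be able to read the repository. Below is a minimal sketch for verifying this up front, assuming the `huggingface_hub` package and an `HF_TOKEN` environment variable (both illustrative choices, not requirements of NeMo Customizer):

```python
import os

from huggingface_hub import HfApi

# Hypothetical pre-flight check: confirm the token that will back the
# API key secret can read the gated Gemma repository before deploying.
api = HfApi(token=os.environ["HF_TOKEN"])

info = api.model_info("google/gemma-2b-it")
print(f"Token OK, can read {info.id} (last modified {info.last_modified})")
```

If this raises a gated-repo or authentication error, accept the Gemma license on Hugging Face or regenerate the token before wiring it into the Helm configuration.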
## Gemma 2B Instruct
| Property | Value |
|---|---|
| Creator | Google |
| Architecture | Decoder-only transformer (causal LM) |
| Description | Gemma 2B Instruct is a compact instruction-tuned model suitable for efficient customization and deployment. |
| Max I/O Tokens | Not specified |
| Parameters | 2.51B |
| Training Data | Not specified |
| Native data type | BF16 |
| Recommended GPU count for customization | 1 |
| Default Name | `google/gemma-2b-it` |
| HF Model URI | `hf://google/gemma-2b-it` |
### Training Options (2B IT)
- SFT (LoRA)
- Sequence Packing: Not supported
- Default training max sequence length: 1,024 (platform default)
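To make these options concrete, here is a minimal sketch of submitting a LoRA SFT customization job over HTTP. It assumes a reachable Customizer endpoint in `CUSTOMIZER_BASE_URL`, an already-uploaded dataset named `my-dataset`, and a config name of `google/gemma-2b-it`; the endpoint path and field names follow the NeMo Customizer jobs API, but verify them against your deployed version.

```python
import os

import requests

# Assumed environment; replace with your deployment's values.
base_url = os.environ.get("CUSTOMIZER_BASE_URL", "http://nemo-customizer:8000")

# Illustrative LoRA SFT job for Gemma 2B Instruct. Sequence packing is
# omitted because it is not supported for Gemma models (see the note at the
# end of this page); the platform disables it even if requested.
job = {
    "config": "google/gemma-2b-it",
    "dataset": {"name": "my-dataset"},
    "hyperparameters": {
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 3,
        "batch_size": 8,
        "learning_rate": 1e-4,
        "lora": {"adapter_dim": 16},
    },
}

resp = requests.post(f"{base_url}/v1/customization/jobs", json=job, timeout=30)
resp.raise_for_status()
print(resp.json()["id"])  # job ID field name; verify against your API version
```

The hyperparameter values here are placeholders; tune epochs, batch size, learning rate, and adapter dimension for your dataset.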
### Prompt Template
Gemma instruct models require a chat template. NeMo Customizer does not auto-apply a default mapping for Gemma, so provide a prompt template explicitly based on your dataset format.
Important
Gemma 2B Chat Template Limitation: The base model's chat template does not support system messages. For technical details about the chat template implementation, refer to the tokenizer configuration.
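As a concrete illustration, the snippet below formats one training record with the Gemma turn markers using the model's own tokenizer. The record field names (`prompt`, `completion`) are placeholders for whatever your dataset actually uses; note that only `user` and `assistant` roles appear, since the template rejects system messages.

```python
from transformers import AutoTokenizer

# Requires access to the gated repo; assumes your HF token is configured.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# Placeholder record; substitute your dataset's actual field names.
record = {"prompt": "What is LoRA?", "completion": "LoRA is a ..."}

# user/assistant only: the Gemma chat template raises on system messages.
messages = [
    {"role": "user", "content": record["prompt"]},
    {"role": "assistant", "content": record["completion"]},
]

text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
# Output uses the Gemma turn markers, roughly:
# <start_of_turn>user
# What is LoRA?<end_of_turn>
# <start_of_turn>model
# LoRA is a ...<end_of_turn>
```

You can mirror this structure in the prompt template you pass to NeMo Customizer so that training and inference see the same formatting.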
### Inference Notes
- Preferred data type: BF16 (native weights data type)
- Quantization: 8-bit and 4-bit supported with `bitsandbytes`
- Optional optimization: use Flash Attention 2 when you install `flash-attn`
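These notes map directly onto a standard `transformers` loading call. A minimal sketch, assuming a CUDA-capable GPU and an installed `flash-attn` package (drop the `attn_implementation` argument otherwise); the 4-bit variant is an illustrative `bitsandbytes` configuration, not a platform requirement:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# BF16 matches the native weights data type.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn
    device_map="auto",
)

# Alternative: 4-bit quantization via bitsandbytes
# (8-bit works similarly with load_in_8bit=True).
quantized = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
```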
Note
Sequence packing is not supported for Gemma models in NeMo Customizer. If you enable it, the platform automatically disables it.