Gemma 4
Gemma 4 is Google's next-generation multimodal Gemma family. It supports image-text inputs and uses a Mixture-of-Experts (MoE) language backbone at larger scales. For the MoE variant, NeMo AutoModel replaces the HF-native dense matmul over experts with the NeMo GroupedExperts backend, which enables Expert Parallelism (EP) through the standard MoE parallelizer.
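To make the idea behind grouped experts and Expert Parallelism concrete, here is a minimal, framework-free sketch. It is not the NeMo GroupedExperts implementation. The function names (`shard_experts`, `group_tokens_by_expert`, `tokens_per_rank`) and the contiguous expert-to-rank assignment are assumptions chosen for illustration. The sketch only shows the bookkeeping: tokens are grouped by the expert they were routed to, and each EP rank processes only the groups for the experts it owns.

```python
from collections import defaultdict


def shard_experts(num_experts, ep_size):
    """Assign a contiguous block of expert ids to each EP rank (illustrative assumption)."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    return {
        rank: list(range(rank * per_rank, (rank + 1) * per_rank))
        for rank in range(ep_size)
    }


def group_tokens_by_expert(routing):
    """Group token indices by expert id; routing[i] lists the experts chosen for token i."""
    groups = defaultdict(list)
    for token_idx, expert_ids in enumerate(routing):
        for expert_id in expert_ids:
            groups[expert_id].append(token_idx)
    return dict(groups)


def tokens_per_rank(routing, num_experts, ep_size):
    """For each EP rank, map its local experts to the tokens they must process."""
    owners = shard_experts(num_experts, ep_size)
    groups = group_tokens_by_expert(routing)
    return {
        rank: {expert: groups.get(expert, []) for expert in experts}
        for rank, experts in owners.items()
    }


# Toy example: 4 tokens, top-2 routing over 4 experts, split across 2 EP ranks.
routing = [[0, 2], [1, 2], [0, 3], [2, 3]]
plan = tokens_per_rank(routing, num_experts=4, ep_size=2)
print(plan)
```

In a real backend, each per-expert token group would feed one grouped matrix multiply on the owning rank, with an all-to-all exchange moving tokens between ranks. The sketch omits both and keeps only the dispatch plan.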
Available Models
- Gemma 4 E2B IT (VL, dense, kv-shared layers)
- Gemma 4 E4B IT (VL, dense, kv-shared layers)
- Gemma 4 E4B IT Assistant (assistant/drafter model for MTP)
- Gemma 4 31B IT (VL, dense)
- Gemma 4 26B-A4B IT (VL, MoE)
Architecture
- Gemma4ForConditionalGeneration
- Gemma4AssistantForCausalLM (MTP drafter / assistant head for speculative decoding; co-trainable with the target Gemma 4 base using Gemma4WithDrafter)
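The drafter's role in speculative decoding can be illustrated with a generic greedy draft-and-verify loop. This is a hedged sketch, not the Gemma 4 or NeMo AutoModel API. `speculative_decode`, `target_next`, and `draft_next` are hypothetical names, and both "models" are toy functions that map a token list to the next token. The drafter proposes a few tokens cheaply, and the target keeps the longest prefix it agrees with and corrects the first mismatch.

```python
def speculative_decode(target_next, draft_next, prompt, num_draft, max_new_tokens):
    """Greedy draft-and-verify decoding; returns only the newly generated tokens."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The drafter proposes num_draft tokens autoregressively.
        proposal = []
        for _ in range(num_draft):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks each proposed token in order. A real system scores
        #    all positions in one forward pass; here it is called per position.
        for proposed in proposal:
            expected = target_next(tokens)
            tokens.append(expected)  # always keep the target's own choice
            if expected != proposed or len(tokens) >= limit:
                break  # first mismatch (or length limit) ends this round
    return tokens[len(prompt):]


# Toy models: the target counts upward modulo 10; the drafter agrees except after a 3.
def target_next(tokens):
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    return 0 if tokens[-1] == 3 else (tokens[-1] + 1) % 10


generated = speculative_decode(target_next, draft_next, [0], num_draft=3, max_new_tokens=6)
print(generated)
```

Because every appended token is the target's own greedy choice, the output matches plain greedy decoding with the target. The drafter only changes how many target steps can be checked per round.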
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and VLM Fine-Tuning Guide.
Fine-Tuning
See the VLM Fine-Tuning Guide.