Gemma 3 VL / Gemma 3n
Gemma 3 VL / Gemma 3n
Gemma 3 VL is Google’s multimodal extension of Gemma 3, supporting image-text inputs for tasks like image captioning and visual question answering. Gemma 3n is a next-generation efficiency-focused variant.
Available Models
- Gemma 3 27B IT (VL)
- Gemma 3 4B IT (VL)
- Gemma 3n 4B (VL)
Architecture
Gemma3ForConditionalGeneration
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and VLM Fine-Tuning Guide.
Fine-Tuning
See the Gemma 3 & Gemma 3n Fine-Tuning Guide for detailed instructions on dataset preparation, configuration, and multi-GPU training.