# Gemma 4 VL (26B-A4B MoE)
Google’s Gemma 4 26B-A4B is a Mixture-of-Experts vision-language model (26B total parameters, 4B active per token). It pairs a 128-expert, top-k = 8 MoE language backbone with a SigLIP vision tower, a mix of sliding-window and global attention layers, and K=V tying on the global (full-attention) layers.
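
For quick reference, the sketch below restates these architecture hyperparameters as a plain Python dataclass. It is illustrative only: the class and field names are assumptions made here for readability and do not mirror the actual Megatron Bridge provider/config classes.

```python
from dataclasses import dataclass


@dataclass
class Gemma4VLArchSketch:
    """Illustrative summary of the hyperparameters described above (not a real config class)."""

    # MoE language backbone
    total_params: str = "26B"            # total parameter count
    active_params: str = "4B"            # parameters active per token
    num_moe_experts: int = 128           # experts per MoE layer
    moe_router_topk: int = 8             # experts routed per token

    # Vision tower
    vision_encoder: str = "SigLIP"

    # Attention layout
    attention_pattern: str = "sliding+global"   # mix of local sliding-window and global layers
    kv_tied_on_global_layers: bool = True       # K=V tying on the full-attention layers
```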
NeMo Megatron Bridge supports HF↔Megatron conversion, full SFT, and LoRA PEFT on image-text datasets. The finetuned model can be re-exported to 🤗 Hugging Face format for downstream evaluation or deployment.
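
The snippet below is a minimal sketch of that round trip, following the general `AutoBridge` conversion pattern. The entry-point names, the Hugging Face model id, and the output paths are assumptions for illustration and may differ for the VLM path; the README linked below has the exact, supported commands and training recipes.

```python
# Sketch only: names below are assumed, not verified against the Gemma 4 VL recipe.
from megatron.bridge import AutoBridge  # assumed import path

HF_MODEL_ID = "google/gemma-4-vl-26b-a4b"  # placeholder Hugging Face model id

# HF -> Megatron: load the HF checkpoint and materialize a Megatron-Core model.
bridge = AutoBridge.from_hf_pretrained(HF_MODEL_ID)
provider = bridge.to_megatron_provider()
model = provider.provide_distributed_model(wrap_with_ddp=False)

# ... run full SFT or LoRA PEFT on an image-text dataset here (see the README) ...

# Megatron -> HF: re-export the finetuned weights for evaluation or deployment.
bridge.save_hf_pretrained(model, "./gemma4_vl_hf_export")
```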
For the full setup, conversion, inference, training, and LoRA merge / adapter export workflows, see `examples/models/vlm/gemma4_vl/README.md`.