Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Gemma and CodeGemma

Released in February 2024, Google’s Gemma is an open model based on the work (Gemini v1.5 Report) done to create Google’s Gemini family of models. It adopts the transformer decoder architecture and adds multi-query attention, rotary position embeddings (RoPE), GeGLU activations, and more. Gemma is offered in 2B and 7B parameter sizes, providing a powerful model at a reasonable scale. More information is available in Google’s release blog.
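To make the feed-forward change concrete, here is a minimal PyTorch sketch of a GeGLU block; the class and dimension names are illustrative and not NeMo’s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """Illustrative GeGLU feed-forward block (hypothetical names, not NeMo code)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)  # gated branch
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)    # linear branch
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)  # back to model dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU: a GELU-activated gate multiplied elementwise with a
        # linear projection, then projected back to the model dimension.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))
```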

Released in April 2024, CodeGemma joins the Gemma family with a specialization in code understanding and generation.

In June 2024, Google released Gemma 2, which incorporates modern architectural improvements such as sliding-window attention, logit soft-capping, and post-normalization of the attention and feed-forward layers. Gemma 2 is offered in 2B, 9B, and 27B parameter sizes, delivering outsized performance for its size while maintaining efficiency. More information is available in Google’s release blog.
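As an illustration of logit soft-capping, here is a short sketch based on the published Gemma 2 description, not NeMo code: logits are squashed smoothly with tanh rather than hard-clipped. The cap values below (50.0 for attention logits, 30.0 for final logits) follow the reported Gemma 2 configuration.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    """Smoothly bound logits to (-cap, cap) via tanh instead of hard clipping."""
    return cap * torch.tanh(logits / cap)

# Gemma 2 reportedly applies this with cap=50.0 to attention logits
# and cap=30.0 to the final output logits.
attn_scores = torch.randn(1, 8, 16, 16) * 100  # deliberately large values
capped = soft_cap(attn_scores, cap=50.0)
assert capped.abs().max() < 50.0  # strictly inside the cap
```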

NeMo Framework feature support for the Gemma family:

| Feature                                    | Status |
|--------------------------------------------|--------|
| Data parallelism                           | ✓      |
| Tensor parallelism                         | ✓      |
| Pipeline parallelism                       | ✓      |
| Interleaved Pipeline Parallelism Schedule  | N/A    |
| Sequence parallelism                       | ✓      |
| Selective activation checkpointing         | ✓      |
| Gradient checkpointing                     | ✓      |
| Partial gradient checkpointing             | ✓      |
| FP32/TF32                                  | ✓      |
| AMP/FP16                                   | ✓      |
| BF16                                       | ✓      |
| TransformerEngine/FP8                      | ✓      |
| Multi-GPU                                  | ✓      |
| Multi-Node                                 | ✓      |
| Inference                                  | N/A    |
| Slurm                                      | ✓      |
| Base Command Manager                       | ✓      |
| Base Command Platform                      | ✓      |
| Distributed data preprocessing             | ✓      |
| nvFuser                                    | ✓      |
| P-Tuning and Prompt Tuning                 | ✓      |
| IA3 and Adapter learning                   | ✓      |
| Distributed Optimizer                      | ✓      |
| Distributed Checkpoint                     | ✓      |
| Fully Sharded Data Parallel                | N/A    |
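The data, tensor, and pipeline parallelism entries above compose multiplicatively in the standard Megatron-style decomposition. A minimal arithmetic sketch with hypothetical GPU counts (not a NeMo API call):

```python
# How the three parallelism dimensions in the table divide up a cluster.
# All numbers here are illustrative assumptions.
world_size = 16         # total GPUs across all nodes
tensor_parallel = 4     # tensor parallelism: splits each layer's weights
pipeline_parallel = 2   # pipeline parallelism: splits layers across GPUs

# Data parallelism fills whatever capacity is left over.
assert world_size % (tensor_parallel * pipeline_parallel) == 0
data_parallel = world_size // (tensor_parallel * pipeline_parallel)
print(data_parallel)    # -> 2 model replicas, each spanning 8 GPUs
```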