Griffin (Recurrent Gemma)

User Guide (Latest Version)

Released in April 2024, Google's Recurrent Gemma is an open model built on the Griffin architecture, introduced in the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models". Griffin is a hybrid architecture that combines gated linear recurrences with local sliding-window attention, and it is designed for a variety of text generation tasks, including question answering, summarization, and reasoning. The architecture offers several advantages over its predecessor, Gemma: reduced memory usage allows longer samples to be generated on memory-constrained devices such as single GPUs or CPUs, and higher throughput enables inference at significantly larger batch sizes, generating more tokens per second, especially on long sequences. Recurrent Gemma is currently offered as a 2.7B-parameter checkpoint, underscoring the efficiency and scalability of the Griffin architecture.
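To give an intuition for the recurrent half of the hybrid, the sketch below shows a gated linear recurrence in the spirit of Griffin's real-gated linear recurrent unit. This is a minimal, illustrative NumPy implementation, not the model's actual code: the gate projections `W_a` and `W_i` are random stand-ins for learned weights, and the width and sequence length are arbitrary. The key property it demonstrates is that the state update is linear in the hidden state, so memory use during generation is constant in sequence length, unlike the growing key-value cache of full attention.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # hidden width (illustrative)
T = 16  # sequence length (illustrative)

# Hypothetical gate projections; the real model learns these.
W_a = rng.normal(scale=0.1, size=(d, d))
W_i = rng.normal(scale=0.1, size=(d, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(xs):
    """Run a gated linear recurrence over a (T, d) input sequence.

    At each step the recurrence gate a_t decides how much of the previous
    state to keep, and the input gate i_t decides how much of the new
    input to admit. The sqrt(1 - a_t^2) factor keeps the state bounded.
    """
    h = np.zeros(d)
    outs = []
    for x_t in xs:
        a_t = sigmoid(x_t @ W_a)  # recurrence gate, values in (0, 1)
        i_t = sigmoid(x_t @ W_i)  # input gate, values in (0, 1)
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i_t * x_t)
        outs.append(h)
    return np.stack(outs)

ys = gated_linear_recurrence(rng.normal(size=(T, d)))
```

Because the per-step state is a single fixed-size vector per channel, generating a longer sample does not increase the recurrence's memory footprint, which is the source of the memory advantage described above.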

Feature                                         Status
Data parallelism
Tensor parallelism
Pipeline parallelism
Interleaved Pipeline Parallelism Scheduler      N/A
Sequence parallelism
Selective activation checkpointing
Gradient checkpointing
Partial gradient checkpointing
FP32/TF32
AMP/FP16
BF16
TransformerEngine/FP8
Multi-GPU
Slurm
Base Command Manager
Base Command Platform
Distributed data preprocessing
NVfuser
P-Tuning and Prompt Tuning
Adapter learning
Distributed Optimizer
Distributed Checkpoint
Last updated on Jun 19, 2024.