Griffin (Recurrent Gemma)


Released in April 2024, Google’s Recurrent Gemma is an open model based on the Griffin architecture, introduced in the paper “Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models”. Griffin is a hybrid architecture that combines gated linear recurrences with local sliding-window attention and is designed for a variety of text generation tasks, including question answering, summarization, and reasoning. This architecture offers several advantages over its predecessor, Gemma: because its recurrent state and local attention window are fixed in size, memory usage is reduced, allowing longer samples to be generated on devices with limited memory, such as single GPUs or CPUs. The Griffin architecture also achieves higher throughput, performing inference at significantly larger batch sizes and generating more tokens per second, especially on long sequences. Recurrent Gemma is currently offered as a 2.7B-parameter checkpoint, underscoring the efficiency and scalability of the Griffin architecture.
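The two components above can be illustrated with a minimal, scalar-valued sketch (not the actual Griffin implementation): a gated linear recurrence whose state is a single value updated as h_t = a_t · h_{t-1} + (1 − a_t) · x_t, and a local sliding-window operator in which each position only sees the most recent `window` inputs. The function names, the uniform-weight averaging stand-in for softmax attention, and the scalar shapes are all illustrative assumptions.

```python
import math

def sigmoid(x):
    # Standard logistic function, used here to squash the gate into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def gated_linear_recurrence(xs, gates):
    """Scalar sketch of a gated linear recurrence.

    h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(gate_t).
    The state is a single value, so memory does not grow with sequence length.
    """
    h, out = 0.0, []
    for x, g in zip(xs, gates):
        a = sigmoid(g)
        h = a * h + (1.0 - a) * x
        out.append(h)
    return out

def local_attention(xs, window=2):
    """Sliding-window stand-in for local attention.

    Each position averages over itself and the previous (window - 1) inputs
    (uniform weights replace the usual softmax scores for simplicity).
    The cache is bounded by `window`, unlike a full-attention KV cache.
    """
    out = []
    for t in range(len(xs)):
        ctx = xs[max(0, t - window + 1): t + 1]
        out.append(sum(ctx) / len(ctx))
    return out

# With gate logits of 0.0 (a_t = 0.5), the recurrence blends old state and input equally.
print(gated_linear_recurrence([1.0, 1.0], [0.0, 0.0]))  # [0.5, 0.75]
print(local_attention([1.0, 2.0, 3.0], window=2))       # [1.0, 1.5, 2.5]
```

The key property this sketch highlights is why memory stays flat during generation: the recurrence carries one fixed-size state forward, and the local-attention window caps how much context must be cached, in contrast to global attention whose cache grows with every generated token.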



Supported features:

- Data parallelism
- Tensor parallelism
- Pipeline parallelism
- Interleaved pipeline parallelism schedule: N/A
- Sequence parallelism
- Selective activation checkpointing
- Gradient checkpointing
- Partial gradient checkpointing
- Base Command Manager
- Base Command Platform
- Distributed data preprocessing
- P-Tuning and Prompt Tuning
- Adapter learning
- Distributed Optimizer
- Distributed Checkpoint
Last updated on Jun 19, 2024.