Released in December 2021, the Retrieval-Enhanced Transformer (RETRO) model is an approach to enhancing auto-regressive language models. Developed by researchers at DeepMind, RETRO leverages retrieval from a large text database as a complementary memory, improving model quality without significantly increasing computational requirements. More information is available in the companion paper “Improving language models by retrieving from trillions of tokens”.
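At a high level, RETRO splits the input into fixed-size chunks, retrieves the nearest-neighbor chunks (and their continuations) from the external text database, and lets the decoder attend to them through chunked cross-attention. The sketch below illustrates only the retrieval step, using a toy bag-of-words embedding and a brute-force nearest-neighbor search; it is a conceptual illustration under stated assumptions, not NeMo's or DeepMind's implementation. The `embed` helper, the chunk/continuation database layout, and the toy vocabulary size are hypothetical, and the real system uses frozen BERT embeddings with an approximate-nearest-neighbor index over trillions of tokens.

```python
import numpy as np

CHUNK_LEN = 64   # RETRO retrieves neighbors per 64-token chunk
K = 2            # neighbors returned per chunk

def embed(tokens, dim=128):
    """Toy deterministic bag-of-words embedding (stand-in for a frozen BERT encoder)."""
    rng = np.random.default_rng(0)
    table = rng.standard_normal((50_000, dim))           # hypothetical vocabulary size
    vec = table[np.asarray(tokens) % 50_000].mean(axis=0)
    return vec / (np.linalg.norm(vec) + 1e-8)

def build_database(corpus_token_ids):
    """Split a tokenized corpus into (chunk, continuation) pairs and pre-embed the chunks."""
    entries = []
    for start in range(0, len(corpus_token_ids) - 2 * CHUNK_LEN, CHUNK_LEN):
        chunk = corpus_token_ids[start:start + CHUNK_LEN]
        continuation = corpus_token_ids[start + CHUNK_LEN:start + 2 * CHUNK_LEN]
        entries.append((embed(chunk), chunk, continuation))
    keys = np.stack([e[0] for e in entries])
    return keys, entries

def retrieve_neighbors(input_token_ids, keys, entries, k=K):
    """Brute-force cosine search: for each input chunk, return its k nearest database
    chunks together with their continuations (what the decoder cross-attends to)."""
    neighbors = []
    for start in range(0, len(input_token_ids), CHUNK_LEN):
        query = embed(input_token_ids[start:start + CHUNK_LEN])
        scores = keys @ query                             # cosine similarity on unit vectors
        top = np.argsort(-scores)[:k]
        neighbors.append([(entries[i][1], entries[i][2]) for i in top])
    return neighbors

if __name__ == "__main__":
    corpus = list(range(1000))                            # stand-in for a tokenized corpus
    keys, entries = build_database(corpus)
    input_ids = list(range(100, 100 + 2 * CHUNK_LEN))
    hits = retrieve_neighbors(input_ids, keys, entries)
    print(f"{len(hits)} input chunks, {len(hits[0])} neighbors each")
```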
- Data Preparation
- Training
- Training with Predefined Configurations
- Model Inferencing
- Model Evaluation
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✗ |
| Pipeline parallelism | ✗ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✗ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✓ |
| BF16 | ✓ |
| TransformerEngine | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | ✓ |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Kubernetes | ✗ |
| Distributed data preprocessing | N/A |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✗ |
| IA3 and Adapter learning | ✗ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | ✗ |