Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
RETRO
Released in December 2021, the Retrieval-Enhanced Transformer (RETRO) model is an innovative approach to enhancing auto-regressive language models. Developed by researchers at DeepMind, RETRO leverages retrieval from a large text database as a complementary memory to scale reasoning capabilities without significantly increasing computational requirements. More information is available in the companion paper “Improving language models by retrieving from trillions of tokens”.
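At a high level, RETRO splits the input sequence into fixed-size chunks (64 tokens in the paper) and, for each chunk, retrieves the nearest-neighbour chunks from an external database; the retrieved text is then incorporated through chunked cross-attention while generating the following chunk. The sketch below illustrates only the retrieval step on toy data. It is a minimal conceptual illustration, not NeMo's or Megatron-LM's API: the names `embed`, `retrieve`, and `db_embeddings` are hypothetical stand-ins.

```python
import numpy as np

# Conceptual sketch of RETRO-style chunk retrieval (illustration only, not NeMo's API).
# A sequence is split into fixed-size chunks; for each chunk, the k most similar
# chunks are looked up in an external database by embedding similarity. RETRO then
# conditions generation of the next chunk on these neighbours via chunked cross-attention.

CHUNK_SIZE = 64   # tokens per chunk, as in the RETRO paper
K_NEIGHBOURS = 2  # neighbours retrieved per chunk

def embed(chunk: np.ndarray) -> np.ndarray:
    """Stand-in embedding: a deterministic pseudo-random vector keyed on the chunk."""
    rng = np.random.default_rng(int(chunk.sum()) % (2**32))
    return rng.standard_normal(128)

def retrieve(chunk: np.ndarray, db_embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k database chunks with highest cosine similarity."""
    q = embed(chunk)
    sims = db_embeddings @ q / (
        np.linalg.norm(db_embeddings, axis=1) * np.linalg.norm(q) + 1e-9
    )
    return np.argsort(-sims)[:k]

# Toy database of pre-embedded chunks (in practice, trillions of tokens in an ANN index).
db_embeddings = np.random.default_rng(0).standard_normal((1000, 128))

tokens = np.arange(256)                  # a toy input sequence
chunks = tokens.reshape(-1, CHUNK_SIZE)  # split into 64-token chunks
neighbours = [retrieve(c, db_embeddings, K_NEIGHBOURS) for c in chunks]
# neighbours[i] lists the database chunks that would be cross-attended to
# while the model generates the chunk following chunks[i].
```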
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✗ |
| Pipeline parallelism | ✗ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✗ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✓ |
| BF16 | ✓ |
| TransformerEngine | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | ✓ |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Kubernetes | ✗ |
| Distributed data preprocessing | N/A |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✗ |
| IA3 and Adapter learning | ✗ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | ✗ |