Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

RETRO

Released in December 2021, the Retrieval-Enhanced Transformer (RETRO) model is an innovative approach to enhancing auto-regressive language models. Developed by researchers at DeepMind, RETRO leverages retrieval from a large text database as a complementary memory, improving language-modeling performance without significantly increasing model size or compute. More information is available in the companion paper "Improving language models by retrieving from trillions of tokens".
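The retrieval mechanism is easiest to see in miniature. The sketch below is a toy stand-in, not NeMo's or DeepMind's implementation: it splits the input into 64-token chunks (the chunk size used in the paper), embeds each chunk, and looks up its nearest neighbors in a pre-embedded chunk database; in the actual model those neighbors (and their continuations) condition the decoder through chunked cross-attention. The random embedding table, dimensions, database size, and brute-force search are all illustrative assumptions; the paper uses a frozen BERT encoder and approximate nearest-neighbor search over a trillions-of-tokens database.

```python
# Conceptual sketch of RETRO-style chunk retrieval (illustrative only).
# Assumptions: a random-projection embedding table stands in for the frozen
# BERT encoder used in the paper, and brute-force dot-product search stands
# in for the approximate nearest-neighbor index needed at scale.
import numpy as np

CHUNK_LEN = 64   # RETRO splits text into fixed-size 64-token chunks (paper)
EMBED_DIM = 128  # toy embedding width (assumption)
TOP_K = 2        # neighbors retrieved per input chunk (assumption)

rng = np.random.default_rng(0)

def embed(chunk_tokens: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings into one L2-normalized chunk vector."""
    v = table[chunk_tokens].mean(axis=0)
    return v / np.linalg.norm(v)

# Build a small retrieval database of (chunk, embedding) pairs.
vocab_size = 1000
table = rng.normal(size=(vocab_size, EMBED_DIM))
db_chunks = rng.integers(0, vocab_size, size=(10_000, CHUNK_LEN))
db_vecs = np.stack([embed(c, table) for c in db_chunks])

def retrieve(input_tokens: np.ndarray) -> list:
    """For each CHUNK_LEN-token chunk of the input, return its TOP_K nearest
    database chunks; a RETRO decoder attends to these (and, in the real model,
    their continuations) via chunked cross-attention."""
    neighbors = []
    for start in range(0, len(input_tokens), CHUNK_LEN):
        q = embed(input_tokens[start:start + CHUNK_LEN], table)
        scores = db_vecs @ q  # cosine similarity, since all vectors are unit length
        idx = np.argpartition(-scores, TOP_K)[:TOP_K]
        neighbors.append(db_chunks[idx])
    return neighbors

prompt = rng.integers(0, vocab_size, size=2 * CHUNK_LEN)
for i, nbrs in enumerate(retrieve(prompt)):
    print(f"chunk {i}: retrieved neighbor chunks with shape {nbrs.shape}")
```

Because the database is queried per chunk rather than per token, retrieval cost grows with sequence length divided by the chunk size, which is part of how RETRO keeps the added compute modest.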

Feature                                      Status
Data parallelism                             ✓
Tensor parallelism                           ✓
Pipeline parallelism                         ✓
Interleaved Pipeline Parallelism Schedule    N/A
Sequence parallelism                         ✓
Selective activation checkpointing           ✓
Gradient checkpointing                       ✓
Partial gradient checkpointing               ✓
FP32/TF32                                    ✓
AMP/FP16                                     ✓
BF16                                         ✓
TransformerEngine                            ✓
TransformerEngine/FP8                        ✓
Multi-GPU                                    ✓
Multi-Node                                   ✓
Inference                                    ✓
Slurm                                        ✓
Base Command Manager                         ✓
Kubernetes                                   ✓
Distributed data preprocessing               N/A
NVfuser                                      ✓
P-Tuning and Prompt Tuning                   ✓
IA3 and Adapter learning                     ✓
Distributed Optimizer                        ✓
Distributed Checkpoint                       ✓
Fully Sharded Data Parallel                  ✓