Important

NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

T5

Released in 2020, Google’s “Text-to-Text Transfer Transformer” (T5) is based on the transformer encoder-decoder architecture. Because it is trained on a large dataset in which every task is cast in a unified text-to-text format, T5 performs well across a number of tasks. With sizes ranging from “small” to 11B parameters, T5 can address a wide range of use cases.
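To make the "unified text-to-text format" concrete, here is a minimal sketch in plain Python of how different tasks can be expressed as pairs of input text and target text. The task prefixes follow the convention described in the T5 paper. The helper name `to_text_to_text` and the `TASK_PREFIXES` table are hypothetical, introduced only for illustration, and are not part of NeMo.

```python
# Illustrative sketch only: cast different tasks into (input text, target text) pairs.
# `TASK_PREFIXES` and `to_text_to_text` are hypothetical names, not a NeMo API.

TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, source: str, target: str) -> tuple[str, str]:
    """Prepend the task prefix to the source text; the target stays plain text."""
    return TASK_PREFIXES[task] + source, target


examples = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("cola", "The course is jumping well.", "not acceptable"),
]

for model_input, model_target in examples:
    print(f"{model_input!r} -> {model_target!r}")
```

Because both inputs and outputs are plain strings, a single encoder-decoder model with one training objective can cover translation, classification, and summarization without task-specific output heads.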

Feature	Status
Data parallelism	✓
Tensor parallelism	✓
Pipeline parallelism	✓
Interleaved Pipeline Parallelism Schedule	N/A
Sequence parallelism	✗
Selective activation checkpointing	✗
Gradient checkpointing	✓
Partial gradient checkpointing	✓
FP32/TF32	✓
AMP/FP16	✗
BF16	✓
TransformerEngine/FP8	✗
Multi-GPU	✓
Multi-Node	✓
Inference	N/A
Slurm	✓
Base Command Manager	✓
Base Command Platform	✓
Distributed data preprocessing	✓
NVfuser	✗
P-Tuning and Prompt Tuning	✓
IA3 and Adapter learning	✓
Distributed Optimizer	✓
Distributed Checkpoint	N/A
Fully Sharded Data Parallel	N/A