T5

Released in 2020, Google’s “Text-to-Text Transfer Transformer” (T5) is built on the transformer encoder-decoder architecture. Trained on a large dataset cast into a unified text-to-text format, T5 performs well across a broad range of NLP tasks, and with sizes ranging from “small” to 11B parameters it can tackle a wide range of use cases.
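To make the unified text-to-text format concrete, here is a minimal sketch (not part of the original page) using the Hugging Face `transformers` library and the public `t5-small` checkpoint. Every task, here translation, is expressed as plain text in and plain text out, with the task named in a text prefix:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 frames every task as text generation; the prefix selects the task.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```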

| Feature                                    | Status |
|--------------------------------------------|--------|
| Data parallelism                           | ✓      |
| Tensor parallelism                         | ✓      |
| Pipeline parallelism                       | ✓      |
| Interleaved Pipeline Parallelism Schedule  | N/A    |
| Sequence parallelism                       | ✓      |
| Selective activation checkpointing         | ✓      |
| Gradient checkpointing                     | ✓      |
| Partial gradient checkpointing             | ✓      |
| FP32/TF32                                  | ✓      |
| AMP/FP16                                   | ✓      |
| BF16                                       | ✓      |
| TransformerEngine/FP8                      | ✓      |
| Multi-GPU                                  | ✓      |
| Multi-Node                                 | ✓      |
| Inference                                  | N/A    |
| Slurm                                      | ✓      |
| Base Command Manager                       | ✓      |
| Base Command Platform                      | ✓      |
| Distributed data preprocessing             | ✓      |
| NVfuser                                    | ✓      |
| P-Tuning and Prompt Tuning                 | ✓      |
| IA3 and Adapter learning                   | ✓      |
| Distributed Optimizer                      | ✓      |
| Distributed Checkpoint                     | N/A    |
| Fully Sharded Data Parallel                | N/A    |
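The precision rows in the table above (FP32/TF32, AMP/FP16, BF16) correspond to standard PyTorch mixed-precision training. The snippet below is a minimal sketch of the BF16 mode using plain PyTorch autocast; the linear model, optimizer, and data are hypothetical stand-ins rather than T5 itself:

```python
import torch

# Stand-in model and data; a real run would train T5 inside its own loop.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 512, device="cuda")

# BF16 has the same dynamic range as FP32, so no GradScaler is needed
# (FP16 AMP would instead scale the loss with torch.cuda.amp.GradScaler).
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```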