T5 Results

You can also run prompt learning on top of any trained .nemo checkpoint on the SQuAD task, as described in the section T5 and mT5 Prompt Learning. The results are shown in the table below.

Task    Metric        220M    3B
SQuAD   Exact Match   74.20   78.52
SQuAD   F1            84.54   87.17
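For reference, the Exact Match and F1 numbers above follow standard SQuAD scoring. The sketch below is not the exact evaluation code used in NeMo, but it applies the same conventions: answers are normalized (lowercased, punctuation and articles stripped) before exact-match comparison, and F1 is computed over token overlap.

```python
# Standard SQuAD-style Exact Match and F1 scoring (a minimal sketch).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))

def f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
print(round(f1("in Paris, France", "Paris"), 3))        # 0.5 (partial overlap)
```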

Training the 220M T5 model to convergence takes 4 days, and the loss curve is shown in the figure below:

Figure: 220M T5 Training Loss

The table below shows the converged training loss, the throughput, and the total time to train for the 220M T5 model, using a given number of GPUs and a given Global Batch Size (GBS).

#GPUs   GBS    Seq Length   #Tokens   Loss    Throughput (tokens/s)   Time to Train (days)
32      2048   512          1T        1.501   3,273,728               4
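The time-to-train column follows directly from the token count and the measured throughput. A quick sanity check (values copied from the table above) reproduces the roughly 4-day figure for the 220M model:

```python
# Time to train = total tokens / sustained throughput.
tokens_total = 1e12          # 1T training tokens
throughput = 3_273_728       # tokens per second (220M T5, 32 GPUs)
seconds = tokens_total / throughput
days = seconds / 86_400
print(f"{days:.2f} days")    # ~3.54, reported rounded up to 4
```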

Training the 3B T5 model to convergence takes 11 days, and the loss curve of a fully trained model can be seen in the figure below:

Figure: 3B T5 Training Loss

The table below shows the converged training loss, the throughput, and the total time to train for the 3B T5 model, using a given number of GPUs and a given Global Batch Size (GBS).

#GPUs   GBS    Seq Length   #Tokens   Loss    Throughput (tokens/s)   Time to Train (days)
160     2160   512          1T        1.147   1,395,131               11

Training performance: NVIDIA DGX SuperPOD (20 × 8 × A100 80GB for 3B T5 Model)

NVIDIA measured the throughput of training a 3B parameter T5 model on NVIDIA DGX SuperPOD using different numbers of nodes. Scaling from 1 node to 20 nodes yielded a 14.68× speed-up.

NVIDIA is actively working on improving the scaling performance for T5 models. The table and chart below show the performance results.

Nodes                               1        2        4        5        10        20
Tokens per Second                   110769   215579   417644   515100   957506    1626353
Perfect Linear Scaling (Tokens)     110769   221538   443077   553846   1107692   2215385
Speed-up                            1x       1.95x    3.77x    4.65x    8.64x     14.68x
Figure: 3B T5 NeMo Framework Throughput
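The speed-up and scaling-efficiency figures can be recomputed from the measured throughput alone, as this short sketch shows (values copied from the table above):

```python
# Derive speed-up and scaling efficiency from measured throughput.
nodes = [1, 2, 4, 5, 10, 20]
tokens_per_sec = [110769, 215579, 417644, 515100, 957506, 1626353]

base = tokens_per_sec[0]
for n, tps in zip(nodes, tokens_per_sec):
    speedup = tps / base
    efficiency = speedup / n   # fraction of perfect linear scaling
    print(f"{n:>2} nodes: {speedup:5.2f}x speed-up, {efficiency:6.1%} efficiency")
# At 20 nodes: 14.68x speed-up, i.e. ~73% of perfect linear scaling.
```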

Inference performance was measured on NVIDIA DGX SuperPOD (1 × 8 × A100 80GB). The results are shown in the table below.

Inference configurations:

  • Batch size: 1

  • Input tokens length: 60

  • Output tokens length: 20

Figure: Average Latency vs. T5 Model Size

T5 Model Size   Average Latency [ms]   TP   PP   GPUs
3B              94                     2    1    2
11B             123                    4    1    4
23B             213                    4    1    4
41B             332                    8    1    8

TP: tensor parallel size; PP: pipeline parallel size.
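As a rough rule of thumb, dividing the average latency by the 20 output tokens gives an upper bound on the per-token decoding cost; this ignores the one-time encoder pass over the 60 input tokens, so the true per-token figure is somewhat lower. A small sketch:

```python
# Rough per-token decoding cost implied by the table (batch size 1,
# 20 output tokens); an upper bound, since it ignores the encoder pass.
configs = {"3B": 94, "11B": 123, "23B": 213, "41B": 332}  # avg latency [ms]
output_tokens = 20
for size, latency_ms in configs.items():
    print(f"{size}: <= {latency_ms / output_tokens:.1f} ms per output token")
```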