T5 Results

You can also run prompt learning on top of any trained .nemo checkpoint on the SQuAD task, as described in the section T5 and mT5 Prompt Learning. The results are shown in the table below.

Task    Metric        220M    3B
SQuAD   Exact Match   74.20   78.52
SQuAD   F1            84.54   87.17
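For reference, the Exact Match and F1 numbers above follow standard SQuAD scoring. The sketch below is not the exact evaluation code used in NeMo, but it applies the same conventions: answers are normalized (lowercased, punctuation and articles stripped) before exact-match comparison, and F1 is computed over token overlap.

```python
# Standard SQuAD-style Exact Match and F1 scoring (a minimal sketch).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))

def f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
print(round(f1("in Paris, France", "Paris"), 3))        # 0.5 (partial overlap)
```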

Training the 220M T5 model to convergence takes 4 days, and the loss curve is shown in the figure below:

Figure: 220M T5 Training Loss

The table below shows the converged training loss, the throughput, and the total time to train for the 220M T5 model, using a given number of GPUs and a given Global Batch Size (GBS).

#GPUs   GBS    Seq Length   #Tokens   Loss    Throughput (tokens/s)   Time to Train (days)
32      2048   512          1T        1.501   3,273,728               4
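The time-to-train column follows directly from the token count and the measured throughput. A quick sanity check (values copied from the table above) reproduces the roughly 4-day figure for the 220M model:

```python
# Time to train = total tokens / sustained throughput.
tokens_total = 1e12          # 1T training tokens
throughput = 3_273_728       # tokens per second (220M T5, 32 GPUs)
seconds = tokens_total / throughput
days = seconds / 86_400
print(f"{days:.2f} days")    # ~3.54, reported rounded up to 4
```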

Training the 3B T5 model to convergence takes 11 days, and the loss curve of a fully trained model can be seen in the figure below:

Figure: 3B T5 Training Loss

The table below shows the converged training loss, the throughput, and the total time to train for the 3B T5 model, using a given number of GPUs and a given Global Batch Size (GBS).

#GPUs   GBS    Seq Length   #Tokens   Loss    Throughput (tokens/s)   Time to Train (days)
160     2160   512          1T        1.147   1,395,131               11

Training performance: NVIDIA DGX SuperPOD (20 × 8 × A100 80GB for 3B T5 Model)

NVIDIA measured the throughput of training a 3B parameter T5 model on NVIDIA DGX SuperPOD using different numbers of nodes. Scaling from 1 node to 20 nodes yielded a 14.68× speed-up.

NVIDIA is actively working on improving the scaling performance for T5 models. The table and chart below show the performance results.

Nodes                               1        2        4        5        10        20
Tokens per Second                   110769   215579   417644   515100   957506    1626353
Perfect Linear Scaling (Tokens)     110769   221538   443077   553846   1107692   2215385
Speed-up                            1x       1.95x    3.77x    4.65x    8.64x     14.68x
Figure: 3B T5 NeMo Framework Throughput
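The speed-up and scaling-efficiency figures can be recomputed from the measured throughput alone, as this short sketch shows (values copied from the table above):

```python
# Derive speed-up and scaling efficiency from measured throughput.
nodes = [1, 2, 4, 5, 10, 20]
tokens_per_sec = [110769, 215579, 417644, 515100, 957506, 1626353]

base = tokens_per_sec[0]
for n, tps in zip(nodes, tokens_per_sec):
    speedup = tps / base
    efficiency = speedup / n   # fraction of perfect linear scaling
    print(f"{n:>2} nodes: {speedup:5.2f}x speed-up, {efficiency:6.1%} efficiency")
# At 20 nodes: 14.68x speed-up, i.e. ~73% of perfect linear scaling.
```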

Inference performance was measured on NVIDIA DGX SuperPOD (1 × 8 × A100 80GB). The results are shown in the table below.

Inference configurations:

  • Batch size: 1

  • Input tokens length: 60

  • Output tokens length: 20

Figure: Average Latency vs. T5 Model Size

T5 Model Size   Average Latency [ms]   TP   PP   GPUs
3B              94                     2    1    2
11B             123                    4    1    4
23B             213                    4    1    4
41B             332                    8    1    8

TP: tensor parallel size; PP: pipeline parallel size.
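As a rough rule of thumb, dividing the average latency by the 20 output tokens gives an upper bound on the per-token decoding cost; this ignores the one-time encoder pass over the 60 input tokens, so the true per-token figure is somewhat lower. A small sketch:

```python
# Rough per-token decoding cost implied by the table (batch size 1,
# 20 output tokens); an upper bound, since it ignores the encoder pass.
configs = {"3B": 94, "11B": 123, "23B": 213, "41B": 332}  # avg latency [ms]
output_tokens = 20
for size, latency_ms in configs.items():
    print(f"{size}: <= {latency_ms / output_tokens:.1f} ms per output token")
```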