Performance

Evaluation Process

This section shows the latency and throughput numbers for the Riva NMT service on different GPUs.

These numbers were captured after the preconfigured NMT pipelines were deployed from our Quick Start scripts.

The command used to measure performance was:

riva_nmt_t2t_client \
  --riva_uri=0.0.0.0:50051 \
  --model_name=<model name> \
  --batch_size=<batch size> \
  --target_language_code=<target language code> \
  --source_language_code=<source language code> \
  --text_file=<wmt_filename>
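Because the tables report results for batch sizes 1, 2, 4, and 8, a sweep over those values can be scripted. The following Python sketch only assembles and prints the command lines using the flags shown above; the helper `build_nmt_client_command` is illustrative and not part of Riva, and the model name, language codes, and file name are hypothetical placeholders.

```python
import shlex

def build_nmt_client_command(model_name, batch_size, source_language_code,
                             target_language_code, text_file,
                             riva_uri="0.0.0.0:50051"):
    """Return the argv list for one riva_nmt_t2t_client run (hypothetical helper)."""
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_language_code}",
        f"--source_language_code={source_language_code}",
        f"--text_file={text_file}",
    ]

# Hypothetical values: substitute the deployed model name, language pair, and test file.
for batch_size in (1, 2, 4, 8):
    cmd = build_nmt_client_command("megatronnmt_any_any_1b", batch_size,
                                   "en", "de", "wmt_test.en")
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Building an argument list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.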

The riva_nmt_t2t_client reports the following latency measurement:

latency: the overall latency of all returned responses. The p90, p95, and p99 columns in the tables below are the 90th, 95th, and 99th percentile values of this latency.
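The percentile columns summarize the distribution of per-response latencies. How riva_nmt_t2t_client computes them internally is not documented here; the sketch below assumes the nearest-rank method, and the sample latencies are made up for illustration.

```python
import math

def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank method) of a list of latencies."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest sample that covers pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical per-response latencies from a single client run.
latencies = [0.8, 1.1, 0.9, 2.4, 1.0, 1.3, 0.7, 3.1, 1.2, 0.95]
for pct in (90, 95, 99):
    print(f"p{pct}: {nearest_rank_percentile(latencies, pct)}")
```

With these ten samples, p90 is 2.4 while p95 and p99 are both 3.1: with few samples the high percentiles collapse onto the maximum, so p99 values are most meaningful on large test sets.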

Results

Latency and throughput measurements for the megatronnmt_any_any_1b model are reported in the following tables. Throughput is measured in tokens (words) translated per second.
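As a rough illustration of the throughput metric, the sketch below divides the number of translated words by wall-clock time. Counting tokens as whitespace-separated words of the translated output is an assumption; the client may count tokens differently, and whitespace splitting undercounts languages such as Japanese or Chinese that do not separate words with spaces.

```python
def tokens_per_second(translations, elapsed_seconds):
    """Return translated words per second for a batch of output strings.

    Assumption: a "token" is a whitespace-separated word of the translated
    output; this undercounts scripts written without spaces (e.g. Japanese).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_words = sum(len(text.split()) for text in translations)
    return total_words / elapsed_seconds

# Hypothetical outputs and timing, for illustration only.
outputs = ["Das ist ein Test .", "Guten Morgen"]
print(tokens_per_second(outputs, 0.5))  # 7 words / 0.5 s -> 14.0
```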

For specifications of the hardware on which these measurements were collected, refer to the Hardware Specifications section.

batch size	tokens/second	p90	p95	p99
1	26.2867	2.03927	2.50127	3.38444
2	37.9656	2.66554	3.13718	4.35343
4	52.68	3.90231	4.27135	6.70392
8	60.0354	6.66561	7.94382	12.7633

batch size	tokens/second	p90	p95	p99
1	31.9811	2.33799	2.66042	3.81389
2	42.7697	2.92146	3.48803	4.87088
4	55.6321	4.42112	5.65174	7.68267
8	63.8819	7.67295	9.85235	12.8664

batch size	tokens/second	p90	p95	p99
1	26.3775	1.34108	1.54733	1.99489
2	38.9406	1.6354	1.87501	2.2296
4	56.8026	2.16781	2.43267	2.74134
8	74.5635	3.30279	3.59166	4.28577

batch size	tokens/second	p90	p95	p99
1	37.3683	1.04883	1.2315	1.5244
2	56.919	1.27984	1.44713	1.80017
4	84.6404	1.62568	1.8017	2.21461
8	116.29	2.32152	2.69288	3.06221

batch size	tokens/second	p90	p95	p99
1	26.407	1.39359	1.59803	2.0832
2	38.9544	1.70536	1.89963	2.53018
4	57.3626	2.17563	2.41663	4.14745
8	74.4845	3.24898	3.77144	7.65627

batch size	tokens/second	p90	p95	p99
1	25.9981	0.996788	1.16798	1.65174
2	37.7592	1.23002	1.43918	2.28275
4	55.8361	1.59303	1.92639	2.96886
8	72.8569	2.47273	3.09559	4.60961

batch size	tokens/second	p90	p95	p99
1	25.0178	2.16171	2.67937	3.57496
2	35.7852	2.8741	3.43384	4.825
4	45.7635	4.75694	5.46036	8.43625
8	53.6928	7.49429	9.33626	15.9536

batch size	tokens/second	p90	p95	p99
1	30.4228	2.45499	2.8447	4.16917
2	39.371	3.23598	3.9612	5.80284
4	47.5513	5.20845	6.5702	9.05081
8	53.4489	9.45831	12.3315	16.7165

batch size	tokens/second	p90	p95	p99
1	25.7329	1.4008	1.59731	2.02982
2	37.3419	1.70973	1.97817	2.38832
4	52.8377	2.39732	2.73125	3.28759
8	62.3088	3.89276	4.19171	5.01999

batch size	tokens/second	p90	p95	p99
1	37.6181	1.06453	1.2618	1.56666
2	54.3727	1.34588	1.54159	1.90145
4	77.2692	1.85304	2.06099	2.62032
8	100.463	2.67849	3.19095	3.68078

batch size	tokens/second	p90	p95	p99
1	25.6851	1.45086	1.65187	2.14367
2	37.6205	1.77873	2.00517	2.71895
4	52.6492	2.39175	2.70523	4.71649
8	62.1082	3.8764	4.39479	9.33342

batch size	tokens/second	p90	p95	p99
1	26.3125	1.00717	1.19257	1.67369
2	36.9222	1.2762	1.478	2.46823
4	51.398	1.83249	2.24098	3.33602
8	63.8143	2.85154	3.58795	5.41236

batch size	tokens/second	p90	p95	p99
1	24.8964	2.12924	2.65001	3.58206
2	34.9882	2.98058	3.54244	5.17292
4	44.407	4.79422	5.67429	8.66048
8	52.0275	7.6554	9.58992	15.6379

batch size	tokens/second	p90	p95	p99
1	29.9773	2.48201	2.85844	4.27288
2	38.6565	3.3019	4.04395	6.12123
4	45.7503	5.44284	7.03912	9.69829
8	51.1056	9.83193	12.5469	16.8656

batch size	tokens/second	p90	p95	p99
1	25.3765	1.38744	1.56635	2.01276
2	37.2191	1.7271	2.00251	2.42118
4	51.1486	2.51382	2.8324	3.51729
8	59.8422	4.07464	4.40873	5.24835

batch size	tokens/second	p90	p95	p99
1	36.2303	1.0951	1.27495	1.57815
2	53.9788	1.35253	1.54013	1.93965
4	76.3488	1.85225	2.07805	2.64824
8	96.3217	2.80936	3.31854	3.90351

batch size	tokens/second	p90	p95	p99
1	25.2758	1.44721	1.65356	2.15285
2	37.604	1.77375	2.01242	2.72198
4	51.4024	2.49608	2.79586	5.08613
8	59.5006	4.04908	4.63766	9.81065

batch size	tokens/second	p90	p95	p99
1	25.2419	1.03367	1.22135	1.71075
2	36.4417	1.27991	1.50347	2.44608
4	50.6761	1.8483	2.28897	3.48173
8	61.1923	2.97418	3.84112	5.69605

batch size	tokens/second	p90	p95	p99
1	23.864	2.21488	2.74259	3.76037
2	33.5527	3.10486	3.74949	5.343
4	42.1649	5.07062	5.93531	9.05
8	47.4944	8.55144	10.6804	18.5123

batch size	tokens/second	p90	p95	p99
1	28.6128	2.60506	3.0003	4.4652
2	36.6462	3.51057	4.29037	6.38079
4	43.646	5.66035	7.43962	10.2525
8	46.5824	10.8355	14.4172	19.4719

batch size	tokens/second	p90	p95	p99
1	24.1934	1.4668	1.6551	2.11026
2	35.5399	1.81347	2.08718	2.53916
4	47.9159	2.66117	2.97279	3.50095
8	57.1626	4.3184	4.66154	5.64512

batch size	tokens/second	p90	p95	p99
1	34.9025	1.13927	1.33301	1.64506
2	51.477	1.42015	1.62161	2.02865
4	71.5458	2.03029	2.23385	2.80834
8	91.6256	3.00349	3.51311	4.12845

batch size	tokens/second	p90	p95	p99
1	24.0672	1.5234	1.74705	2.25971
2	35.6375	1.8885	2.14569	2.97039
4	47.8865	2.65449	2.95025	5.46151
8	56.8241	4.30345	4.90771	10.5702

batch size	tokens/second	p90	p95	p99
1	24.3206	1.07742	1.26691	1.793
2	34.7435	1.35411	1.58727	2.62002
4	48.3788	1.91256	2.35622	3.66184
8	57.9614	3.2078	4.05819	6.1864

batch size	tokens/second	p90	p95	p99
1	40.5707	1.30604	1.5894	2.16364
2	56.9553	1.79082	2.16168	3.29733
4	66.8099	3.26993	3.64887	6.21649
8	62.5757	6.792	8.24721	13.8332

batch size	tokens/second	p90	p95	p99
1	49.4655	1.48434	1.73773	2.54071
2	61.9468	2.11053	2.53549	3.89271
4	66.7364	4.09326	4.85237	7.36323
8	61.9911	8.39187	11.1865	14.189

batch size	tokens/second	p90	p95	p99
1	40.4369	0.867933	0.983312	1.27003
2	58.9499	1.08901	1.2567	1.5336
4	77.0764	1.69687	1.93364	2.22511
8	79.8035	3.2676	3.61598	4.42012

batch size	tokens/second	p90	p95	p99
1	58.1328	0.671243	0.787934	0.980829
2	87.3839	0.83912	0.962738	1.19282
4	118.819	1.23386	1.37922	1.73483
8	133.943	2.14844	2.5295	2.95827

batch size	tokens/second	p90	p95	p99
1	40.4461	0.897417	1.02884	1.32677
2	59.6298	1.12109	1.25972	1.7518
4	77.4412	1.68057	1.88498	3.61804
8	79.0987	3.16789	3.72364	8.32152

batch size	tokens/second	p90	p95	p99
1	40.7501	0.625767	0.745007	1.02009
2	58.8772	0.795955	0.932173	1.55014
4	79.4899	1.19318	1.46917	2.35335
8	83.196	2.30013	2.96352	4.70573

batch size	tokens/second	p90	p95	p99
1	14.9059	4.18064	5.55622	8.90397
2	13.6982	8.77877	11.1217	18.41
4	11.9895	20.2857	23.3168	42.145
8	9.84107	45.3629	55.302	99.5216

batch size	tokens/second	p90	p95	p99
1	15.731	5.57288	6.65737	11.7328
2	13.4258	10.9427	14.2988	21.9515
4	11.1191	24.9354	34.4626	52.6908
8	9.31464	56.1604	76.3938	104.507

batch size	tokens/second	p90	p95	p99
1	17.9065	2.14465	2.66442	3.96079
2	17.6863	4.13346	5.24532	6.7035
4	16.281	8.86584	10.5843	12.674
8	13.819	19.8851	21.9904	27.6311

batch size	tokens/second	p90	p95	p99
1	26.7904	1.63459	2.00438	2.7469
2	27.7218	3.06423	3.66466	5.05088
4	25.8234	6.25622	7.28332	9.68234
8	22.6921	13.3635	15.8901	18.9904

batch size	tokens/second	p90	p95	p99
1	17.3311	2.3286	2.87607	4.3078
2	17.2567	4.36889	5.3378	7.67524
4	15.9637	8.94096	10.3726	23.2293
8	13.4777	19.4014	23.5091	56.2237

batch size	tokens/second	p90	p95	p99
1	18.8599	1.44669	1.87181	3.1101
2	19.022	2.87861	3.5337	7.27692
4	17.3604	5.93223	7.70771	14.4663
8	14.1413	14.1156	18.7657	30.8777

batch size	tokens/second	p90	p95	p99
1	24.8926	2.14594	2.67499	3.60681
2	34.6774	2.99878	3.67241	5.14107
4	43.0674	5.10826	5.87656	9.39341
8	47.3776	8.54803	10.6197	17.5638

batch size	tokens/second	p90	p95	p99
1	29.8693	2.51427	2.89999	4.30795
2	38.044	3.39933	4.15936	6.2249
4	43.6223	5.74422	7.59899	10.6489
8	45.0687	11.1592	14.7241	20.0668

batch size	tokens/second	p90	p95	p99
1	25.3254	1.40709	1.59688	2.0528
2	36.817	1.74851	2.02427	2.45473
4	50.5628	2.54396	2.91978	3.61169
8	56.0985	4.47452	4.82638	5.72361

batch size	tokens/second	p90	p95	p99
1	36.8149	1.08212	1.27069	1.57721
2	53.7507	1.37057	1.56666	1.94615
4	74.8121	1.90943	2.13149	2.75404
8	91.2649	3.03928	3.61824	4.19196

batch size	tokens/second	p90	p95	p99
1	25.3073	1.45916	1.66434	2.1714
2	37.2553	1.80097	2.03974	2.80161
4	50.3437	2.58562	2.94337	5.41708
8	56.0202	4.39519	5.0318	10.5843

batch size	tokens/second	p90	p95	p99
1	25.6837	1.01961	1.20561	1.69398
2	36.3235	1.30864	1.53248	2.54312
4	49.6179	1.88027	2.28496	3.55785
8	58.6442	3.15507	4.11201	6.14629

batch size	tokens/second	p90	p95	p99
1	34.0965	1.58011	1.96628	2.72448
2	46.4873	2.26697	2.71925	3.91062
4	55.8697	4.02463	4.48315	7.21816
8	62.8854	6.56656	8.19178	12.6974

batch size	tokens/second	p90	p95	p99
1	40.779	1.86284	2.15408	3.22027
2	50.1581	2.63038	3.16101	4.68046
4	57.2902	4.3511	5.57995	7.81723
8	62.8918	8.30041	10.1785	14.5662

batch size	tokens/second	p90	p95	p99
1	35.271	1.02293	1.16197	1.49121
2	49.5213	1.31475	1.52514	1.86395
4	66.6292	1.9028	2.17766	2.81436
8	73.0008	3.37777	3.61952	4.34075

batch size	tokens/second	p90	p95	p99
1	51.3997	0.784364	0.933197	1.1595
2	72.977	1.0184	1.17174	1.46658
4	97.3946	1.48676	1.64859	2.09218
8	120.295	2.30302	2.62656	3.18205

batch size	tokens/second	p90	p95	p99
1	35.1238	1.0647	1.21599	1.60668
2	49.6636	1.37161	1.55412	2.12787
4	66.8252	1.90341	2.14164	3.96884
8	72.9761	3.32665	3.78859	8.20151

batch size	tokens/second	p90	p95	p99
1	35.956	0.73739	0.878487	1.25711
2	49.3781	0.963772	1.14114	1.93997
4	65.3594	1.4497	1.79878	2.6903
8	78.4195	2.40052	2.92004	4.53465
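In most of the tables above, larger batches raise throughput at the cost of higher tail latency, although in a few of them throughput drops as the batch size grows. The Python sketch below quantifies the trade-off for the first table only, using values copied from it; the helper name `scaling_vs_batch1` is hypothetical.

```python
# (batch size, tokens/second, p90, p95, p99) copied from the first results table above.
FIRST_TABLE = [
    (1, 26.2867, 2.03927, 2.50127, 3.38444),
    (2, 37.9656, 2.66554, 3.13718, 4.35343),
    (4, 52.68, 3.90231, 4.27135, 6.70392),
    (8, 60.0354, 6.66561, 7.94382, 12.7633),
]

def scaling_vs_batch1(rows):
    """Return (batch size, throughput gain, p99 growth) relative to batch size 1."""
    base_tps, base_p99 = rows[0][1], rows[0][4]
    return [(bs, tps / base_tps, p99 / base_p99)
            for bs, tps, _p90, _p95, p99 in rows]

for bs, gain, p99_growth in scaling_vs_batch1(FIRST_TABLE):
    print(f"batch {bs}: {gain:.2f}x throughput, {p99_growth:.2f}x p99 latency")
```

For batch size 8 this gives about 2.28x the batch-size-1 throughput at about 3.77x the batch-size-1 p99 latency.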

Hardware Specifications

GPU
NVIDIA DGX A100 40 GB
CPU
Model	AMD EPYC 7742 64-Core Processor
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	2250
CPU min MHz	1500
RAM
Model	Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
Configured Memory Speed	2933 MT/s
RAM Size	32x64GB (2048GB Total)

GPU
NVIDIA A40
CPU
Model	AMD EPYC 7763 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	3529
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA A30
CPU
Model	AMD EPYC 7742 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	2
Frequency boost	disabled
CPU max MHz	2250
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	32x64GB (2048GB Total)

GPU
NVIDIA A10
CPU
Model	AMD EPYC 7763 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	2450
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA H100 80GB HBM3
CPU
Model	Intel(R) Xeon(R) Platinum 8480CL
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	56
NUMA node(s)	2
CPU max MHz	3800
CPU min MHz	800
RAM
Model	Micron DDR5 MTC40F2046S1RC48BA1 4800MHz
Configured Memory Speed	4400 MT/s
RAM Size	32x64GB (2048GB Total)

GPU
NVIDIA V100 SXM2 16 GB
CPU
Model	Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	20
NUMA node(s)	2
CPU max MHz	3600
CPU min MHz	1200
RAM
Model	Micron DDR4 36ASF4G72PZ-2G6D1 2667MHz
Configured Memory Speed	2133 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA T4
CPU
Model	Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	18
NUMA node(s)	2
CPU max MHz	3900
CPU min MHz	1000
RAM
Model	Samsung DDR4 M393A2K43BB1-CTD 2666MHz
Configured Memory Speed	2666 MT/s
RAM Size	24x16GB (384GB Total)

GPU
NVIDIA L4
CPU
Model	AMD EPYC 7763 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	3529
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA L40
CPU
Model	AMD EPYC 7763 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	3529
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	16x32GB (512GB Total)

Model Accuracy

Riva NMT models are evaluated using the BLEU (Bilingual Evaluation Understudy) score, which is an industry-standard metric for evaluating machine translation quality.

BLEU scores range from 0 to 100, where higher scores indicate better translation quality. The score measures how similar the machine translation output is to one or more reference human translations by:

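Computing modified (clipped) n-gram precision between the machine translation and the reference translations, typically for n-grams of length 1 to 4
Applying a brevity penalty to translations that are shorter than the references (overly long translations are penalized indirectly, because their extra n-grams lower precision)
Combining the geometric mean of the n-gram precisions with the brevity penalty into a final score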
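To make the steps above concrete, here is a minimal, unsmoothed corpus-level BLEU sketch in Python with a single reference per sentence, uniform weights over 1- to 4-grams, and pre-tokenized input. It illustrates the technique only; it is not the scorer used to produce the tables below, and standard tools such as sacreBLEU apply their own tokenization and settings, so numbers will not match exactly.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100) for tokenized hypotheses, one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives BLEU 0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))
print(corpus_bleu(["the cat sat on the".split()], [ref]))
```

An exact match scores 100. Dropping the final word from the six-word hypothesis keeps every n-gram precision at 1 but applies a brevity penalty of exp(1 - 6/5), giving about 81.9.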

The table below shows BLEU scores of the Riva NMT Megatron 1.6B any2any model on the Flores-101 dataset for translation between each pair of languages in the supported set (de, es-ES, es-US, fr, ja, ru, zh-CN). Each row is the source language and each column is the target language.

Source/Target	de	es-ES	es-US	fr	ja	ru	zh-CN
de	-	24.5	24.1	39.3	27.3	26.1	33.3
es-ES	22.1	-	-	30.3	23.5	20.2	29.8
es-US	22.1	-	-	30.3	23.5	20.2	29.8
fr	25.0	24.8	30.4	-	26.6	25.5	32.7
ja	16.9	16.4	18.1	23.7	-	15.2	28.9
ru	22.4	21.9	26.4	33.4	25.4	-	30.9
zh-CN	17.5	17.3	19.1	25.6	16.8	23.7	-

The table below shows BLEU scores of the Riva NMT Megatron 1.6B any2any model on the Flores-101 dataset for translation between English and each listed language, in both directions.

Language	English to Target (BLEU)	Target to English (BLEU)
Arabic	28.0	40.6
Brazilian Portuguese	49.8	50.5
Bulgarian	41.8	42.1
Croatian	27.9	37.8
Czech	32.9	41.1
Danish	46.2	49.6
Dutch	26.7	32.6
Estonian	27.3	38.9
European Portuguese	48.1	50.5
European Spanish	27.6	30.7
Finnish	22.7	35.0
French	50.5	46.5
German	38.2	45.2
Greek	27.5	36.5
Hindi	33.5	39.9
Hungarian	26.7	36.9
Indonesian	47.2	44.9
Italian	29.9	34.5
Japanese	32.5	26.7
Korean	28.0	29.5
Latin American Spanish	26.8	30.7
Latvian	31.0	37.0
Lithuanian	27.5	35.1
Norwegian	34.0	44.8
Polish	20.8	30.3
Romanian	40.7	45.0
Russian	31.3	36.1
Simplified Chinese	39.5	28.5
Slovak	35.0	40.6
Slovenian	30.7	36.2
Swedish	45.0	49.6
Thai	30.9	28.1
Traditional Chinese	30.8	26.8
Turkish	29.5	38.8
Ukrainian	30.7	40.2
Vietnamese	41.8	36.9
