Neural Machine Translation (Latest)

Performance

This section shows the latency and throughput numbers for the Riva NMT service on different GPUs.

The following command was used to measure performance:

riva_nmt_t2t_client --riva_uri=0.0.0.0:50051 --model_name=<model name> --batch_size=<batch size> --target_language_code=<target language code> --source_language_code=<source language code> --text_file=<wmt_filename>
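The tables in this section have one row per batch size, so the command above was presumably repeated with different --batch_size values. The following Python sketch only assembles those command lines. It assumes the batch sizes 1, 2, 4, and 8 that appear in the tables; the model name, language codes, and file name are hypothetical placeholders, and the helper names build_command and sweep_commands are illustrative and not part of Riva.

```python
import shlex

# Batch sizes that appear in the result tables of this section.
BATCH_SIZES = [1, 2, 4, 8]


def build_command(model_name, source_lang, target_lang, text_file,
                  batch_size, riva_uri="0.0.0.0:50051"):
    """Return the riva_nmt_t2t_client argument list for one benchmark run."""
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_lang}",
        f"--source_language_code={source_lang}",
        f"--text_file={text_file}",
    ]


def sweep_commands(model_name, source_lang, target_lang, text_file):
    """Build one command per batch size in BATCH_SIZES."""
    return [
        build_command(model_name, source_lang, target_lang, text_file, b)
        for b in BATCH_SIZES
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute a deployed model,
    # the language codes it supports, and a real input file.
    for argv in sweep_commands("my_nmt_model", "en-US", "de-DE", "wmt_input.txt"):
        print(shlex.join(argv))
        # To execute a run instead of printing it:
        #   subprocess.run(argv, check=True)
```

Printing the commands first makes it easy to check the flags before running them against a live server.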

The riva_nmt_t2t_client reports the following latency measurement:

  • latency: the overall latency of all returned responses. The tables below report its 90th, 95th, and 99th percentiles (p90, p95, p99).

You can get the source code for the riva_nmt_t2t_client at https://github.com/nvidia-riva/cpp-clients.

The following tables show the latency and throughput measurements for batch sizes 1, 2, 4, and 8. Throughput is measured in sentences translated per second.
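As a rough illustration of how figures like these can be derived, the Python sketch below computes throughput and latency percentiles from a list of per-request latencies. It is not the client's implementation: the nearest-rank percentile method, the function names, and the idea that throughput is total sentences divided by wall-clock time are all assumptions made for the example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (assumed method, not confirmed for the client).

    Returns the smallest sample such that at least pct percent of the
    samples are less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies, num_sentences, wall_time_s):
    """Summarize one run as throughput plus p90/p95/p99 latency."""
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")
    return {
        # Sentences translated per second over the whole run.
        "translations_per_second": num_sentences / wall_time_s,
        "p90": percentile(latencies, 90),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


if __name__ == "__main__":
    # Synthetic latencies for illustration only; not measured data.
    sample = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0]
    print(summarize(sample, num_sentences=10, wall_time_s=5.0))
```

Other percentile definitions, such as linear interpolation between neighboring samples, give slightly different values on small samples, so the method matters when comparing results across tools.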

For information about the hardware that collected these measurements, see the Hardware Specifications section.

batch size  translations/second  latency p90  latency p95  latency p99
1           1.87372              0.920689     1.08201      1.47551
2           2.50597              1.2444       1.47108      1.96975
4           3.16918              1.92869      2.15762      4.26482
8           3.48258              3.28437      3.8072       8.47366

batch size  translations/second  latency p90  latency p95  latency p99
1           2.87276              0.627917     0.780388     1.15155
2           3.66135              0.948616     1.07478      1.89837
4           4.66885              1.30849      1.72828      2.79435
8           5.04019              2.59703      3.491        5.09752

batch size  translations/second  latency p90  latency p95  latency p99
1           2.5204               0.699321     0.909914     1.29494
2           3.30693              1.08113      1.35028      1.89888
4           4.22761              1.74112      1.97779      3.14655
8           4.6737               2.98084      3.74519      6.52791

batch size  translations/second  latency p90  latency p95  latency p99
1           2.97666              0.59925      0.735827     1.11948
2           3.87291              0.900658     1.13785      1.66429
4           4.92714              1.36958      1.72552      2.59247
8           5.56393              2.5402       3.18025      5.85779

batch size  translations/second  latency p90  latency p95  latency p99
1           1.32758              1.39331      1.73098      2.52108
2           1.67142              2.0546       2.61797      3.84186
4           1.95767              3.71373      4.21752      7.75958
8           2.01694              6.81574      8.6714       13.972

batch size  translations/second  latency p90  latency p95  latency p99
1           1.21664              1.58103      1.97206      3.05327
2           1.43854              2.24791      3.0426       4.65663
4           1.60382              4.31419      5.82964      8.14162
8           1.55748              9.50545      10.6703      14.5835

batch size  translations/second  latency p90  latency p95  latency p99
1           1.6397               1.09931      1.38108      2.12866
2           2.09401              1.6393       2.02871      3.4556
4           2.49846              2.77695      3.43139      7.73349
8           2.60394              5.28244      6.63392      14.233

batch size  translations/second  latency p90  latency p95  latency p99
1           1.60941              1.1678       1.51842      2.36442
2           1.97409              1.80278      2.24439      3.83375
4           2.30099              3.12782      3.9447       6.95936
8           2.29916              6.13487      9.22069      14.0041

batch size  translations/second  latency p90  latency p95  latency p99
1           1.97205              0.886273     1.04213      1.43199
2           2.61224              1.22411      1.4301       1.80279
4           3.34587              1.88818      2.24705      2.78781
8           3.67947              3.46478      3.76565      4.85861

batch size  translations/second  latency p90  latency p95  latency p99
1           2.56872              0.681028     0.804774     1.07597
2           3.3472               0.989105     1.15346      1.50089
4           4.38947              1.42871      1.63829      2.12449
8           5.12472              2.40464      2.78384      3.43832

batch size  translations/second  latency p90  latency p95  latency p99
1           3.6477               0.495766     0.618124     0.899573
2           4.73131              0.719162     0.829255     1.54409
4           5.72285              1.12675      1.48843      2.42258
8           5.54803              2.40075      3.26125      4.88046

batch size  translations/second  latency p90  latency p95  latency p99
1           2.566                0.671063     0.781803     1.05472
2           3.40786              0.938218     1.09553      1.49499
4           3.8768               1.61876      1.8546       3.78943
8           3.66088              3.16267      3.70907      8.37703

batch size  translations/second  latency p90  latency p95  latency p99
1           3.74805              0.484533     0.593134     0.902163
2           4.94672              0.699463     0.881004     1.29322
4           6.05463              1.1039       1.46067      2.35874
8           5.71557              2.52177      3.117        6.16315

batch size  translations/second  latency p90  latency p95  latency p99
1           3.3929               0.528577     0.689865     0.974162
2           4.40994              0.819896     1.01791      1.47063
4           5.17404              1.4859       1.70887      2.8181
8           4.94583              2.9022       3.65776      6.59687

batch size  translations/second  latency p90  latency p95  latency p99
1           1.56847              1.1745       1.48006      2.41088
2           1.80375              1.83741      2.5072       3.9312
4           1.79482              3.99013      5.43572      7.66713
8           1.57175              9.60344      10.6467      14.2069

batch size  translations/second  latency p90  latency p95  latency p99
1           1.66571              1.08862      1.35468      1.977
2           2.02682              1.70984      2.18956      3.40245
4           2.06979              3.57831      4.18346      7.88476
8           1.87831              7.54048      9.5857       14.8807

batch size  translations/second  latency p90  latency p95  latency p99
1           2.07451              0.910784     1.17259      1.84924
2           2.47588              1.43357      1.82757      3.37628
4           2.52347              2.94212      3.79328      6.77849
8           2.22887              6.50901      9.92396      14.814

batch size  translations/second  latency p90  latency p95  latency p99
1           2.13625              0.842487     1.04341      1.60627
2           2.69967              1.26584      1.62549      2.91262
4           2.88421              2.49964      3.07377      7.25401
8           2.66523              5.27891      6.75387      14.1079

batch size  translations/second  latency p90  latency p95  latency p99
1           3.51151              0.500012     0.583887     0.772306
2           4.74482              0.684268     0.818367     1.06536
4           5.91097              1.07181      1.25083      1.69175
8           5.86418              2.10006      2.4796       3.11607

batch size  translations/second  latency p90  latency p95  latency p99
1           2.56824              0.684774     0.79945      1.07777
2           3.39711              0.957159     1.11632      1.42855
4           4.01128              1.64192      1.96622      2.46766
8           3.90422              3.26752      3.5553       4.68024

batch size  translations/second  latency p90  latency p95  latency p99
1           1.4291               1.24761      1.50121      2.3264
2           1.86382              1.89852      2.34146      3.97016
4           2.11519              3.38856      4.20307      8.6753
8           2.13149              6.45051      8.98936      16.7489

batch size  translations/second  latency p90  latency p95  latency p99
1           1.41881              1.29899      1.64733      2.58565
2           1.77199              2.07364      2.5714       4.50892
4           1.96346              3.56488      5.05489      7.79614
8           1.91313              8.32652      10.8768      15.854

batch size  translations/second  latency p90  latency p95  latency p99
1           1.74946              0.988257     1.14519      1.53376
2           2.39319              1.35765      1.58754      2.03805
4           2.90098              2.20014      2.55023      3.045
8           3.04578              3.93638      4.21325      5.37325

batch size  translations/second  latency p90  latency p95  latency p99
1           2.26458              0.777135     0.894938     1.16347
2           3.15778              1.02839      1.21504      1.61981
4           3.86324              1.61475      1.83013      2.4488
8           4.41485              2.63231      3.25951      4.81927

batch size  translations/second  latency p90  latency p95  latency p99
1           2.20713              0.811372     1.03459      1.42627
2           2.98033              1.19901      1.51812      2.2026
4           3.58356              2.07059      2.35678      3.65251
8           3.81516              3.54866      4.34197      7.75236

batch size  translations/second  latency p90  latency p95  latency p99
1           2.52089              0.722309     0.869916     1.28357
2           3.42075              1.00453      1.26259      1.91771
4           4.16782              1.62723      1.99308      3.32084
8           4.48823              3.00212      3.72774      7.06577

batch size  translations/second  latency p90  latency p95  latency p99
1           1.12664              1.56816      1.954        2.88826
2           1.42221              2.47662      3.0681       4.58871
4           1.62179              4.80462      5.60864      9.08026
8           1.62194              8.72006      10.3927      17.1011

batch size  translations/second  latency p90  latency p95  latency p99
1           1.06394              1.73569      2.17         3.39102
2           1.22597              2.68251      3.63989      5.52069
4           1.30998              5.34685      6.70295      9.5359
8           1.22196              11.3644      16.8053      21.5743

batch size  translations/second  latency p90  latency p95  latency p99
1           1.66491              1.01757      1.18916      1.58394
2           2.28272              1.40361      1.64927      2.26426
4           2.75788              2.21632      2.4773       4.85959
8           2.83518              3.83715      4.32809      9.66428

batch size  translations/second  latency p90  latency p95  latency p99
1           2.46888              0.745367     0.897639     1.26768
2           3.33421              1.01969      1.17779      2.13493
4           4.0172               1.54522      1.91358      3.47273
8           4.20587              2.92903      4.75732      6.37416

Hardware Specifications

GPU: NVIDIA DGX A100 40 GB
CPU
  Model: AMD EPYC 7742 64-Core Processor
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 2250
  CPU min MHz: 1500
RAM
  Model: Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
  Configured Memory Speed: 2933 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA H100 80GB HBM3
CPU
  Model: Intel(R) Xeon(R) Platinum 8480CL
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 56
  NUMA node(s): 2
  CPU max MHz: 3800
  CPU min MHz: 800
RAM
  Model: Micron DDR5 MTC40F2046S1RC48BA1 4800MHz
  Configured Memory Speed: 4400 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA L40
CPU
  Model: AMD EPYC 7763 64-Core Processor
  Thread(s) per core: 1
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 3529
  CPU min MHz: 1500
RAM
  Model: Samsung DDR4 M393A4K40DB3-CWE 3200MHz
  Configured Memory Speed: 3200 MT/s
  RAM Size: 16x32GB (512GB Total)

© Copyright 2024, NVIDIA Corporation. Last updated on Aug 6, 2024.