Performance
This section shows latency and throughput measurements for the Riva NMT service on NVIDIA A100, H100, and L40 GPUs.
The following command was used to measure performance:
```bash
riva_nmt_t2t_client \
  --riva_uri=0.0.0.0:50051 \
  --model_name=<model name> \
  --batch_size=<batch size> \
  --target_language_code=<target language code> \
  --source_language_code=<source language code> \
  --text_file=<wmt_filename>
```
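Each row of a table below corresponds to one run of this command at a given batch size. The following is a minimal Python sketch of such a sweep. It only assembles the flags shown above and invokes the client through `subprocess`. The helper names `build_command` and `sweep` are hypothetical, as are the model name, language codes, and file name in the usage example. The script assumes `riva_nmt_t2t_client` is on `PATH`.

```python
import shutil
import subprocess


def build_command(model_name, batch_size, source_language_code,
                  target_language_code, text_file, riva_uri="0.0.0.0:50051"):
    """Assemble the riva_nmt_t2t_client argument list for one run (hypothetical helper)."""
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_language_code}",
        f"--source_language_code={source_language_code}",
        f"--text_file={text_file}",
    ]


def sweep(model_name, source_language_code, target_language_code, text_file,
          batch_sizes=(1, 2, 4, 8)):
    """Run the client once per batch size (hypothetical helper)."""
    if shutil.which("riva_nmt_t2t_client") is None:
        raise FileNotFoundError("riva_nmt_t2t_client is not on PATH")
    for batch_size in batch_sizes:
        cmd = build_command(model_name, batch_size, source_language_code,
                            target_language_code, text_file)
        print(f"=== batch_size={batch_size} ===", flush=True)
        # The client's own output is passed through unchanged.
        subprocess.run(cmd, check=True)
```

For example, `sweep("my-nmt-model", "en", "de", "wmt_en_de.txt")` would run four measurements against a server at the default URI. All of these argument values are placeholders.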
The `riva_nmt_t2t_client` reports the following latency measurement:

- `latency`: the overall latency of all returned responses. The following tables report its 90th, 95th, and 99th percentiles (p90, p95, p99).
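A percentile column reads as "this fraction of requests finished within this time." For example, p90 is the latency that 90% of requests did not exceed. This section does not say which percentile definition the client uses. The Python sketch below assumes the simple nearest-rank method, and its latency samples are invented for illustration.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method.

    This definition is an assumption; the client may compute percentiles
    differently (for example, by interpolating between samples).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Invented per-request latencies in seconds, for illustration only.
latencies = [0.41, 0.38, 0.55, 0.47, 0.92, 0.44, 0.50, 0.61, 0.39, 1.30]

for p in (90, 95, 99):
    print(f"p{p}: {nearest_rank_percentile(latencies, p):.2f} s")
```

With only ten samples, p95 and p99 both land on the largest value. This is why tail percentiles need long runs to be meaningful.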
The source code for `riva_nmt_t2t_client` is available at https://github.com/nvidia-riva/cpp-clients.
The following tables show latency and throughput measurements for batch sizes 1, 2, 4, and 8. Throughput is measured in sentences translated per second, and latency is reported as the p90, p95, and p99 percentiles. For information about the hardware used to collect these measurements, see the Hardware Specifications section.
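Throughput in sentences per second is the number of sentences translated divided by the wall-clock time of the run. The Python sketch below measures it that way, assuming requests are sent one after another. `translate_batch` is a hypothetical stand-in for a call to the NMT service, not a Riva API. The 10 ms fake service exists only to make the sketch runnable.

```python
import time


def measure_throughput(translate_batch, sentences, batch_size):
    """Send sentences in sequential batches and return sentences per second.

    translate_batch is a hypothetical callable that sends one request with a
    list of sentences and blocks until the translation comes back.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if not sentences:
        raise ValueError("sentences must be non-empty")
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        translate_batch(sentences[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed


# Stand-in "service" that takes 10 ms per request, for illustration only.
def fake_translate_batch(batch):
    time.sleep(0.01)
    return [s.upper() for s in batch]


rate = measure_throughput(fake_translate_batch, ["a sentence"] * 20, batch_size=4)
print(f"{rate:.1f} sentences/second")
```

Counting sentences rather than requests keeps the metric comparable across batch sizes. A request with batch size 8 carries eight sentences.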

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.87372 | 0.920689 | 1.08201 | 1.47551 |
2 | 2.50597 | 1.2444 | 1.47108 | 1.96975 |
4 | 3.16918 | 1.92869 | 2.15762 | 4.26482 |
8 | 3.48258 | 3.28437 | 3.8072 | 8.47366 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.87276 | 0.627917 | 0.780388 | 1.15155 |
2 | 3.66135 | 0.948616 | 1.07478 | 1.89837 |
4 | 4.66885 | 1.30849 | 1.72828 | 2.79435 |
8 | 5.04019 | 2.59703 | 3.491 | 5.09752 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.5204 | 0.699321 | 0.909914 | 1.29494 |
2 | 3.30693 | 1.08113 | 1.35028 | 1.89888 |
4 | 4.22761 | 1.74112 | 1.97779 | 3.14655 |
8 | 4.6737 | 2.98084 | 3.74519 | 6.52791 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.97666 | 0.59925 | 0.735827 | 1.11948 |
2 | 3.87291 | 0.900658 | 1.13785 | 1.66429 |
4 | 4.92714 | 1.36958 | 1.72552 | 2.59247 |
8 | 5.56393 | 2.5402 | 3.18025 | 5.85779 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.32758 | 1.39331 | 1.73098 | 2.52108 |
2 | 1.67142 | 2.0546 | 2.61797 | 3.84186 |
4 | 1.95767 | 3.71373 | 4.21752 | 7.75958 |
8 | 2.01694 | 6.81574 | 8.6714 | 13.972 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.21664 | 1.58103 | 1.97206 | 3.05327 |
2 | 1.43854 | 2.24791 | 3.0426 | 4.65663 |
4 | 1.60382 | 4.31419 | 5.82964 | 8.14162 |
8 | 1.55748 | 9.50545 | 10.6703 | 14.5835 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.6397 | 1.09931 | 1.38108 | 2.12866 |
2 | 2.09401 | 1.6393 | 2.02871 | 3.4556 |
4 | 2.49846 | 2.77695 | 3.43139 | 7.73349 |
8 | 2.60394 | 5.28244 | 6.63392 | 14.233 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.60941 | 1.1678 | 1.51842 | 2.36442 |
2 | 1.97409 | 1.80278 | 2.24439 | 3.83375 |
4 | 2.30099 | 3.12782 | 3.9447 | 6.95936 |
8 | 2.29916 | 6.13487 | 9.22069 | 14.0041 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.97205 | 0.886273 | 1.04213 | 1.43199 |
2 | 2.61224 | 1.22411 | 1.4301 | 1.80279 |
4 | 3.34587 | 1.88818 | 2.24705 | 2.78781 |
8 | 3.67947 | 3.46478 | 3.76565 | 4.85861 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.56872 | 0.681028 | 0.804774 | 1.07597 |
2 | 3.3472 | 0.989105 | 1.15346 | 1.50089 |
4 | 4.38947 | 1.42871 | 1.63829 | 2.12449 |
8 | 5.12472 | 2.40464 | 2.78384 | 3.43832 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 3.6477 | 0.495766 | 0.618124 | 0.899573 |
2 | 4.73131 | 0.719162 | 0.829255 | 1.54409 |
4 | 5.72285 | 1.12675 | 1.48843 | 2.42258 |
8 | 5.54803 | 2.40075 | 3.26125 | 4.88046 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.566 | 0.671063 | 0.781803 | 1.05472 |
2 | 3.40786 | 0.938218 | 1.09553 | 1.49499 |
4 | 3.8768 | 1.61876 | 1.8546 | 3.78943 |
8 | 3.66088 | 3.16267 | 3.70907 | 8.37703 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 3.74805 | 0.484533 | 0.593134 | 0.902163 |
2 | 4.94672 | 0.699463 | 0.881004 | 1.29322 |
4 | 6.05463 | 1.1039 | 1.46067 | 2.35874 |
8 | 5.71557 | 2.52177 | 3.117 | 6.16315 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 3.3929 | 0.528577 | 0.689865 | 0.974162 |
2 | 4.40994 | 0.819896 | 1.01791 | 1.47063 |
4 | 5.17404 | 1.4859 | 1.70887 | 2.8181 |
8 | 4.94583 | 2.9022 | 3.65776 | 6.59687 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.56847 | 1.1745 | 1.48006 | 2.41088 |
2 | 1.80375 | 1.83741 | 2.5072 | 3.9312 |
4 | 1.79482 | 3.99013 | 5.43572 | 7.66713 |
8 | 1.57175 | 9.60344 | 10.6467 | 14.2069 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.66571 | 1.08862 | 1.35468 | 1.977 |
2 | 2.02682 | 1.70984 | 2.18956 | 3.40245 |
4 | 2.06979 | 3.57831 | 4.18346 | 7.88476 |
8 | 1.87831 | 7.54048 | 9.5857 | 14.8807 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.07451 | 0.910784 | 1.17259 | 1.84924 |
2 | 2.47588 | 1.43357 | 1.82757 | 3.37628 |
4 | 2.52347 | 2.94212 | 3.79328 | 6.77849 |
8 | 2.22887 | 6.50901 | 9.92396 | 14.814 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.13625 | 0.842487 | 1.04341 | 1.60627 |
2 | 2.69967 | 1.26584 | 1.62549 | 2.91262 |
4 | 2.88421 | 2.49964 | 3.07377 | 7.25401 |
8 | 2.66523 | 5.27891 | 6.75387 | 14.1079 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 3.51151 | 0.500012 | 0.583887 | 0.772306 |
2 | 4.74482 | 0.684268 | 0.818367 | 1.06536 |
4 | 5.91097 | 1.07181 | 1.25083 | 1.69175 |
8 | 5.86418 | 2.10006 | 2.4796 | 3.11607 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.56824 | 0.684774 | 0.79945 | 1.07777 |
2 | 3.39711 | 0.957159 | 1.11632 | 1.42855 |
4 | 4.01128 | 1.64192 | 1.96622 | 2.46766 |
8 | 3.90422 | 3.26752 | 3.5553 | 4.68024 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.4291 | 1.24761 | 1.50121 | 2.3264 |
2 | 1.86382 | 1.89852 | 2.34146 | 3.97016 |
4 | 2.11519 | 3.38856 | 4.20307 | 8.6753 |
8 | 2.13149 | 6.45051 | 8.98936 | 16.7489 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.41881 | 1.29899 | 1.64733 | 2.58565 |
2 | 1.77199 | 2.07364 | 2.5714 | 4.50892 |
4 | 1.96346 | 3.56488 | 5.05489 | 7.79614 |
8 | 1.91313 | 8.32652 | 10.8768 | 15.854 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.74946 | 0.988257 | 1.14519 | 1.53376 |
2 | 2.39319 | 1.35765 | 1.58754 | 2.03805 |
4 | 2.90098 | 2.20014 | 2.55023 | 3.045 |
8 | 3.04578 | 3.93638 | 4.21325 | 5.37325 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.26458 | 0.777135 | 0.894938 | 1.16347 |
2 | 3.15778 | 1.02839 | 1.21504 | 1.61981 |
4 | 3.86324 | 1.61475 | 1.83013 | 2.4488 |
8 | 4.41485 | 2.63231 | 3.25951 | 4.81927 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.20713 | 0.811372 | 1.03459 | 1.42627 |
2 | 2.98033 | 1.19901 | 1.51812 | 2.2026 |
4 | 3.58356 | 2.07059 | 2.35678 | 3.65251 |
8 | 3.81516 | 3.54866 | 4.34197 | 7.75236 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.52089 | 0.722309 | 0.869916 | 1.28357 |
2 | 3.42075 | 1.00453 | 1.26259 | 1.91771 |
4 | 4.16782 | 1.62723 | 1.99308 | 3.32084 |
8 | 4.48823 | 3.00212 | 3.72774 | 7.06577 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.12664 | 1.56816 | 1.954 | 2.88826 |
2 | 1.42221 | 2.47662 | 3.0681 | 4.58871 |
4 | 1.62179 | 4.80462 | 5.60864 | 9.08026 |
8 | 1.62194 | 8.72006 | 10.3927 | 17.1011 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.06394 | 1.73569 | 2.17 | 3.39102 |
2 | 1.22597 | 2.68251 | 3.63989 | 5.52069 |
4 | 1.30998 | 5.34685 | 6.70295 | 9.5359 |
8 | 1.22196 | 11.3644 | 16.8053 | 21.5743 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 1.66491 | 1.01757 | 1.18916 | 1.58394 |
2 | 2.28272 | 1.40361 | 1.64927 | 2.26426 |
4 | 2.75788 | 2.21632 | 2.4773 | 4.85959 |
8 | 2.83518 | 3.83715 | 4.32809 | 9.66428 |

batch size | translations/second | p90 latency (s) | p95 latency (s) | p99 latency (s) |
---|---|---|---|---|
1 | 2.46888 | 0.745367 | 0.897639 | 1.26768 |
2 | 3.33421 | 1.01969 | 1.17779 | 2.13493 |
4 | 4.0172 | 1.54522 | 1.91358 | 3.47273 |
8 | 4.20587 | 2.92903 | 4.75732 | 6.37416 |
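The tables can be cross-checked with simple arithmetic. Suppose the client keeps exactly one request in flight. The benchmark's concurrency is not stated in this section, so this is an assumption. Under it, the mean time per request is roughly batch size divided by throughput. The Python sketch below applies this to the first latency table in this section and also computes each batch size's throughput relative to batch size 1.

```python
# Batch size -> translations/second, copied from the first latency table.
first_table = {1: 1.87372, 2: 2.50597, 4: 3.16918, 8: 3.48258}


def implied_mean_latency(batch_size, throughput):
    """Mean seconds per request, assuming exactly one request in flight."""
    return batch_size / throughput


baseline = first_table[1]
for batch_size, throughput in first_table.items():
    mean_s = implied_mean_latency(batch_size, throughput)
    gain = throughput / baseline
    print(f"batch {batch_size}: ~{mean_s:.3f} s per request, "
          f"{gain:.2f}x the batch-1 throughput")
```

Under this assumption, batch size 1 implies about 0.53 s per request and batch size 8 about 2.30 s. Both sit below the corresponding p90 values (0.92 s and 3.28 s), as expected for a tail percentile. Going from batch size 1 to 8 raises throughput by about 1.86x in this table, not 8x.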

Hardware Specifications

GPU | NVIDIA DGX A100 40 GB |
---|---|
CPU | |
Model | AMD EPYC 7742 64-Core Processor |
Thread(s) per core | 2 |
Socket(s) | 2 |
Core(s) per socket | 64 |
NUMA node(s) | 8 |
Frequency boost | enabled |
CPU max MHz | 2250 |
CPU min MHz | 1500 |
RAM | |
Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
Configured Memory Speed | 2933 MT/s |
RAM Size | 32x64GB (2048GB Total) |

GPU | NVIDIA H100 80GB HBM3 |
---|---|
CPU | |
Model | Intel(R) Xeon(R) Platinum 8480CL |
Thread(s) per core | 2 |
Socket(s) | 2 |
Core(s) per socket | 56 |
NUMA node(s) | 2 |
CPU max MHz | 3800 |
CPU min MHz | 800 |
RAM | |
Model | Micron DDR5 MTC40F2046S1RC48BA1 4800MHz |
Configured Memory Speed | 4400 MT/s |
RAM Size | 32x64GB (2048GB Total) |

GPU | NVIDIA L40 |
---|---|
CPU | |
Model | AMD EPYC 7763 64-Core Processor |
Thread(s) per core | 1 |
Socket(s) | 2 |
Core(s) per socket | 64 |
NUMA node(s) | 8 |
Frequency boost | enabled |
CPU max MHz | 3529 |
CPU min MHz | 1500 |
RAM | |
Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
Configured Memory Speed | 3200 MT/s |
RAM Size | 16x32GB (512GB Total) |
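The CPU rows above combine into the number of logical CPUs each system exposes: threads per core × cores per socket × sockets. The Python sketch below does this arithmetic with values copied from the three tables. It assumes the listed settings were active while the measurements were taken.

```python
# Values copied from the three hardware tables above.
systems = {
    "NVIDIA DGX A100 40 GB": {"threads_per_core": 2, "cores_per_socket": 64, "sockets": 2},
    "NVIDIA H100 80GB HBM3": {"threads_per_core": 2, "cores_per_socket": 56, "sockets": 2},
    "NVIDIA L40": {"threads_per_core": 1, "cores_per_socket": 64, "sockets": 2},
}


def logical_cpus(spec):
    """Logical CPUs = threads per core x cores per socket x sockets."""
    return spec["threads_per_core"] * spec["cores_per_socket"] * spec["sockets"]


for name, spec in systems.items():
    print(f"{name}: {logical_cpus(spec)} logical CPUs")
```

The L40 host reports one thread per core. It therefore exposes half as many logical CPUs as the A100 host, despite the same core count.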