Performance

Evaluation Process

This section shows the latency and throughput numbers for the Riva NMT service on different GPUs.

These numbers were captured after the preconfigured NMT pipelines were deployed from our Quick Start scripts.

The command used to measure performance was:

riva_nmt_t2t_client \
  --riva_uri=0.0.0.0:50051 \
  --model_name=<model name> \
  --batch_size=<batch size> \
  --target_language_code=<target language code> \
  --source_language_code=<source language code> \
  --text_file=<wmt_filename>
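To repeat the measurement over several batch sizes, the command can be assembled programmatically. The following is a minimal sketch and not part of the Riva tooling: the helper name build_nmt_client_command, the model name, the language codes, and the file name are hypothetical placeholders. Only the executable name and flag names come from the command above.

```python
import shlex


def build_nmt_client_command(model_name, batch_size, source_lang, target_lang,
                             text_file, riva_uri="0.0.0.0:50051"):
    """Return the argv list for one riva_nmt_t2t_client benchmark run.

    Flag names mirror the documented command; all values are supplied by
    the caller.
    """
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_lang}",
        f"--source_language_code={source_lang}",
        f"--text_file={text_file}",
    ]


if __name__ == "__main__":
    # Batch sizes match those reported in the Results section.
    for batch_size in (1, 2, 4, 8):
        argv = build_nmt_client_command(
            model_name="my_nmt_model",   # hypothetical placeholder
            batch_size=batch_size,
            source_lang="en",            # hypothetical placeholder
            target_lang="de",            # hypothetical placeholder
            text_file="wmt_sample.txt",  # hypothetical placeholder
        )
        # Print a shell-safe command line; pass argv to subprocess.run(argv,
        # check=True) instead to execute it against a running Riva server.
        print(shlex.join(argv))
```

Building the command as a list, rather than a single string, avoids shell quoting problems if a model name or file path contains spaces.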

The riva_nmt_t2t_client reports the following latency measurement:

  • latency: the overall latency of all returned responses. The p90, p95, and p99 values of this measurement are tabulated in the Results section.
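For readers unfamiliar with percentile latencies, the sketch below shows one common way to derive p90, p95, and p99 from a list of per-response latencies. It uses the nearest-rank method, which is an assumption: the source does not state how riva_nmt_t2t_client computes its percentiles, and the function names here are hypothetical.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    pct must be an integer from 1 to 100. This is an illustrative method,
    not necessarily the one the Riva client uses.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not isinstance(pct, int) or not 1 <= pct <= 100:
        raise ValueError("pct must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100 using integer arithmetic (1-based rank).
    rank = -(-pct * len(ordered) // 100)
    return ordered[rank - 1]


def summarize_latencies(samples):
    """Return the three percentiles reported in the Results table."""
    return {f"p{p}": percentile_nearest_rank(samples, p) for p in (90, 95, 99)}
```

A p99 value well above p90, as in the larger batch sizes, indicates a small number of responses that take much longer than the rest.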

Results

Latency and throughput measurements are reported in the following table. Throughput (the translations/second column) is measured in sentences translated per second.

For specifications of the hardware on which these measurements were collected, refer to the Hardware Specifications section.

batch size   translations/second   p90 latency   p95 latency   p99 latency
1            4.98292               0.355278      0.440269      0.609254
2            6.77264               0.501672      0.594919      0.816265
4            8.77044               0.779006      0.887223      1.41906
8            10.3628               1.26556       1.56648       2.59009
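The trade-off between throughput and latency in these results can be made explicit by normalizing each row against batch size 1. The following is a small sketch using only the values from the results above; the names RESULTS and scaling_vs_baseline are illustrative.

```python
# (batch size, translations/second, p90, p95, p99) copied from the results above.
RESULTS = [
    (1, 4.98292, 0.355278, 0.440269, 0.609254),
    (2, 6.77264, 0.501672, 0.594919, 0.816265),
    (4, 8.77044, 0.779006, 0.887223, 1.41906),
    (8, 10.3628, 1.26556, 1.56648, 2.59009),
]


def scaling_vs_baseline(results):
    """Return (batch size, throughput ratio, p90 ratio) relative to the first row."""
    _, base_tput, base_p90, *_ = results[0]
    return [
        (batch, tput / base_tput, p90 / base_p90)
        for batch, tput, p90, *_ in results
    ]


if __name__ == "__main__":
    for batch, tput_x, p90_x in scaling_vs_baseline(RESULTS):
        print(f"batch {batch}: throughput x{tput_x:.2f}, p90 latency x{p90_x:.2f}")
```

On these numbers, batch size 8 delivers roughly 2.1 times the throughput of batch size 1, while p90 latency grows roughly 3.6 times, so larger batches favor throughput at the cost of per-request latency.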

Hardware Specifications

GPU                        NVIDIA DGX A100 40 GB

CPU
  Model                    AMD EPYC 7742 64-Core Processor
  Thread(s) per core       2
  Socket(s)                2
  Core(s) per socket       64
  NUMA node(s)             8
  Frequency boost          enabled
  CPU max MHz              2250
  CPU min MHz              1500

RAM
  Model                    Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
  Configured Memory Speed  2933 MT/s
  RAM Size                 32x64GB (2048GB Total)