Performance#

Evaluation Process#

This section shows the latency and throughput numbers for the Riva NMT service on different GPUs.

The following command was used to measure performance:

riva_nmt_t2t_client \
  --riva_uri=0.0.0.0:50051 \
  --model_name=megatronnmt_any_any_1b \
  --batch_size=<batch size> \
  --target_language_code=<target language code> \
  --source_language_code=<source language code> \
  --text_file=<wmt_filename>
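
To collect one measurement per batch size, the command above can be run in a loop. The following is a minimal sketch in Python, assuming the riva_nmt_t2t_client binary is on PATH and a Riva server is reachable at the given URI. The helper names build_command and run_sweep are hypothetical and not part of Riva; only the flags shown in the command above are used, and the batch sizes match those in the results tables.

```python
import subprocess


def build_command(batch_size, source_lang, target_lang, text_file,
                  riva_uri="0.0.0.0:50051",
                  model_name="megatronnmt_any_any_1b"):
    """Return the argv list for one riva_nmt_t2t_client run (hypothetical helper)."""
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_lang}",
        f"--source_language_code={source_lang}",
        f"--text_file={text_file}",
    ]


def run_sweep(source_lang, target_lang, text_file, batch_sizes=(1, 2, 4, 8)):
    """Run the client once per batch size and return its raw stdout per run."""
    outputs = {}
    for batch_size in batch_sizes:
        cmd = build_command(batch_size, source_lang, target_lang, text_file)
        # check=True raises CalledProcessError if the client exits non-zero.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs[batch_size] = result.stdout
    return outputs
```

Calling run_sweep with a source language code, a target language code, and the path to the text file returns the client output keyed by batch size. The language codes and file name are whatever your deployment uses; none are assumed here.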

The riva_nmt_t2t_client reports the following latency measurement:

  • latency: the overall latency of all returned responses. The tables below report its 90th, 95th, and 99th percentiles (p90, p95, p99).
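
As a reminder of what the p90, p95, and p99 columns mean, the sketch below computes latency percentiles with the nearest-rank method. This is an illustration only: the source does not state which percentile method the client uses, the function name percentile is hypothetical, and the sample latencies are made up.

```python
import math


def percentile(latencies, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up sample latencies, for illustration only.
samples = [0.41, 0.52, 0.48, 0.95, 0.50, 0.47, 0.61, 0.44, 1.30, 0.55]
summary = {f"p{p}": percentile(samples, p) for p in (90, 95, 99)}
```

With only ten samples, p95 and p99 both fall on the slowest response, which is why tail percentiles are more informative on larger runs.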

You can get the source code for the riva_nmt_t2t_client at Riva C++ Clients.

Results#

The following tables show the latencies and throughput measurements. Throughput is measured in sentences translated per second.
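
To make the throughput column easier to read, the sketch below shows the sentences-per-second calculation and how throughput at each batch size compares with batch size 1. The function names are hypothetical and this is not the client's implementation; the throughput values are copied from the first results table in this section.

```python
def throughput(num_sentences, elapsed_seconds):
    """Sentences translated per second over a whole run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_sentences / elapsed_seconds


def relative_throughput(rows, baseline_batch=1):
    """Throughput at each batch size divided by the baseline batch size's throughput."""
    base = rows[baseline_batch]
    return {batch: value / base for batch, value in rows.items()}


# batch size -> translations/second, copied from the first results table.
first_table = {1: 2.5204, 2: 3.30693, 4: 4.22761, 8: 4.6737}
relative = relative_throughput(first_table)
```

For that table, batch size 8 delivers roughly 1.85 times the throughput of batch size 1, while its p90 latency grows from 0.699321 to 2.98084, so larger batches trade per-request latency for throughput.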

For information about the hardware on which these measurements were collected, see the Hardware Specifications section.

batch size  translations/second  p90       p95       p99
1           2.5204               0.699321  0.909914  1.29494
2           3.30693              1.08113   1.35028   1.89888
4           4.22761              1.74112   1.97779   3.14655
8           4.6737               2.98084   3.74519   6.52791

batch size  translations/second  p90       p95       p99
1           2.97666              0.59925   0.735827  1.11948
2           3.87291              0.900658  1.13785   1.66429
4           4.92714              1.36958   1.72552   2.59247
8           5.56393              2.5402    3.18025   5.85779

batch size  translations/second  p90       p95       p99
1           1.32758              1.39331   1.73098   2.52108
2           1.67142              2.0546    2.61797   3.84186
4           1.95767              3.71373   4.21752   7.75958
8           2.01694              6.81574   8.6714    13.972

batch size  translations/second  p90       p95       p99
1           1.21664              1.58103   1.97206   3.05327
2           1.43854              2.24791   3.0426    4.65663
4           1.60382              4.31419   5.82964   8.14162
8           1.55748              9.50545   10.6703   14.5835

batch size  translations/second  p90       p95       p99
1           1.6397               1.09931   1.38108   2.12866
2           2.09401              1.6393    2.02871   3.4556
4           2.49846              2.77695   3.43139   7.73349
8           2.60394              5.28244   6.63392   14.233

batch size  translations/second  p90       p95       p99
1           1.60941              1.1678    1.51842   2.36442
2           1.97409              1.80278   2.24439   3.83375
4           2.30099              3.12782   3.9447    6.95936
8           2.29916              6.13487   9.22069   14.0041

batch size  translations/second  p90       p95       p99
1           1.97205              0.886273  1.04213   1.43199
2           2.61224              1.22411   1.4301    1.80279
4           3.34587              1.88818   2.24705   2.78781
8           3.67947              3.46478   3.76565   4.85861

batch size  translations/second  p90       p95       p99
1           2.56872              0.681028  0.804774  1.07597
2           3.3472               0.989105  1.15346   1.50089
4           4.38947              1.42871   1.63829   2.12449
8           5.12472              2.40464   2.78384   3.43832

batch size  translations/second  p90       p95       p99
1           1.87372              0.920689  1.08201   1.47551
2           2.50597              1.2444    1.47108   1.96975
4           3.16918              1.92869   2.15762   4.26482
8           3.48258              3.28437   3.8072    8.47366

batch size  translations/second  p90       p95       p99
1           2.87276              0.627917  0.780388  1.15155
2           3.66135              0.948616  1.07478   1.89837
4           4.66885              1.30849   1.72828   2.79435
8           5.04019              2.59703   3.491     5.09752

batch size  translations/second  p90       p95       p99
1           3.3929               0.528577  0.689865  0.974162
2           4.40994              0.819896  1.01791   1.47063
4           5.17404              1.4859    1.70887   2.8181
8           4.94583              2.9022    3.65776   6.59687

batch size  translations/second  p90       p95       p99
1           3.74805              0.484533  0.593134  0.902163
2           4.94672              0.699463  0.881004  1.29322
4           6.05463              1.1039    1.46067   2.35874
8           5.71557              2.52177   3.117     6.16315

batch size  translations/second  p90       p95       p99
1           1.66571              1.08862   1.35468   1.977
2           2.02682              1.70984   2.18956   3.40245
4           2.06979              3.57831   4.18346   7.88476
8           1.87831              7.54048   9.5857    14.8807

batch size  translations/second  p90       p95       p99
1           1.56847              1.1745    1.48006   2.41088
2           1.80375              1.83741   2.5072    3.9312
4           1.79482              3.99013   5.43572   7.66713
8           1.57175              9.60344   10.6467   14.2069

batch size  translations/second  p90       p95       p99
1           2.13625              0.842487  1.04341   1.60627
2           2.69967              1.26584   1.62549   2.91262
4           2.88421              2.49964   3.07377   7.25401
8           2.66523              5.27891   6.75387   14.1079

batch size  translations/second  p90       p95       p99
1           2.07451              0.910784  1.17259   1.84924
2           2.47588              1.43357   1.82757   3.37628
4           2.52347              2.94212   3.79328   6.77849
8           2.22887              6.50901   9.92396   14.814

batch size  translations/second  p90       p95       p99
1           2.56824              0.684774  0.79945   1.07777
2           3.39711              0.957159  1.11632   1.42855
4           4.01128              1.64192   1.96622   2.46766
8           3.90422              3.26752   3.5553    4.68024

batch size  translations/second  p90       p95       p99
1           3.51151              0.500012  0.583887  0.772306
2           4.74482              0.684268  0.818367  1.06536
4           5.91097              1.07181   1.25083   1.69175
8           5.86418              2.10006   2.4796    3.11607

batch size  translations/second  p90       p95       p99
1           2.566                0.671063  0.781803  1.05472
2           3.40786              0.938218  1.09553   1.49499
4           3.8768               1.61876   1.8546    3.78943
8           3.66088              3.16267   3.70907   8.37703

batch size  translations/second  p90       p95       p99
1           3.6477               0.495766  0.618124  0.899573
2           4.73131              0.719162  0.829255  1.54409
4           5.72285              1.12675   1.48843   2.42258
8           5.54803              2.40075   3.26125   4.88046

batch size  translations/second  p90       p95       p99
1           2.20713              0.811372  1.03459   1.42627
2           2.98033              1.19901   1.51812   2.2026
4           3.58356              2.07059   2.35678   3.65251
8           3.81516              3.54866   4.34197   7.75236

batch size  translations/second  p90       p95       p99
1           2.52089              0.722309  0.869916  1.28357
2           3.42075              1.00453   1.26259   1.91771
4           4.16782              1.62723   1.99308   3.32084
8           4.48823              3.00212   3.72774   7.06577

batch size  translations/second  p90       p95       p99
1           1.12664              1.56816   1.954     2.88826
2           1.42221              2.47662   3.0681    4.58871
4           1.62179              4.80462   5.60864   9.08026
8           1.62194              8.72006   10.3927   17.1011

batch size  translations/second  p90       p95       p99
1           1.06394              1.73569   2.17      3.39102
2           1.22597              2.68251   3.63989   5.52069
4           1.30998              5.34685   6.70295   9.5359
8           1.22196              11.3644   16.8053   21.5743

batch size  translations/second  p90       p95       p99
1           1.4291               1.24761   1.50121   2.3264
2           1.86382              1.89852   2.34146   3.97016
4           2.11519              3.38856   4.20307   8.6753
8           2.13149              6.45051   8.98936   16.7489

batch size  translations/second  p90       p95       p99
1           1.41881              1.29899   1.64733   2.58565
2           1.77199              2.07364   2.5714    4.50892
4           1.96346              3.56488   5.05489   7.79614
8           1.91313              8.32652   10.8768   15.854

batch size  translations/second  p90       p95       p99
1           1.74946              0.988257  1.14519   1.53376
2           2.39319              1.35765   1.58754   2.03805
4           2.90098              2.20014   2.55023   3.045
8           3.04578              3.93638   4.21325   5.37325

batch size  translations/second  p90       p95       p99
1           2.26458              0.777135  0.894938  1.16347
2           3.15778              1.02839   1.21504   1.61981
4           3.86324              1.61475   1.83013   2.4488
8           4.41485              2.63231   3.25951   4.81927

batch size  translations/second  p90       p95       p99
1           1.66491              1.01757   1.18916   1.58394
2           2.28272              1.40361   1.64927   2.26426
4           2.75788              2.21632   2.4773    4.85959
8           2.83518              3.83715   4.32809   9.66428

batch size  translations/second  p90       p95       p99
1           2.46888              0.745367  0.897639  1.26768
2           3.33421              1.01969   1.17779   2.13493
4           4.0172               1.54522   1.91358   3.47273
8           4.20587              2.92903   4.75732   6.37416

Hardware Specifications#

GPU: NVIDIA DGX A100 40GB

CPU
  Model: AMD EPYC 7742 64-Core Processor
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 2250
  CPU min MHz: 1500

RAM
  Model: Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
  Configured Memory Speed: 2933 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA H100 80GB HBM3

CPU
  Model: Intel(R) Xeon(R) Platinum 8480CL
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 56
  NUMA node(s): 2
  CPU max MHz: 3800
  CPU min MHz: 800

RAM
  Model: Micron DDR5 MTC40F2046S1RC48BA1 4800MHz
  Configured Memory Speed: 4400 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA L40

CPU
  Model: AMD EPYC 7763 64-Core Processor
  Thread(s) per core: 1
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 3529
  CPU min MHz: 1500

RAM
  Model: Samsung DDR4 M393A4K40DB3-CWE 3200MHz
  Configured Memory Speed: 3200 MT/s
  RAM Size: 16x32GB (512GB Total)