Performance#

Evaluation Process#

This section shows the latency and throughput numbers for the Riva NMT service on different GPUs.

The following command was used to measure performance:

riva_nmt_t2t_client \
  --riva_uri=0.0.0.0:50051 \
  --model_name=megatronnmt_any_any_1b \
  --batch_size=<batch size> \
  --target_language_code=<target language code> \
  --source_language_code=<source language code> \
  --text_file=<wmt_filename>
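
To collect one measurement per batch size, the command above can be generated programmatically. The following is a minimal sketch that only assembles the argument list from the flags shown above. The helper name `build_client_command`, the language codes, and the file name `wmt_sample.txt` are placeholders chosen for illustration, not values taken from the benchmark.

```python
import shlex


def build_client_command(batch_size, source_lang, target_lang, text_file,
                         riva_uri="0.0.0.0:50051",
                         model_name="megatronnmt_any_any_1b"):
    """Assemble the riva_nmt_t2t_client argument list using the flags shown above."""
    return [
        "riva_nmt_t2t_client",
        f"--riva_uri={riva_uri}",
        f"--model_name={model_name}",
        f"--batch_size={batch_size}",
        f"--target_language_code={target_lang}",
        f"--source_language_code={source_lang}",
        f"--text_file={text_file}",
    ]


# Hypothetical sweep over the batch sizes reported in the Results section.
commands = [
    build_client_command(bs, source_lang="en", target_lang="de",
                         text_file="wmt_sample.txt")
    for bs in (1, 2, 4, 8)
]

for cmd in commands:
    # Print the shell-quoted command; pass cmd to subprocess.run(cmd, check=True)
    # to execute it against a running Riva server.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.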

The riva_nmt_t2t_client reports the following latency measurement:

  • latency: the overall latency of all returned responses. The following tables report this measurement at the 90th, 95th, and 99th percentiles (p90, p95, p99).
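
The p90, p95, and p99 columns in the results tables are latency percentiles. As an illustration of how such values can be derived from per-request latencies, here is a minimal sketch using the nearest-rank method. The source does not state which percentile method riva_nmt_t2t_client uses, so the method, the `percentile` helper, and the sample values are all assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up per-request latencies, for illustration only.
latencies = [0.8, 1.1, 0.9, 1.4, 1.0, 2.5, 1.2, 0.95, 1.05, 3.0]

summary = {f"p{p}": percentile(latencies, p) for p in (90, 95, 99)}
print(summary)
```

With only a few samples the high percentiles collapse onto the slowest requests, which is why tail latencies such as p99 need a large number of requests to be meaningful.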

You can get the source code for the riva_nmt_t2t_client at Riva C++ Clients.

Results#

The following tables show the latency and throughput measurements. Throughput is reported as tokens per second, matching the tokens/second column in each table.

For information about the hardware used to collect these measurements, see the Hardware Specifications section.

Riva Translate 1.6b#

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 26.2867 | 2.03927 | 2.50127 | 3.38444 |
| 2 | 37.9656 | 2.66554 | 3.13718 | 4.35343 |
| 4 | 52.68 | 3.90231 | 4.27135 | 6.70392 |
| 8 | 60.0354 | 6.66561 | 7.94382 | 12.7633 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 31.9811 | 2.33799 | 2.66042 | 3.81389 |
| 2 | 42.7697 | 2.92146 | 3.48803 | 4.87088 |
| 4 | 55.6321 | 4.42112 | 5.65174 | 7.68267 |
| 8 | 63.8819 | 7.67295 | 9.85235 | 12.8664 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 26.3775 | 1.34108 | 1.54733 | 1.99489 |
| 2 | 38.9406 | 1.6354 | 1.87501 | 2.2296 |
| 4 | 56.8026 | 2.16781 | 2.43267 | 2.74134 |
| 8 | 74.5635 | 3.30279 | 3.59166 | 4.28577 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 37.3683 | 1.04883 | 1.2315 | 1.5244 |
| 2 | 56.919 | 1.27984 | 1.44713 | 1.80017 |
| 4 | 84.6404 | 1.62568 | 1.8017 | 2.21461 |
| 8 | 116.29 | 2.32152 | 2.69288 | 3.06221 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 26.407 | 1.39359 | 1.59803 | 2.0832 |
| 2 | 38.9544 | 1.70536 | 1.89963 | 2.53018 |
| 4 | 57.3626 | 2.17563 | 2.41663 | 4.14745 |
| 8 | 74.4845 | 3.24898 | 3.77144 | 7.65627 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.9981 | 0.996788 | 1.16798 | 1.65174 |
| 2 | 37.7592 | 1.23002 | 1.43918 | 2.28275 |
| 4 | 55.8361 | 1.59303 | 1.92639 | 2.96886 |
| 8 | 72.8569 | 2.47273 | 3.09559 | 4.60961 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.0178 | 2.16171 | 2.67937 | 3.57496 |
| 2 | 35.7852 | 2.8741 | 3.43384 | 4.825 |
| 4 | 45.7635 | 4.75694 | 5.46036 | 8.43625 |
| 8 | 53.6928 | 7.49429 | 9.33626 | 15.9536 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 30.4228 | 2.45499 | 2.8447 | 4.16917 |
| 2 | 39.371 | 3.23598 | 3.9612 | 5.80284 |
| 4 | 47.5513 | 5.20845 | 6.5702 | 9.05081 |
| 8 | 53.4489 | 9.45831 | 12.3315 | 16.7165 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.7329 | 1.4008 | 1.59731 | 2.02982 |
| 2 | 37.3419 | 1.70973 | 1.97817 | 2.38832 |
| 4 | 52.8377 | 2.39732 | 2.73125 | 3.28759 |
| 8 | 62.3088 | 3.89276 | 4.19171 | 5.01999 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 37.6181 | 1.06453 | 1.2618 | 1.56666 |
| 2 | 54.3727 | 1.34588 | 1.54159 | 1.90145 |
| 4 | 77.2692 | 1.85304 | 2.06099 | 2.62032 |
| 8 | 100.463 | 2.67849 | 3.19095 | 3.68078 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.6851 | 1.45086 | 1.65187 | 2.14367 |
| 2 | 37.6205 | 1.77873 | 2.00517 | 2.71895 |
| 4 | 52.6492 | 2.39175 | 2.70523 | 4.71649 |
| 8 | 62.1082 | 3.8764 | 4.39479 | 9.33342 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 26.3125 | 1.00717 | 1.19257 | 1.67369 |
| 2 | 36.9222 | 1.2762 | 1.478 | 2.46823 |
| 4 | 51.398 | 1.83249 | 2.24098 | 3.33602 |
| 8 | 63.8143 | 2.85154 | 3.58795 | 5.41236 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 24.8964 | 2.12924 | 2.65001 | 3.58206 |
| 2 | 34.9882 | 2.98058 | 3.54244 | 5.17292 |
| 4 | 44.407 | 4.79422 | 5.67429 | 8.66048 |
| 8 | 52.0275 | 7.6554 | 9.58992 | 15.6379 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 29.9773 | 2.48201 | 2.85844 | 4.27288 |
| 2 | 38.6565 | 3.3019 | 4.04395 | 6.12123 |
| 4 | 45.7503 | 5.44284 | 7.03912 | 9.69829 |
| 8 | 51.1056 | 9.83193 | 12.5469 | 16.8656 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.3765 | 1.38744 | 1.56635 | 2.01276 |
| 2 | 37.2191 | 1.7271 | 2.00251 | 2.42118 |
| 4 | 51.1486 | 2.51382 | 2.8324 | 3.51729 |
| 8 | 59.8422 | 4.07464 | 4.40873 | 5.24835 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 36.2303 | 1.0951 | 1.27495 | 1.57815 |
| 2 | 53.9788 | 1.35253 | 1.54013 | 1.93965 |
| 4 | 76.3488 | 1.85225 | 2.07805 | 2.64824 |
| 8 | 96.3217 | 2.80936 | 3.31854 | 3.90351 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.2758 | 1.44721 | 1.65356 | 2.15285 |
| 2 | 37.604 | 1.77375 | 2.01242 | 2.72198 |
| 4 | 51.4024 | 2.49608 | 2.79586 | 5.08613 |
| 8 | 59.5006 | 4.04908 | 4.63766 | 9.81065 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.2419 | 1.03367 | 1.22135 | 1.71075 |
| 2 | 36.4417 | 1.27991 | 1.50347 | 2.44608 |
| 4 | 50.6761 | 1.8483 | 2.28897 | 3.48173 |
| 8 | 61.1923 | 2.97418 | 3.84112 | 5.69605 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 23.864 | 2.21488 | 2.74259 | 3.76037 |
| 2 | 33.5527 | 3.10486 | 3.74949 | 5.343 |
| 4 | 42.1649 | 5.07062 | 5.93531 | 9.05 |
| 8 | 47.4944 | 8.55144 | 10.6804 | 18.5123 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 28.6128 | 2.60506 | 3.0003 | 4.4652 |
| 2 | 36.6462 | 3.51057 | 4.29037 | 6.38079 |
| 4 | 43.646 | 5.66035 | 7.43962 | 10.2525 |
| 8 | 46.5824 | 10.8355 | 14.4172 | 19.4719 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 24.1934 | 1.4668 | 1.6551 | 2.11026 |
| 2 | 35.5399 | 1.81347 | 2.08718 | 2.53916 |
| 4 | 47.9159 | 2.66117 | 2.97279 | 3.50095 |
| 8 | 57.1626 | 4.3184 | 4.66154 | 5.64512 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 34.9025 | 1.13927 | 1.33301 | 1.64506 |
| 2 | 51.477 | 1.42015 | 1.62161 | 2.02865 |
| 4 | 71.5458 | 2.03029 | 2.23385 | 2.80834 |
| 8 | 91.6256 | 3.00349 | 3.51311 | 4.12845 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 24.0672 | 1.5234 | 1.74705 | 2.25971 |
| 2 | 35.6375 | 1.8885 | 2.14569 | 2.97039 |
| 4 | 47.8865 | 2.65449 | 2.95025 | 5.46151 |
| 8 | 56.8241 | 4.30345 | 4.90771 | 10.5702 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 24.3206 | 1.07742 | 1.26691 | 1.793 |
| 2 | 34.7435 | 1.35411 | 1.58727 | 2.62002 |
| 4 | 48.3788 | 1.91256 | 2.35622 | 3.66184 |
| 8 | 57.9614 | 3.2078 | 4.05819 | 6.1864 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 40.5707 | 1.30604 | 1.5894 | 2.16364 |
| 2 | 56.9553 | 1.79082 | 2.16168 | 3.29733 |
| 4 | 66.8099 | 3.26993 | 3.64887 | 6.21649 |
| 8 | 62.5757 | 6.792 | 8.24721 | 13.8332 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 49.4655 | 1.48434 | 1.73773 | 2.54071 |
| 2 | 61.9468 | 2.11053 | 2.53549 | 3.89271 |
| 4 | 66.7364 | 4.09326 | 4.85237 | 7.36323 |
| 8 | 61.9911 | 8.39187 | 11.1865 | 14.189 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 40.4369 | 0.867933 | 0.983312 | 1.27003 |
| 2 | 58.9499 | 1.08901 | 1.2567 | 1.5336 |
| 4 | 77.0764 | 1.69687 | 1.93364 | 2.22511 |
| 8 | 79.8035 | 3.2676 | 3.61598 | 4.42012 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 58.1328 | 0.671243 | 0.787934 | 0.980829 |
| 2 | 87.3839 | 0.83912 | 0.962738 | 1.19282 |
| 4 | 118.819 | 1.23386 | 1.37922 | 1.73483 |
| 8 | 133.943 | 2.14844 | 2.5295 | 2.95827 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 40.4461 | 0.897417 | 1.02884 | 1.32677 |
| 2 | 59.6298 | 1.12109 | 1.25972 | 1.7518 |
| 4 | 77.4412 | 1.68057 | 1.88498 | 3.61804 |
| 8 | 79.0987 | 3.16789 | 3.72364 | 8.32152 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 40.7501 | 0.625767 | 0.745007 | 1.02009 |
| 2 | 58.8772 | 0.795955 | 0.932173 | 1.55014 |
| 4 | 79.4899 | 1.19318 | 1.46917 | 2.35335 |
| 8 | 83.196 | 2.30013 | 2.96352 | 4.70573 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 14.9059 | 4.18064 | 5.55622 | 8.90397 |
| 2 | 13.6982 | 8.77877 | 11.1217 | 18.41 |
| 4 | 11.9895 | 20.2857 | 23.3168 | 42.145 |
| 8 | 9.84107 | 45.3629 | 55.302 | 99.5216 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 15.731 | 5.57288 | 6.65737 | 11.7328 |
| 2 | 13.4258 | 10.9427 | 14.2988 | 21.9515 |
| 4 | 11.1191 | 24.9354 | 34.4626 | 52.6908 |
| 8 | 9.31464 | 56.1604 | 76.3938 | 104.507 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 17.9065 | 2.14465 | 2.66442 | 3.96079 |
| 2 | 17.6863 | 4.13346 | 5.24532 | 6.7035 |
| 4 | 16.281 | 8.86584 | 10.5843 | 12.674 |
| 8 | 13.819 | 19.8851 | 21.9904 | 27.6311 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 26.7904 | 1.63459 | 2.00438 | 2.7469 |
| 2 | 27.7218 | 3.06423 | 3.66466 | 5.05088 |
| 4 | 25.8234 | 6.25622 | 7.28332 | 9.68234 |
| 8 | 22.6921 | 13.3635 | 15.8901 | 18.9904 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 17.3311 | 2.3286 | 2.87607 | 4.3078 |
| 2 | 17.2567 | 4.36889 | 5.3378 | 7.67524 |
| 4 | 15.9637 | 8.94096 | 10.3726 | 23.2293 |
| 8 | 13.4777 | 19.4014 | 23.5091 | 56.2237 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 18.8599 | 1.44669 | 1.87181 | 3.1101 |
| 2 | 19.022 | 2.87861 | 3.5337 | 7.27692 |
| 4 | 17.3604 | 5.93223 | 7.70771 | 14.4663 |
| 8 | 14.1413 | 14.1156 | 18.7657 | 30.8777 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 24.8926 | 2.14594 | 2.67499 | 3.60681 |
| 2 | 34.6774 | 2.99878 | 3.67241 | 5.14107 |
| 4 | 43.0674 | 5.10826 | 5.87656 | 9.39341 |
| 8 | 47.3776 | 8.54803 | 10.6197 | 17.5638 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 29.8693 | 2.51427 | 2.89999 | 4.30795 |
| 2 | 38.044 | 3.39933 | 4.15936 | 6.2249 |
| 4 | 43.6223 | 5.74422 | 7.59899 | 10.6489 |
| 8 | 45.0687 | 11.1592 | 14.7241 | 20.0668 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.3254 | 1.40709 | 1.59688 | 2.0528 |
| 2 | 36.817 | 1.74851 | 2.02427 | 2.45473 |
| 4 | 50.5628 | 2.54396 | 2.91978 | 3.61169 |
| 8 | 56.0985 | 4.47452 | 4.82638 | 5.72361 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 36.8149 | 1.08212 | 1.27069 | 1.57721 |
| 2 | 53.7507 | 1.37057 | 1.56666 | 1.94615 |
| 4 | 74.8121 | 1.90943 | 2.13149 | 2.75404 |
| 8 | 91.2649 | 3.03928 | 3.61824 | 4.19196 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.3073 | 1.45916 | 1.66434 | 2.1714 |
| 2 | 37.2553 | 1.80097 | 2.03974 | 2.80161 |
| 4 | 50.3437 | 2.58562 | 2.94337 | 5.41708 |
| 8 | 56.0202 | 4.39519 | 5.0318 | 10.5843 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 25.6837 | 1.01961 | 1.20561 | 1.69398 |
| 2 | 36.3235 | 1.30864 | 1.53248 | 2.54312 |
| 4 | 49.6179 | 1.88027 | 2.28496 | 3.55785 |
| 8 | 58.6442 | 3.15507 | 4.11201 | 6.14629 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 34.0965 | 1.58011 | 1.96628 | 2.72448 |
| 2 | 46.4873 | 2.26697 | 2.71925 | 3.91062 |
| 4 | 55.8697 | 4.02463 | 4.48315 | 7.21816 |
| 8 | 62.8854 | 6.56656 | 8.19178 | 12.6974 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 40.779 | 1.86284 | 2.15408 | 3.22027 |
| 2 | 50.1581 | 2.63038 | 3.16101 | 4.68046 |
| 4 | 57.2902 | 4.3511 | 5.57995 | 7.81723 |
| 8 | 62.8918 | 8.30041 | 10.1785 | 14.5662 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 35.271 | 1.02293 | 1.16197 | 1.49121 |
| 2 | 49.5213 | 1.31475 | 1.52514 | 1.86395 |
| 4 | 66.6292 | 1.9028 | 2.17766 | 2.81436 |
| 8 | 73.0008 | 3.37777 | 3.61952 | 4.34075 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 51.3997 | 0.784364 | 0.933197 | 1.1595 |
| 2 | 72.977 | 1.0184 | 1.17174 | 1.46658 |
| 4 | 97.3946 | 1.48676 | 1.64859 | 2.09218 |
| 8 | 120.295 | 2.30302 | 2.62656 | 3.18205 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 35.1238 | 1.0647 | 1.21599 | 1.60668 |
| 2 | 49.6636 | 1.37161 | 1.55412 | 2.12787 |
| 4 | 66.8252 | 1.90341 | 2.14164 | 3.96884 |
| 8 | 72.9761 | 3.32665 | 3.78859 | 8.20151 |

| batch size | tokens/second | p90 | p95 | p99 |
|---|---|---|---|---|
| 1 | 35.956 | 0.73739 | 0.878487 | 1.25711 |
| 2 | 49.3781 | 0.963772 | 1.14114 | 1.93997 |
| 4 | 65.3594 | 1.4497 | 1.79878 | 2.6903 |
| 8 | 78.4195 | 2.40052 | 2.92004 | 4.53465 |

Hardware Specifications#

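| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA DGX A100 40GB |
| CPU | Model | AMD EPYC 7742 64-Core Processor |
| CPU | Thread(s) per core | 2 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 64 |
| CPU | NUMA node(s) | 8 |
| CPU | Frequency boost | enabled |
| CPU | CPU max MHz | 2250 |
| CPU | CPU min MHz | 1500 |
| RAM | Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
| RAM | Configured Memory Speed | 2933 MT/s |
| RAM | RAM Size | 32x64GB (2048GB Total) |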

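| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA H100 80GB HBM3 |
| CPU | Model | Intel(R) Xeon(R) Platinum 8480CL |
| CPU | Thread(s) per core | 2 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 56 |
| CPU | NUMA node(s) | 2 |
| CPU | CPU max MHz | 3800 |
| CPU | CPU min MHz | 800 |
| RAM | Model | Micron DDR5 MTC40F2046S1RC48BA1 4800MHz |
| RAM | Configured Memory Speed | 4400 MT/s |
| RAM | RAM Size | 32x64GB (2048GB Total) |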

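| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA L40 |
| CPU | Model | AMD EPYC 7763 64-Core Processor |
| CPU | Thread(s) per core | 1 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 64 |
| CPU | NUMA node(s) | 8 |
| CPU | Frequency boost | enabled |
| CPU | CPU max MHz | 3529 |
| CPU | CPU min MHz | 1500 |
| RAM | Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
| RAM | Configured Memory Speed | 3200 MT/s |
| RAM | RAM Size | 16x32GB (512GB Total) |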

Model Accuracy#

Riva NMT models are evaluated using the BLEU (Bilingual Evaluation Understudy) score, an industry-standard metric for evaluating machine translation quality.

BLEU scores range from 0 to 100, where higher scores indicate better translation quality. The score measures how similar the machine translation output is to one or more reference human translations by:

  1. Comparing n-gram matches between the machine translation and reference translations

  2. Applying a brevity penalty to translations that are shorter than the reference (overly long translations are already penalized through lower n-gram precision)

  3. Combining these components into a final score
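
The steps above can be sketched in a few lines. This is a simplified, single-reference corpus BLEU with whitespace tokenization and no smoothing, written only to illustrate the mechanics; it is not the scoring tool used to produce the numbers in this section, and the function names are illustrative. In standard BLEU the length penalty (the brevity penalty) applies only when the output is shorter than the reference. Real evaluations normally rely on a standard BLEU implementation, because tokenization choices change the score.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Step 1: clipped n-gram matches (Counter & Counter keeps the minimum count).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Step 2: brevity penalty, applied only when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)

    # Step 3: geometric mean of the n-gram precisions, scaled by the penalty.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    return 100 * brevity_penalty * math.exp(log_precision)


reference = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], reference))  # exact match
print(corpus_bleu(["the cat sat on"], reference))          # correct but too short
```

The second call shows the effect of the brevity penalty: every n-gram in the short output matches the reference, yet the score drops because the output covers only four of the six reference tokens.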

Riva Translate 1.6b#

The table below shows the BLEU scores for the Riva Translate 1.6B any2any model, which supports translation between any pair of the following languages: de, es-ES, es-US, fr, ja, ru, zh-CN. The scores are based on the Flores-101 dataset. In the table, each row corresponds to the source language, and each column corresponds to the target language. Higher BLEU scores indicate better translation quality.

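| Source/Target | de | es-ES | es-US | fr | ja | ru | zh-CN |
|---|---|---|---|---|---|---|---|
| de | - | 24.5 | 24.1 | 39.3 | 27.3 | 26.1 | 33.3 |
| es-ES | 22.1 | - | - | 30.3 | 23.5 | 20.2 | 29.8 |
| es-US | 22.1 | - | - | 30.3 | 23.5 | 20.2 | 29.8 |
| fr | 25.0 | 24.8 | 30.4 | - | 26.6 | 25.5 | 32.7 |
| ja | 16.9 | 16.4 | 18.1 | 23.7 | - | 15.2 | 28.9 |
| ru | 22.4 | 21.9 | 26.4 | 33.4 | 25.4 | - | 30.9 |
| zh-CN | 17.5 | 17.3 | 19.1 | 25.6 | 16.8 | 23.7 | - |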

The table below shows the BLEU scores of the Riva NMT Megatron 1.6B any2any model for translation between English and each listed language, evaluated on the Flores-101 dataset.

| Language | English to Target ⬆️ | Target to English ⬆️ |
|---|---|---|
| Arabic | 28.0 | 40.6 |
| Brazilian Portuguese | 49.8 | 50.5 |
| Bulgarian | 41.8 | 42.1 |
| Croatian | 27.9 | 37.8 |
| Czech | 32.9 | 41.1 |
| Danish | 46.2 | 49.6 |
| Dutch | 26.7 | 32.6 |
| Estonian | 27.3 | 38.9 |
| European Portuguese | 48.1 | 50.5 |
| European Spanish | 27.6 | 30.7 |
| Finnish | 22.7 | 35.0 |
| French | 50.5 | 46.5 |
| German | 38.2 | 45.2 |
| Greek | 27.5 | 36.5 |
| Hindi | 33.5 | 39.9 |
| Hungarian | 26.7 | 36.9 |
| Indonesian | 47.2 | 44.9 |
| Italian | 29.9 | 34.5 |
| Japanese | 32.5 | 26.7 |
| Korean | 28.0 | 29.5 |
| Latin American Spanish | 26.8 | 30.7 |
| Latvian | 31.0 | 37.0 |
| Lithuanian | 27.5 | 35.1 |
| Norwegian | 34.0 | 44.8 |
| Polish | 20.8 | 30.3 |
| Romanian | 40.7 | 45.0 |
| Russian | 31.3 | 36.1 |
| Simplified Chinese | 39.5 | 28.5 |
| Slovak | 35.0 | 40.6 |
| Slovenian | 30.7 | 36.2 |
| Swedish | 45.0 | 49.6 |
| Thai | 30.9 | 28.1 |
| Traditional Chinese | 30.8 | 26.8 |
| Turkish | 29.5 | 38.8 |
| Ukrainian | 30.7 | 40.2 |
| Vietnamese | 41.8 | 36.9 |