Performance

The tables below report the measured performance of the Riva ASR, NLP, and TTS services on NVIDIA T4, V100 SXM2 16 GB, A30, and A100 SXM4 40 GB GPUs. CPU specifications for each system can be found here:

ASR

The latency numbers below were measured in streaming recognition mode with the BERT-based punctuation model enabled, a 4-gram language model, a decoder beam width of 128, and timestamps enabled. The Jasper, QuartzNet, and Citrinet-1024 acoustic models were tested. The client and the server used audio chunks of the same duration (100 ms, 160 ms, 800 ms, 1600 ms, or 3200 ms, depending on the server configuration). The Riva streaming client riva_streaming_asr_client, provided in the Riva client image, was run with the --simulate_realtime flag to simulate transcription from a microphone. Each stream performed 5 iterations over a sample audio file from the LibriSpeech dataset (1272-135031-0000.wav). The command used was:

riva_streaming_asr_client  \
     --chunk_duration_ms=<chunk_duration> --simulate_realtime=true \
     --automatic_punctuation=true --num_parallel_requests=<num_streams> \
     --word_time_offsets=true --print_transcripts=false \
     --interim_results=false --num_iterations=<5*num_streams> \
     --audio_file=1272-135031-0000.wav --output_filename=/tmp/output.json
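To repeat the run for several stream counts, the command can be assembled programmatically. The following is a minimal sketch, not part of the Riva tooling: the helper name build_asr_benchmark_cmd is hypothetical, and it uses only the flags shown in the command above, setting --num_iterations to 5 times the number of streams.

```python
import shlex


def build_asr_benchmark_cmd(chunk_duration_ms, num_streams,
                            audio_file="1272-135031-0000.wav",
                            output_filename="/tmp/output.json"):
    """Return the argv list for one riva_streaming_asr_client benchmark run.

    Hypothetical helper: it only mirrors the flags from the documented command.
    """
    return [
        "riva_streaming_asr_client",
        f"--chunk_duration_ms={chunk_duration_ms}",
        "--simulate_realtime=true",
        "--automatic_punctuation=true",
        f"--num_parallel_requests={num_streams}",
        "--word_time_offsets=true",
        "--print_transcripts=false",
        "--interim_results=false",
        # Each stream performs 5 iterations over the audio file.
        f"--num_iterations={5 * num_streams}",
        f"--audio_file={audio_file}",
        f"--output_filename={output_filename}",
    ]


if __name__ == "__main__":
    # Print the command line for a few stream counts at a 160 ms chunk size.
    for streams in (1, 8, 16):
        print(shlex.join(build_asr_benchmark_cmd(160, streams)))
```

The returned list can be passed to subprocess.run as-is; shlex.join is used only to print a copy-pasteable command line.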

Note

There is one audio channel per stream. For example, handling a stereo audio file with two channels requires two streams.

The riva_streaming_asr_client returns latency measured in three different ways after executing the benchmark task:

  • intermediate latency: latency to return an intermediate transcript with is_final == false

  • final latency: latency of messages returned with is_final == true

  • latency: the overall latency of all returned message types

The overall latency numbers are reported below.
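The avg, p50, p90, p95, and p99 columns used throughout this page are summary statistics over per-response latencies. The sketch below shows one way to compute them from a list of samples. It assumes the nearest-rank percentile definition; the client may use a different interpolation method, so small differences from its output are possible. The function names are illustrative.

```python
import statistics


def nearest_rank_percentile(sorted_samples, pct):
    """Return the pct-th percentile of an ascending list (nearest-rank method)."""
    n = len(sorted_samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * n + 99) // 100)
    return sorted_samples[rank - 1]


def summarize_latencies(latencies_ms):
    """Compute avg, p50, p90, p95, and p99 for a list of latencies in ms."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    summary = {"avg": statistics.fmean(ordered)}
    for pct in (50, 90, 95, 99):
        summary[f"p{pct}"] = nearest_rank_percentile(ordered, pct)
    return summary


if __name__ == "__main__":
    # Synthetic samples: 1 ms, 2 ms, ..., 100 ms.
    print(summarize_latencies(list(range(1, 101))))
```

Tail percentiles such as p99 are sensitive to a handful of slow responses, which is why they grow much faster than the average once the server approaches saturation.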

NVIDIA A100 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 160 | 9.7577 | 9.5604 | 9.9649 | 11.466 | 14.906 | 0.99969 |
| citrinet | 8 | 160 | 14.403 | 14.171 | 14.973 | 16.205 | 29.493 | 7.9947 |
| citrinet | 16 | 160 | 26.812 | 26.518 | 29.656 | 30.356 | 59.582 | 15.979 |
| citrinet | 32 | 160 | 41.707 | 41.789 | 43.589 | 45.316 | 98.952 | 31.923 |
| citrinet | 48 | 160 | 56.107 | 55.825 | 59.398 | 60.751 | 139.32 | 47.837 |
| citrinet | 64 | 160 | 59.71 | 58.399 | 66.993 | 69.52 | 161.63 | 63.734 |
| citrinet | 96 | 160 | 73.294 | 74.294 | 85.567 | 91.818 | 229.74 | 95.508 |
| citrinet | 128 | 160 | 91.074 | 90.655 | 102.74 | 107.51 | 292.04 | 127.08 |
| jasper | 1 | 100 | 13.531 | 13.183 | 15.061 | 17.599 | 21.585 | 0.99955 |
| jasper | 8 | 100 | 22.796 | 22.713 | 29.995 | 31.778 | 48.767 | 7.9914 |
| jasper | 16 | 100 | 31.498 | 29.847 | 40.571 | 44.482 | 59.163 | 15.979 |
| jasper | 32 | 100 | 41.884 | 41.578 | 50.799 | 54.911 | 79.555 | 31.942 |
| jasper | 48 | 100 | 46.696 | 46.577 | 57.675 | 63.062 | 90.114 | 47.89 |
| jasper | 64 | 100 | 54.044 | 54.195 | 66.216 | 71.833 | 112.54 | 63.83 |
| jasper | 96 | 100 | 71.604 | 72.763 | 90.908 | 96.76 | 182.88 | 95.631 |
| jasper | 128 | 100 | 98.472 | 93.921 | 120.42 | 132.16 | 385.83 | 127.43 |
| quartznet | 1 | 100 | 9.1328 | 8.6998 | 10.633 | 11.756 | 17.986 | 0.99955 |
| quartznet | 8 | 100 | 14.043 | 13.178 | 18.048 | 21.065 | 36.566 | 7.9927 |
| quartznet | 16 | 100 | 17.884 | 17.083 | 23.236 | 26.338 | 45.201 | 15.981 |
| quartznet | 32 | 100 | 25.94 | 25.778 | 32.443 | 36.645 | 65.529 | 31.946 |
| quartznet | 48 | 100 | 34.368 | 34.593 | 42.599 | 46.881 | 92.013 | 47.879 |
| quartznet | 64 | 100 | 41.66 | 41.664 | 50.617 | 54.864 | 114.01 | 63.808 |
| quartznet | 96 | 100 | 51.074 | 49.639 | 64.404 | 71.843 | 170.55 | 95.633 |
| quartznet | 128 | 100 | 55.536 | 53.139 | 74.938 | 84.083 | 186.51 | 127.46 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 800 | 10.267 | 9.9903 | 11.102 | 12.933 | 14.493 | 0.99973 |
| citrinet | 64 | 800 | 67.571 | 66.531 | 76.846 | 105.14 | 151.98 | 63.796 |
| citrinet | 128 | 800 | 97.206 | 100.76 | 122.88 | 147.72 | 222.68 | 127.41 |
| citrinet | 256 | 800 | 167.38 | 175.42 | 197.88 | 266.98 | 435.39 | 253.48 |
| citrinet | 384 | 800 | 238.29 | 251.99 | 291.62 | 372.06 | 613.68 | 379 |
| citrinet | 512 | 800 | 293.79 | 309.88 | 358.29 | 479.51 | 865.01 | 503.17 |
| citrinet | 768 | 800 | 436.27 | 439.45 | 520.01 | 727.25 | 2058.7 | 748.34 |
| citrinet | 1024 | 800 | 661.79 | 573.26 | 865.74 | 1552.2 | 4643.1 | 987.68 |
| jasper | 1 | 800 | 20.922 | 20.692 | 28.875 | 29.405 | 29.865 | 0.99955 |
| jasper | 64 | 800 | 84.316 | 82.369 | 117.57 | 134 | 163 | 63.803 |
| jasper | 128 | 800 | 119.35 | 119.19 | 158.91 | 198.17 | 235.79 | 127.41 |
| jasper | 256 | 800 | 173.3 | 169.53 | 241.81 | 307.34 | 372.86 | 253.97 |
| jasper | 384 | 800 | 235.77 | 230.55 | 352.95 | 445.74 | 544.94 | 379.37 |
| jasper | 512 | 800 | 286.03 | 281.09 | 430.92 | 592.05 | 725.74 | 504.25 |
| jasper | 768 | 800 | 422.54 | 376.28 | 664.59 | 1232.5 | 1454 | 750.46 |
| jasper | 1024 | 800 | 700.63 | 466.39 | 1740.5 | 2513.7 | 3490.7 | 988.89 |
| quartznet | 1 | 800 | 17.209 | 17.765 | 23.747 | 24.169 | 25.651 | 0.99958 |
| quartznet | 64 | 800 | 71.822 | 70.378 | 101.56 | 120.75 | 142.12 | 63.808 |
| quartznet | 128 | 800 | 95.425 | 92.052 | 137.83 | 172.7 | 219.42 | 127.44 |
| quartznet | 256 | 800 | 142.75 | 131.17 | 223.14 | 287.92 | 350.75 | 254.04 |
| quartznet | 384 | 800 | 184.19 | 169.87 | 284.12 | 374.8 | 468.78 | 380.15 |
| quartznet | 512 | 800 | 214.21 | 198.39 | 330.84 | 494.27 | 617.66 | 505.23 |
| quartznet | 768 | 800 | 294.87 | 257.71 | 486.65 | 761.83 | 1144.2 | 752.05 |
| quartznet | 1024 | 800 | 377.75 | 308.13 | 672.65 | 1197.6 | 1678.3 | 998.09 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 1600 | 11.581 | 11.076 | 13.587 | 14.2 | 16.066 | 0.99972 |
| citrinet | 256 | 1600 | 203.91 | 208.22 | 280.15 | 375.9 | 449.63 | 253.71 |
| citrinet | 512 | 1600 | 366.76 | 377.84 | 525.1 | 730.28 | 882.79 | 503.15 |
| citrinet | 768 | 1600 | 520.51 | 537.58 | 789.66 | 1073.8 | 1305.8 | 748.03 |
| citrinet | 1024 | 1600 | 680.4 | 696.13 | 1046.5 | 1421.1 | 2420.1 | 989.19 |
| citrinet | 1280 | 1600 | 809.97 | 762.33 | 1278.9 | 2497.8 | 2975.9 | 1226 |
| citrinet | 1512 | 1600 | 981.25 | 833.21 | 1525.6 | 2928.7 | 4528 | 1437 |
| jasper | 1 | 3200 | 35.778 | 37.577 | 40.806 | 40.839 | 40.839 | 0.9994 |
| jasper | 256 | 3200 | 370.35 | 371.36 | 486.26 | 506.86 | 531.23 | 253.55 |
| jasper | 512 | 3200 | 631.34 | 637.79 | 855.65 | 892.55 | 956.61 | 502.68 |
| jasper | 768 | 3200 | 993.58 | 1004 | 1437.5 | 1792.7 | 1997.5 | 744.72 |
| jasper | 1024 | 3200 | 1495.5 | 1481.1 | 2371.6 | 2474.2 | 2620.5 | 977.04 |
| jasper | 1280 | 3200 | 2028.4 | 2040.6 | 3173.1 | 4182.9 | 4434.1 | 1198.3 |
| jasper | 1512 | 3200 | 2544.2 | 2512.9 | 4790.7 | 5109.1 | 5445.4 | 1395.1 |
| quartznet | 1 | 3200 | 34.388 | 34.271 | 38.126 | 38.376 | 38.376 | 0.99941 |
| quartznet | 256 | 3200 | 267.27 | 260.58 | 376.31 | 396.37 | 434.8 | 254.11 |
| quartznet | 512 | 3200 | 457.53 | 445.7 | 683.32 | 715.81 | 757.45 | 504.45 |
| quartznet | 768 | 3200 | 637.18 | 620.67 | 972.11 | 1020.9 | 1094.7 | 751.29 |
| quartznet | 1024 | 3200 | 941.86 | 939.81 | 1549.6 | 1687.6 | 1860.7 | 988.58 |
| quartznet | 1280 | 3200 | 1290.3 | 1245.3 | 2183.7 | 2300 | 2462.1 | 1218.6 |
| quartznet | 1512 | 3200 | 1592.7 | 1590.9 | 2678.5 | 2861.7 | 3493.8 | 1421 |

NVIDIA A30 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 160 | 14.097 | 13.963 | 14.241 | 14.467 | 23.026 | 0.99952 |
| citrinet | 8 | 160 | 31.256 | 33.222 | 34.823 | 35.302 | 51.29 | 7.9904 |
| citrinet | 16 | 160 | 44.911 | 44.726 | 48.284 | 49.754 | 83.231 | 15.97 |
| citrinet | 32 | 160 | 63.817 | 64.056 | 69.934 | 71.409 | 141.25 | 31.902 |
| citrinet | 48 | 160 | 70.657 | 69.99 | 76.363 | 79.853 | 186.81 | 47.811 |
| citrinet | 64 | 160 | 85.287 | 84.761 | 93.108 | 100.53 | 240.1 | 63.674 |
| citrinet | 96 | 160 | 126.2 | 122.43 | 135.75 | 149.39 | 349.59 | 95.277 |
| citrinet | 128 | 160 | 177.57 | 161.16 | 208.79 | 302.54 | 515.74 | 126.76 |
| jasper | 1 | 100 | 15.144 | 14.742 | 16.153 | 18.412 | 25.529 | 0.99947 |
| jasper | 8 | 100 | 23.389 | 21.772 | 30.426 | 35.951 | 53.913 | 7.9893 |
| jasper | 16 | 100 | 40.269 | 39.476 | 45.645 | 49.016 | 73.797 | 15.974 |
| jasper | 32 | 100 | 50.508 | 49.153 | 58.838 | 62.989 | 91.88 | 31.936 |
| jasper | 48 | 100 | 61.582 | 60.531 | 70.907 | 76.604 | 114.88 | 47.879 |
| jasper | 64 | 100 | 70.52 | 72.573 | 85.199 | 91.798 | 176.93 | 63.796 |
| jasper | 96 | 100 | 139.76 | 119.76 | 169.73 | 203.13 | 667.9 | 95.558 |
| jasper | 128 | 100 | 2663.2 | 2734.4 | 3690.8 | 4282.6 | 5406.3 | 120.24 |
| quartznet | 1 | 100 | 10.224 | 9.7236 | 11.821 | 12.977 | 20.453 | 0.99948 |
| quartznet | 8 | 100 | 17.442 | 16.234 | 21.907 | 24.661 | 45.751 | 7.9914 |
| quartznet | 16 | 100 | 25.758 | 24.922 | 30.201 | 33.584 | 58.757 | 15.979 |
| quartznet | 32 | 100 | 34.549 | 33.219 | 41.661 | 46.362 | 89.622 | 31.928 |
| quartznet | 48 | 100 | 41.023 | 38.696 | 53.807 | 59.854 | 118.91 | 47.855 |
| quartznet | 64 | 100 | 46.091 | 44.091 | 58.148 | 66.58 | 152.35 | 63.798 |
| quartznet | 96 | 100 | 55.743 | 54.269 | 67.866 | 73.946 | 171.05 | 95.65 |
| quartznet | 128 | 100 | 65.466 | 62.553 | 79.624 | 90.737 | 255.93 | 127.42 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 800 | 15.273 | 14.994 | 15.455 | 19.696 | 19.949 | 0.99958 |
| citrinet | 64 | 800 | 109.02 | 107.38 | 128.03 | 151.81 | 202.02 | 63.715 |
| citrinet | 128 | 800 | 148.75 | 152.08 | 192.99 | 211.04 | 297.74 | 127.18 |
| citrinet | 256 | 800 | 262.4 | 283.86 | 317.88 | 356.92 | 560.89 | 252.86 |
| citrinet | 384 | 800 | 371.76 | 404.16 | 447.92 | 523.96 | 862.57 | 377.17 |
| citrinet | 512 | 800 | 494.45 | 506.77 | 581.76 | 775.86 | 1891.1 | 500.13 |
| citrinet | 768 | 800 | 2783.7 | 2135.2 | 4189.8 | 7105.5 | 15928 | 690.64 |
| citrinet | 1024 | 800 | 12500 | 11704 | 23256 | 25405 | 31698 | 690.07 |
| jasper | 1 | 800 | 22.319 | 22.159 | 31.174 | 32.097 | 33.375 | 0.99948 |
| jasper | 64 | 800 | 114.17 | 115.44 | 161.14 | 176.17 | 201.95 | 63.745 |
| jasper | 128 | 800 | 153.32 | 150.7 | 207.83 | 234.87 | 274.78 | 127.28 |
| jasper | 256 | 800 | 252.14 | 255.48 | 342.14 | 422.99 | 500.16 | 253.31 |
| jasper | 384 | 800 | 343 | 348.23 | 486.63 | 628.56 | 750.23 | 378.22 |
| jasper | 512 | 800 | 454.08 | 435.64 | 634.21 | 1209.4 | 1391.3 | 501.97 |
| jasper | 768 | 800 | 1147.9 | 630.28 | 3483.1 | 4374 | 6244.5 | 738.67 |
| jasper | 1024 | 800 | 7770.8 | 2115.2 | 24299 | 27926 | 39791 | 722.18 |
| quartznet | 1 | 800 | 19.7 | 20.492 | 26.528 | 26.984 | 28.295 | 0.99956 |
| quartznet | 64 | 800 | 90.109 | 88.12 | 136.98 | 155.58 | 186.51 | 63.745 |
| quartznet | 128 | 800 | 134.38 | 130.89 | 206.04 | 236.54 | 284.1 | 127.22 |
| quartznet | 256 | 800 | 177.91 | 167.48 | 256.5 | 336.28 | 417.29 | 253.86 |
| quartznet | 384 | 800 | 228.15 | 214.13 | 333.46 | 484.6 | 603.72 | 379.36 |
| quartznet | 512 | 800 | 293.71 | 274.95 | 437.49 | 633.69 | 959.8 | 503.66 |
| quartznet | 768 | 800 | 416.94 | 367.62 | 682.84 | 1227.2 | 1486.6 | 749.88 |
| quartznet | 1024 | 800 | 654.73 | 459.37 | 1488.7 | 2206.2 | 3094.9 | 986.78 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 1600 | 16.888 | 16.21 | 20.823 | 21.084 | 21.733 | 0.99957 |
| citrinet | 256 | 1600 | 321.49 | 343.74 | 433.47 | 527.6 | 611.02 | 252.89 |
| citrinet | 512 | 1600 | 574.1 | 618.1 | 787.82 | 973.16 | 1119.4 | 500.59 |
| citrinet | 768 | 1600 | 815.24 | 855.14 | 1169.4 | 1446.5 | 2576.5 | 742.88 |
| citrinet | 1024 | 1600 | 1166.1 | 1042 | 1577.1 | 3085.1 | 5150.6 | 979.48 |
| citrinet | 1280 | 1600 | 3802.6 | 1546.3 | 13707 | 25415 | 29243 | 1064.1 |
| citrinet | 1512 | 1600 | 8992.4 | 7448.2 | 18628 | 28306 | 37498 | 1107.8 |
| jasper | 1 | 3200 | 40.267 | 42.947 | 46.948 | 47.195 | 47.195 | 0.99932 |
| jasper | 256 | 3200 | 584.09 | 590.71 | 726.94 | 753.54 | 783.72 | 252.4 |
| jasper | 512 | 3200 | 1149.5 | 1115.7 | 1500.8 | 2114 | 2228.4 | 497.58 |
| jasper | 768 | 3200 | 2000.4 | 2040.5 | 2956.6 | 3123.5 | 3267.1 | 728.64 |
| jasper | 1024 | 3200 | 2981.7 | 2702.2 | 5284.5 | 5551.6 | 5768 | 947.66 |
| jasper | 1280 | 3200 | 11241 | 10417 | 22915 | 26534 | 30618 | 1018.4 |
| jasper | 1512 | 3200 | 18136 | 17164 | 39755 | 42884 | 45775 | 977.51 |
| quartznet | 1 | 3200 | 41.432 | 41.73 | 47.32 | 47.393 | 47.393 | 0.99922 |
| quartznet | 256 | 3200 | 389.99 | 387.1 | 525.14 | 551.07 | 588.59 | 253.36 |
| quartznet | 512 | 3200 | 696.41 | 691.1 | 973.09 | 1016.4 | 1057.1 | 502.43 |
| quartznet | 768 | 3200 | 1081.9 | 1059.1 | 1654.5 | 1940.8 | 2132.9 | 743.25 |
| quartznet | 1024 | 3200 | 1536.5 | 1562.8 | 2509.1 | 2619.1 | 2743.1 | 973.64 |
| quartznet | 1280 | 3200 | 2082.1 | 2110.5 | 3373.1 | 4299.3 | 4604.1 | 1195.5 |
| quartznet | 1512 | 3200 | 2712.4 | 2633.9 | 5057.5 | 5444 | 5799.6 | 1391.9 |

NVIDIA V100 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 160 | 11.155 | 11.046 | 11.288 | 11.41 | 17.872 | 0.99962 |
| citrinet | 8 | 160 | 19.291 | 19.115 | 20.081 | 20.55 | 38.068 | 7.9929 |
| citrinet | 16 | 160 | 34.927 | 34.655 | 37.423 | 38.275 | 67.863 | 15.97 |
| citrinet | 32 | 160 | 55.109 | 54.295 | 59.011 | 59.798 | 120 | 31.904 |
| citrinet | 48 | 160 | 71.089 | 70.934 | 75.116 | 76.805 | 176.51 | 47.814 |
| citrinet | 64 | 160 | 86.125 | 84.837 | 95.294 | 98.694 | 247.09 | 63.67 |
| citrinet | 96 | 160 | 126.16 | 122.67 | 135.84 | 151.61 | 381.84 | 95.277 |
| citrinet | 128 | 160 | 213.25 | 178.64 | 293.87 | 381.58 | 610.95 | 126.6 |
| jasper | 1 | 100 | 19.313 | 19.058 | 19.922 | 21.069 | 25.638 | 0.99949 |
| jasper | 8 | 100 | 22.77 | 22.278 | 24.183 | 26.782 | 39.872 | 7.9928 |
| jasper | 16 | 100 | 38.901 | 38.689 | 41.238 | 44.288 | 63.133 | 15.978 |
| jasper | 32 | 100 | 62.84 | 64 | 73.879 | 77.532 | 99.325 | 31.925 |
| jasper | 48 | 100 | 79.371 | 79.701 | 91.871 | 95.88 | 232.79 | 47.845 |
| jasper | 64 | 100 | 123.11 | 104.12 | 159.64 | 194.57 | 621.62 | 63.73 |
| jasper | 96 | 100 | 6296 | 6773.9 | 9091.7 | 9558.3 | 10582 | 84.651 |
| jasper | 128 | 100 | 14234 | 12951 | 27171 | 28922 | 30684 | 85.557 |
| quartznet | 1 | 100 | 8.2814 | 7.9853 | 9.1149 | 9.938 | 14.611 | 0.99964 |
| quartznet | 8 | 100 | 12.319 | 11.417 | 14.926 | 17.17 | 33.032 | 7.9933 |
| quartznet | 16 | 100 | 17.689 | 16.901 | 20.816 | 22.627 | 45.56 | 15.981 |
| quartznet | 32 | 100 | 22.753 | 21.951 | 26.937 | 28.943 | 66.365 | 31.942 |
| quartznet | 48 | 100 | 28.46 | 27.856 | 34.195 | 37.101 | 83.296 | 47.88 |
| quartznet | 64 | 100 | 147.01 | 149.18 | 273.91 | 301.15 | 379.3 | 63.724 |
| quartznet | 96 | 100 | 102.59 | 63.048 | 200.29 | 260.69 | 317.83 | 95.218 |
| quartznet | 128 | 100 | 183.08 | 169.84 | 312.08 | 380.39 | 562.45 | 127.12 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 800 | 12.499 | 12.171 | 12.917 | 16.227 | 16.715 | 0.99964 |
| citrinet | 64 | 800 | 96.624 | 95.566 | 104.06 | 139.88 | 190.72 | 63.727 |
| citrinet | 128 | 800 | 154.45 | 160.09 | 176.91 | 216.03 | 310.11 | 127.15 |
| citrinet | 256 | 800 | 284.54 | 312.12 | 337.24 | 426.03 | 623.07 | 252.74 |
| citrinet | 384 | 800 | 407.1 | 427.27 | 471.66 | 592 | 1288.1 | 376.46 |
| citrinet | 512 | 800 | 553.92 | 544.28 | 632.77 | 1066 | 2346.7 | 498.69 |
| citrinet | 768 | 800 | 4443.9 | 3536.1 | 7644 | 10741 | 19530 | 649.68 |
| citrinet | 1024 | 800 | 13985 | 13048 | 25700 | 27722 | 35948 | 662.18 |
| jasper | 1 | 800 | 23.608 | 23.412 | 29.76 | 30.542 | 30.636 | 0.99949 |
| jasper | 64 | 800 | 117.65 | 113.98 | 158.53 | 179.15 | 206.35 | 63.764 |
| jasper | 128 | 800 | 191.68 | 191.68 | 249.16 | 301.47 | 354.78 | 127.15 |
| jasper | 256 | 800 | 336.55 | 341.41 | 471.84 | 575.72 | 678.39 | 252.51 |
| jasper | 384 | 800 | 498.33 | 480.12 | 658.43 | 1267.7 | 1449.2 | 376.51 |
| jasper | 512 | 800 | 937.12 | 652.12 | 2284.5 | 3164 | 4349.1 | 497.21 |
| jasper | 768 | 800 | 10850 | 4084.4 | 29171 | 35335 | 53749 | 463.86 |
| jasper | 1024 | 800 | 21063 | 11314 | 55549 | 61478 | 87355 | 447.81 |
| quartznet | 1 | 800 | 13.688 | 13.055 | 18.644 | 19.929 | 20.172 | 0.99966 |
| quartznet | 64 | 800 | 79.151 | 67.019 | 146.7 | 165.51 | 225.99 | 63.799 |
| quartznet | 128 | 800 | 137.63 | 128.42 | 235.86 | 278.86 | 340.59 | 127.09 |
| quartznet | 256 | 800 | 196.42 | 188.09 | 331.67 | 400.93 | 530.61 | 253.69 |
| quartznet | 384 | 800 | 256.63 | 229.5 | 414.07 | 556.41 | 712.33 | 378.83 |
| quartznet | 512 | 800 | 308.14 | 270.6 | 509.56 | 719.1 | 1115.9 | 501.67 |
| quartznet | 768 | 800 | 458.11 | 376.72 | 737.8 | 1322.9 | 1950.3 | 747.79 |
| quartznet | 1024 | 800 | 758.74 | 462.36 | 2045.4 | 2778.1 | 3697 | 978.44 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 1600 | 12.722 | 12.232 | 15.655 | 15.965 | 15.969 | 0.99967 |
| citrinet | 256 | 1600 | 350.88 | 368.51 | 479.78 | 606.16 | 703.21 | 252.38 |
| citrinet | 512 | 1600 | 627.13 | 675.13 | 879.89 | 1126.6 | 1326 | 498.74 |
| citrinet | 768 | 1600 | 936.74 | 966.71 | 1311.3 | 2066.7 | 2932.3 | 738.16 |
| citrinet | 1024 | 1600 | 1500.5 | 1277.3 | 2426.4 | 5046.6 | 7855.1 | 971.78 |
| citrinet | 1280 | 1600 | 5291.8 | 2806.3 | 16199 | 29723 | 35459 | 983.95 |
| citrinet | 1512 | 1600 | 10587 | 9166.5 | 22602 | 32147 | 41210 | 1041.7 |
| jasper | 1 | 3200 | 35.26 | 37.546 | 40.014 | 40.316 | 40.316 | 0.99939 |
| jasper | 256 | 3200 | 740.53 | 734.81 | 918.48 | 944.47 | 982.03 | 251.35 |
| jasper | 512 | 3200 | 1757.2 | 1681.1 | 2699.1 | 2842.6 | 2960.5 | 488.75 |
| jasper | 768 | 3200 | 3138.9 | 2733 | 5488.5 | 5763.1 | 5991.5 | 711.18 |
| jasper | 1024 | 3200 | 17168 | 15068 | 34410 | 41135 | 44935 | 688.57 |
| jasper | 1280 | 3200 | 24240 | 22543 | 50907 | 54452 | 57914 | 713.93 |
| jasper | 1512 | 3200 | 31701 | 30755 | 66014 | 69236 | 74463 | 725.97 |
| quartznet | 1 | 3200 | 29.538 | 31.703 | 33.113 | 33.295 | 33.295 | 0.99946 |
| quartznet | 256 | 3200 | 365.4 | 368.32 | 530.98 | 557.93 | 593.19 | 253.27 |
| quartznet | 512 | 3200 | 669.9 | 648 | 1005.1 | 1056.1 | 1113.1 | 501.11 |
| quartznet | 768 | 3200 | 1199.8 | 1127.8 | 1973.3 | 2081.5 | 2215.7 | 737.01 |
| quartznet | 1024 | 3200 | 1662.2 | 1658.7 | 2784.6 | 2940 | 3352.4 | 965 |
| quartznet | 1280 | 3200 | 2237.2 | 2140.3 | 4251.7 | 4754.7 | 5143.7 | 1182.6 |
| quartznet | 1512 | 3200 | 4715.5 | 5637.5 | 7989.1 | 8529.6 | 10924 | 1305.6 |

NVIDIA T4 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 160 | 22.635 | 22.249 | 22.582 | 22.685 | 42.217 | 0.99917 |
| citrinet | 8 | 160 | 49.495 | 49.966 | 50.891 | 53.131 | 113.45 | 7.98 |
| citrinet | 16 | 160 | 66.058 | 64.554 | 72.181 | 75.346 | 128.65 | 15.955 |
| citrinet | 32 | 160 | 101.06 | 98.529 | 102.66 | 111.5 | 245.29 | 31.846 |
| citrinet | 48 | 160 | 155.53 | 146.41 | 157.71 | 222.9 | 384.39 | 47.653 |
| citrinet | 64 | 160 | 1803.7 | 1540.8 | 3654.9 | 3920 | 4135.9 | 61.772 |
| citrinet | 96 | 160 | 14700 | 14635 | 26812 | 28045 | 28853 | 62.502 |
| citrinet | 128 | 160 | 27775 | 25526 | 51278 | 54219 | 58549 | 62.646 |
| jasper | 1 | 100 | 46.574 | 46.514 | 48.436 | 52.968 | 61.571 | 0.99882 |
| jasper | 8 | 100 | 47.432 | 47.816 | 53.724 | 61.504 | 86.435 | 7.9854 |
| jasper | 16 | 100 | 71.97 | 69.933 | 80.435 | 88.609 | 166.16 | 15.96 |
| jasper | 32 | 100 | 242.83 | 213.96 | 313.9 | 380.22 | 769.68 | 31.817 |
| jasper | 48 | 100 | 11109 | 10564 | 22207 | 23345 | 26297 | 39.077 |
| jasper | 64 | 100 | 17960 | 17380 | 32749 | 34696 | 36515 | 39.716 |
| jasper | 96 | 100 | 32650 | 31021 | 59770 | 63664 | 69997 | 40.135 |
| jasper | 128 | 100 | 47530 | 46526 | 89112 | 95926 | 1.0421e+05 | 40.237 |
| quartznet | 1 | 100 | 16.095 | 15.359 | 18.511 | 20.095 | 32.045 | 0.99927 |
| quartznet | 8 | 100 | 26.138 | 24.417 | 31.26 | 35.824 | 68.615 | 7.9863 |
| quartznet | 16 | 100 | 40.712 | 38.862 | 52.142 | 57.644 | 96.548 | 15.963 |
| quartznet | 32 | 100 | 48.311 | 47.788 | 61.818 | 70.364 | 113.81 | 31.91 |
| quartznet | 48 | 100 | 67.572 | 68.755 | 84.728 | 107.42 | 175.52 | 47.817 |
| quartznet | 64 | 100 | 104.47 | 100.67 | 146.32 | 179.42 | 282.26 | 63.672 |
| quartznet | 96 | 100 | 3101 | 3131.4 | 5489.3 | 5709.5 | 5847.9 | 86.397 |
| quartznet | 128 | 100 | 13448 | 13493 | 24066 | 25320 | 26140 | 85.856 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 800 | 25.995 | 25.455 | 26.11 | 34.216 | 34.388 | 0.99932 |
| citrinet | 64 | 800 | 178.69 | 175.21 | 202.32 | 269.79 | 320.77 | 63.617 |
| citrinet | 128 | 800 | 300.44 | 335.46 | 369.27 | 423.07 | 568.59 | 126.51 |
| citrinet | 256 | 800 | 710.44 | 667.58 | 844.33 | 1521.5 | 3202.1 | 249.32 |
| citrinet | 384 | 800 | 8847 | 8405.9 | 15005 | 16980 | 25160 | 289.74 |
| citrinet | 512 | 800 | 19569 | 20095 | 34106 | 35865 | 44480 | 292.51 |
| citrinet | 768 | 800 | 39067 | 38517 | 68252 | 73955 | 95873 | 292.64 |
| citrinet | 1024 | 800 | 58068 | 59692 | 1.0214e+05 | 1.1194e+05 | 1.3164e+05 | 294.07 |
| jasper | 1 | 800 | 74.805 | 74.618 | 83.792 | 88.073 | 88.556 | 0.9985 |
| jasper | 64 | 800 | 218.84 | 222.47 | 276.46 | 313.51 | 352.64 | 63.597 |
| jasper | 128 | 800 | 359.54 | 377.23 | 447.89 | 537.72 | 612.07 | 126.45 |
| jasper | 256 | 800 | 1030.6 | 722.97 | 2353.5 | 3576.4 | 4687.6 | 248.78 |
| jasper | 384 | 800 | 10746 | 6039.5 | 29346 | 36416 | 46165 | 254.18 |
| jasper | 512 | 800 | 20701 | 13827 | 53326 | 57297 | 76804 | 246.6 |
| jasper | 768 | 800 | 38972 | 27584 | 97278 | 1.03e+05 | 1.2926e+05 | 251 |
| jasper | 1024 | 800 | 59655 | 44370 | 1.4406e+05 | 1.5043e+05 | 1.8314e+05 | 251.25 |
| quartznet | 1 | 800 | 28.383 | 29.148 | 36.584 | 36.763 | 37.501 | 0.99928 |
| quartznet | 64 | 800 | 144.08 | 139.86 | 212.4 | 241.42 | 305.26 | 63.626 |
| quartznet | 128 | 800 | 190.34 | 175.36 | 259.42 | 338.67 | 411.78 | 126.94 |
| quartznet | 256 | 800 | 296.54 | 280.51 | 433.4 | 592.82 | 733.86 | 252.22 |
| quartznet | 384 | 800 | 422.68 | 383.37 | 641.46 | 1206.5 | 1443.6 | 375.73 |
| quartznet | 512 | 800 | 645.59 | 493.39 | 1437 | 2161.5 | 2949.2 | 496.04 |
| quartznet | 768 | 800 | 5576.4 | 776.85 | 17509 | 21659 | 29913 | 606.3 |
| quartznet | 1024 | 800 | 13133 | 8042.4 | 36110 | 41498 | 53066 | 617.3 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| citrinet | 1 | 1600 | 28.264 | 27.277 | 35.521 | 35.658 | 35.894 | 0.9993 |
| citrinet | 256 | 1600 | 709.2 | 787.66 | 919.32 | 1094.4 | 1232.4 | 249.91 |
| citrinet | 512 | 1600 | 3510.8 | 1507 | 12597 | 22842 | 26429 | 449.24 |
| citrinet | 768 | 1600 | 16036 | 14794 | 33148 | 43614 | 55698 | 466.63 |
| citrinet | 1024 | 1600 | 28188 | 26410 | 50763 | 70603 | 80471 | 472.93 |
| citrinet | 1280 | 1600 | 39776 | 36591 | 68475 | 98183 | 1.0802e+05 | 475.27 |
| citrinet | 1512 | 1600 | 51037 | 48000 | 87882 | 1.2297e+05 | 1.3285e+05 | 476.79 |
| jasper | 1 | 3200 | 96.734 | 99.951 | 103 | 103.1 | 103.1 | 0.9983 |
| jasper | 256 | 3200 | 1888.4 | 1823.2 | 2825.5 | 2928.4 | 3014.4 | 245.39 |
| jasper | 512 | 3200 | 15452 | 13663 | 31656 | 36348 | 40256 | 367.19 |
| jasper | 768 | 3200 | 32172 | 30853 | 63834 | 65991 | 72260 | 390.75 |
| jasper | 1024 | 3200 | 49360 | 56457 | 92969 | 99690 | 1.0594e+05 | 390.25 |
| jasper | 1280 | 3200 | 66512 | 77675 | 1.2441e+05 | 1.3225e+05 | 1.3976e+05 | 391.93 |
| jasper | 1512 | 3200 | 87966 | 91424 | 1.6906e+05 | 1.7142e+05 | 1.7813e+05 | 380.89 |
| quartznet | 1 | 3200 | 54.223 | 56.042 | 60.447 | 60.611 | 60.611 | 0.99896 |
| quartznet | 256 | 3200 | 770.9 | 767.62 | 1003.3 | 1045.2 | 1093.5 | 251.06 |
| quartznet | 512 | 3200 | 1685.9 | 1713.3 | 2603.6 | 2711.8 | 2847.1 | 486.01 |
| quartznet | 768 | 3200 | 2953.8 | 2811.9 | 5399.7 | 5776.5 | 6094.3 | 704.69 |
| quartznet | 1024 | 3200 | 14030 | 12917 | 28169 | 34882 | 38790 | 728.52 |
| quartznet | 1280 | 3200 | 21590 | 19951 | 47433 | 49750 | 55253 | 727.14 |
| quartznet | 1512 | 3200 | 27735 | 26985 | 60254 | 62852 | 66594 | 740.31 |

NLP

The tables below report the performance of the Riva named entity recognition (NER) service (BERT-base model, sequence length 128) and the Riva question answering (Q&A) service (BERT-large model, sequence length 384). For each service, latency at batch size 1 and maximum throughput were measured.
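As a rough cross-check of the latency and throughput columns in this section, Little's law relates the two: with N requests in flight and an average latency L, throughput is about N / L. The sketch below assumes a closed loop in which every stream issues its next request as soon as the previous one returns. That is an assumption about the benchmark, not something stated here, so expect the estimate to come out somewhat above the measured values.

```python
def estimated_throughput(num_streams, avg_latency_ms):
    """Estimate sequences per second via Little's law (throughput = N / L).

    Assumes each stream always has exactly one request in flight.
    """
    if avg_latency_ms <= 0:
        raise ValueError("avg_latency_ms must be positive")
    return num_streams / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # (streams, avg latency in ms, reported seq/s) from the A100 NER rows.
    for streams, latency_ms, reported in ((1, 2.64, 375.374),
                                          (256, 245, 993.923)):
        estimate = estimated_throughput(streams, latency_ms)
        print(f"{streams} streams: estimated {estimate:.1f} seq/s, "
              f"reported {reported} seq/s")
```

For the A100 NER rows the estimates (about 379 and 1045 seq/s) land within a few percent of the reported 375.374 and 993.923 seq/s, which suggests the two columns are mutually consistent.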

NVIDIA A100 GPU

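| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 2.64 | 2.62 | 2.74 | 2.79 | 3.03 | 375.374 |
| NER | 256 | 245 | 250 | 275 | 285 | 285 | 993.923 |
| Q&A | 1 | 5.96 | 5.77 | 6.59 | 6.97 | 7.38 | 166.997 |
| Q&A | 128 | 526 | 535 | 547 | 549 | 554 | 237.02 |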

NVIDIA A30 GPU

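| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 3.6 | 3.53 | 3.88 | 4.09 | 4.34 | 274.549 |
| NER | 256 | 280 | 287 | 293 | 294 | 338 | 868.025 |
| Q&A | 1 | 7.98 | 7.76 | 8 | 10.6 | 10.7 | 124.643 |
| Q&A | 128 | 671 | 684 | 688 | 688 | 715 | 185.882 |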

NVIDIA V100 GPU

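| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 3.48 | 3.48 | 3.61 | 3.63 | 3.75 | 284.68 |
| NER | 256 | 393 | 418 | 427 | 429 | 431 | 617.25 |
| Q&A | 1 | 9.38 | 9.37 | 9.55 | 9.59 | 9.65 | 106.053 |
| Q&A | 128 | 932 | 955 | 957 | 959 | 964 | 133.901 |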

NVIDIA T4 GPU

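| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 5.23 | 5 | 6.52 | 6.62 | 7.16 | 189.031 |
| NER | 256 | 541 | 560 | 567 | 568 | 621 | 450.158 |
| Q&A | 1 | 14.9 | 14.2 | 14.9 | 15.2 | 26.2 | 66.8848 |
| Q&A | 128 | 1.55e+03 | 1.58e+03 | 1.59e+03 | 1.59e+03 | 1.59e+03 | 80.5507 |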

TTS

Performance of the Riva text-to-speech (TTS) service was measured for different numbers of parallel streams. Each parallel stream performed 10 iterations over 10 input strings from the LJSpeech dataset. Latency to the first audio chunk, latency between successive audio chunks, and throughput were measured.
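The throughput column is labeled RTFX, which is commonly read as the inverse real-time factor: seconds of audio produced per second of wall-clock time. The formula is not spelled out on this page, so the sketch below is an assumption about how such a figure can be computed, and the numbers in the usage example are hypothetical rather than measurements.

```python
def rtfx(audio_seconds_per_stream, wall_clock_seconds):
    """Aggregate RTFX: total audio seconds produced / elapsed wall-clock seconds."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return sum(audio_seconds_per_stream) / wall_clock_seconds


if __name__ == "__main__":
    # Hypothetical run: 4 streams, each producing 60 s of audio, finishing in 2 s.
    print(rtfx([60.0] * 4, 2.0))  # 120.0
```

Under this reading, an RTFX above 1 means audio is synthesized faster than it plays back, and the aggregate value grows with the number of streams until the GPU saturates.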

NVIDIA A100 GPU

| Model | # of streams | First audio avg (s) | First audio p90 (s) | First audio p95 (s) | First audio p99 (s) | Between chunks avg (s) | Between chunks p90 (s) | Between chunks p95 (s) | Between chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| FastPitch + Hifi-GAN | 1 | 0.028 | 0.035 | 0.036 | 0.038 | 0.003 | 0.004 | 0.004 | 0.005 | 133.139 |
| FastPitch + Hifi-GAN | 4 | 0.044 | 0.056 | 0.062 | 0.069 | 0.006 | 0.009 | 0.010 | 0.012 | 340.038 |
| FastPitch + Hifi-GAN | 6 | 0.057 | 0.073 | 0.079 | 0.092 | 0.007 | 0.011 | 0.012 | 0.015 | 389.500 |
| FastPitch + Hifi-GAN | 8 | 0.066 | 0.086 | 0.091 | 0.107 | 0.009 | 0.013 | 0.015 | 0.018 | 443.373 |
| FastPitch + Hifi-GAN | 10 | 0.070 | 0.090 | 0.095 | 0.110 | 0.009 | 0.014 | 0.016 | 0.019 | 464.114 |
| Tacotron 2 + WaveGlow | 1 | 0.046 | 0.051 | 0.052 | 0.059 | 0.022 | 0.024 | 0.025 | 0.028 | 34.007 |
| Tacotron 2 + WaveGlow | 4 | 0.258 | 0.364 | 0.392 | 0.442 | 0.027 | 0.040 | 0.047 | 0.060 | 59.364 |
| Tacotron 2 + WaveGlow | 6 | 0.381 | 0.503 | 0.548 | 0.603 | 0.033 | 0.052 | 0.061 | 0.079 | 65.714 |
| Tacotron 2 + WaveGlow | 8 | 0.509 | 0.675 | 0.715 | 0.789 | 0.036 | 0.059 | 0.069 | 0.088 | 69.662 |
| Tacotron 2 + WaveGlow | 10 | 0.612 | 0.787 | 0.887 | 1.041 | 0.039 | 0.066 | 0.076 | 0.096 | 72.609 |

NVIDIA A30 GPU

| Model | # of streams | First audio avg (s) | First audio p90 (s) | First audio p95 (s) | First audio p99 (s) | Between chunks avg (s) | Between chunks p90 (s) | Between chunks p95 (s) | Between chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| FastPitch + Hifi-GAN | 1 | 0.032 | 0.036 | 0.037 | 0.039 | 0.004 | 0.004 | 0.005 | 0.005 | 118.981 |
| FastPitch + Hifi-GAN | 4 | 0.055 | 0.071 | 0.075 | 0.084 | 0.007 | 0.011 | 0.013 | 0.016 | 265.075 |
| FastPitch + Hifi-GAN | 6 | 0.071 | 0.091 | 0.096 | 0.108 | 0.009 | 0.015 | 0.017 | 0.020 | 308.260 |
| FastPitch + Hifi-GAN | 8 | 0.084 | 0.107 | 0.114 | 0.126 | 0.011 | 0.018 | 0.021 | 0.024 | 343.723 |
| FastPitch + Hifi-GAN | 10 | 0.094 | 0.121 | 0.129 | 0.156 | 0.012 | 0.020 | 0.022 | 0.027 | 349.511 |
| Tacotron 2 + WaveGlow | 1 | 0.065 | 0.070 | 0.071 | 0.073 | 0.031 | 0.033 | 0.034 | 0.035 | 24.627 |
| Tacotron 2 + WaveGlow | 4 | 0.328 | 0.456 | 0.480 | 0.554 | 0.039 | 0.057 | 0.064 | 0.086 | 44.569 |
| Tacotron 2 + WaveGlow | 6 | 0.506 | 0.683 | 0.737 | 0.814 | 0.048 | 0.076 | 0.089 | 0.112 | 47.723 |
| Tacotron 2 + WaveGlow | 8 | 0.688 | 0.914 | 0.968 | 1.059 | 0.055 | 0.089 | 0.103 | 0.139 | 49.749 |
| Tacotron 2 + WaveGlow | 10 | 0.840 | 1.178 | 1.315 | 1.480 | 0.061 | 0.106 | 0.124 | 0.155 | 49.764 |

NVIDIA V100 GPU

| Model | # of streams | First audio avg (s) | First audio p90 (s) | First audio p95 (s) | First audio p99 (s) | Between chunks avg (s) | Between chunks p90 (s) | Between chunks p95 (s) | Between chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| FastPitch + Hifi-GAN | 1 | 0.030 | 0.033 | 0.033 | 0.034 | 0.005 | 0.006 | 0.006 | 0.007 | 107.159 |
| FastPitch + Hifi-GAN | 4 | 0.065 | 0.085 | 0.092 | 0.108 | 0.010 | 0.017 | 0.020 | 0.025 | 212.439 |
| FastPitch + Hifi-GAN | 6 | 0.095 | 0.129 | 0.141 | 0.159 | 0.013 | 0.023 | 0.027 | 0.033 | 225.813 |
| FastPitch + Hifi-GAN | 8 | 0.125 | 0.168 | 0.177 | 0.206 | 0.016 | 0.029 | 0.034 | 0.042 | 235.513 |
| FastPitch + Hifi-GAN | 10 | 0.150 | 0.204 | 0.228 | 0.313 | 0.018 | 0.033 | 0.037 | 0.052 | 232.068 |
| Tacotron 2 + WaveGlow | 1 | 0.057 | 0.059 | 0.060 | 0.060 | 0.031 | 0.033 | 0.033 | 0.033 | 25.236 |
| Tacotron 2 + WaveGlow | 4 | 0.388 | 0.545 | 0.592 | 0.677 | 0.047 | 0.071 | 0.080 | 0.100 | 37.289 |
| Tacotron 2 + WaveGlow | 6 | 0.598 | 0.847 | 0.893 | 1.039 | 0.057 | 0.090 | 0.103 | 0.136 | 40.055 |
| Tacotron 2 + WaveGlow | 8 | 0.814 | 1.097 | 1.162 | 1.307 | 0.063 | 0.100 | 0.116 | 0.148 | 42.555 |
| Tacotron 2 + WaveGlow | 10 | 0.979 | 1.345 | 1.472 | 1.630 | 0.071 | 0.114 | 0.134 | 0.178 | 42.907 |

NVIDIA T4 GPU

| Model | # of streams | First audio avg (s) | First audio p90 (s) | First audio p95 (s) | First audio p99 (s) | Between chunks avg (s) | Between chunks p90 (s) | Between chunks p95 (s) | Between chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| FastPitch + Hifi-GAN | 1 | 0.052 | 0.058 | 0.059 | 0.075 | 0.006 | 0.007 | 0.008 | 0.008 | 73.429 |
| FastPitch + Hifi-GAN | 4 | 0.105 | 0.131 | 0.139 | 0.156 | 0.016 | 0.025 | 0.028 | 0.034 | 132.176 |
| FastPitch + Hifi-GAN | 6 | 0.148 | 0.188 | 0.199 | 0.233 | 0.022 | 0.036 | 0.041 | 0.048 | 140.982 |
| FastPitch + Hifi-GAN | 8 | 0.188 | 0.240 | 0.258 | 0.313 | 0.028 | 0.047 | 0.052 | 0.062 | 148.327 |
| FastPitch + Hifi-GAN | 10 | 0.211 | 0.275 | 0.295 | 0.327 | 0.030 | 0.050 | 0.058 | 0.071 | 150.003 |
| Tacotron 2 + WaveGlow | 1 | 0.107 | 0.115 | 0.117 | 0.118 | 0.051 | 0.055 | 0.055 | 0.056 | 14.859 |
| Tacotron 2 + WaveGlow | 4 | 0.717 | 1.018 | 1.100 | 1.225 | 0.107 | 0.167 | 0.191 | 0.246 | 18.454 |
| Tacotron 2 + WaveGlow | 6 | 1.158 | 1.607 | 1.723 | 1.929 | 0.137 | 0.224 | 0.255 | 0.311 | 18.964 |
| Tacotron 2 + WaveGlow | 8 | 1.643 | 2.252 | 2.390 | 2.606 | 0.163 | 0.269 | 0.307 | 0.391 | 19.275 |
| Tacotron 2 + WaveGlow | 10 | 2.065 | 2.848 | 3.176 | 3.615 | 0.174 | 0.296 | 0.338 | 0.430 | 18.726 |

Performance Considerations

When the server is under high load, requests might time out, because the server does not start inference for a new request until a previous request has been completely generated and its inference slot has been freed. This is done to maximize throughput for the TTS service and allow for real-time interaction. NVIDIA does not recommend making more than 8-10 simultaneous requests with the models provided in Riva 1.0.0 beta.
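One way to stay within this guidance is to cap concurrency on the client side. The sketch below uses an asyncio semaphore to allow at most 8 requests in flight. The synthesize argument stands in for whatever coroutine actually calls the TTS service; it and the other names here are hypothetical and not part of the Riva client API.

```python
import asyncio

MAX_CONCURRENT_TTS = 8  # lower end of the 8-10 guidance above


async def synthesize_all(texts, synthesize, limit=MAX_CONCURRENT_TTS):
    """Run synthesize(text) for every text with at most `limit` calls in flight.

    `synthesize` is a caller-supplied coroutine function (hypothetical here).
    Results are returned in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text):
        # Wait for a free slot before issuing the request.
        async with semaphore:
            return await synthesize(text)

    return await asyncio.gather(*(bounded(text) for text in texts))


async def _fake_synthesize(text):
    """Stand-in for a real TTS call: sleeps briefly and returns a label."""
    await asyncio.sleep(0.01)
    return f"audio for {text!r}"


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(20)]
    results = asyncio.run(synthesize_all(sentences, _fake_synthesize))
    print(len(results))
```

Requests beyond the limit wait on the client instead of queuing on the server, which keeps them from timing out while earlier requests are still being generated.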