Performance

The measured performance of the Riva ASR, NLP, and TTS services on NVIDIA T4, NVIDIA V100 SXM2 16 GB, and NVIDIA A100 SXM4 40 GB GPUs is reported below; the ASR section also includes results for NVIDIA A30. CPU specifications for each system can be found here:

ASR

The latency numbers below were measured in streaming recognition mode with the BERT-based punctuation model enabled, a 4-gram language model, a decoder beam width of 128, and timestamps enabled. The Jasper, QuartzNet, and CitriNet-1024 acoustic models were tested. The client and the server used audio chunks of the same duration (100 ms, 160 ms, 800 ms, or 3200 ms, depending on the server configuration). The Riva streaming client riva_streaming_asr_client, provided in the Riva client image, was run with the --simulate_realtime flag to simulate transcription from a microphone; each stream performed 5 iterations over a sample audio file from the LibriSpeech dataset (1272-135031-0000.wav). The command used was:

riva_streaming_asr_client  \
     --chunk_duration_ms=<chunk_duration> --simulate_realtime=true \
     --automatic_punctuation=true --num_parallel_requests=<num_streams> \
     --word_time_offsets=true --print_transcripts=false \
     --interim_results=false --num_iterations=<5*num_streams> \
     --audio_file=1272-135031-0000.wav --output_filename=/tmp/output.json
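When sweeping over several stream counts, the invocation above can be generated rather than typed by hand. The following is a minimal Python sketch that only assembles the argument list, using the flags exactly as they appear in the command. The helper name build_asr_benchmark_command and its default arguments are illustrative assumptions, not part of Riva.

```python
import shlex


def build_asr_benchmark_command(num_streams, chunk_duration_ms,
                                audio_file="1272-135031-0000.wav",
                                output_filename="/tmp/output.json"):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    return [
        "riva_streaming_asr_client",
        f"--chunk_duration_ms={chunk_duration_ms}",
        "--simulate_realtime=true",
        "--automatic_punctuation=true",
        f"--num_parallel_requests={num_streams}",
        "--word_time_offsets=true",
        "--print_transcripts=false",
        "--interim_results=false",
        # Each stream performs 5 iterations, so the total is 5 * num_streams.
        f"--num_iterations={5 * num_streams}",
        f"--audio_file={audio_file}",
        f"--output_filename={output_filename}",
    ]


if __name__ == "__main__":
    # Print one shell-quoted command line per stream count.
    for streams in (1, 8, 16):
        print(shlex.join(build_asr_benchmark_command(streams, 100)))
```

The list form can be passed directly to subprocess.run on a machine where the client binary is available; the sketch itself does not execute anything.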

After the benchmark run completes, riva_streaming_asr_client reports latency measured in three different ways:

  • intermediate latency: latency to return an intermediate transcript with is_final == false

  • final latency: latency of messages returned with is_final == true

  • latency: the overall latency of all returned message types

The overall latency numbers are reported below.
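The tables summarize each run with an average and the p50, p90, p95, and p99 percentiles. The sketch below shows one way to compute such a summary from raw per-response latencies. How the client computes its percentiles is not stated here, so the nearest-rank method used in this example is an assumption, as are the function names.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile (assumed method) of a sorted, non-empty list.

    p is an integer percentage such as 50 or 99.
    """
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, clamped to at least rank 1.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]


def summarize_latencies(latencies_ms):
    """Return avg, p50, p90, p95, and p99 for a list of latencies in ms."""
    ordered = sorted(latencies_ms)
    summary = {"avg": sum(ordered) / len(ordered)}
    for p in (50, 90, 95, 99):
        summary[f"p{p}"] = percentile(ordered, p)
    return summary


if __name__ == "__main__":
    # Made-up sample values, only to show the shape of the result.
    print(summarize_latencies([12.0, 15.0, 13.0, 30.0, 14.0]))
```

With only a handful of samples, the upper percentiles all land on the largest value under this method, so identical p90, p95, and p99 entries are not necessarily a sign of an error.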

NVIDIA A100 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 100 | 13.775 | 13.088 | 17.046 | 20.146 | 20.146 | 0.99966 |
| quartznet | 8 | 100 | 22.99 | 22.786 | 28.174 | 30.111 | 31.194 | 7.9956 |
| quartznet | 16 | 100 | 35.618 | 35.717 | 43.401 | 43.649 | 45.561 | 15.985 |
| quartznet | 32 | 100 | 50.736 | 52.46 | 57.71 | 59.6 | 60.75 | 31.961 |
| quartznet | 48 | 100 | 64.695 | 65.642 | 76.823 | 78.963 | 84.639 | 47.92 |
| quartznet | 64 | 100 | 75.197 | 77.22 | 91.216 | 92.174 | 96.638 | 63.876 |
| quartznet | 96 | 100 | 107.64 | 106.95 | 149.25 | 156.29 | 163.78 | 95.767 |
| quartznet | 128 | 100 | 124.94 | 131.37 | 165.57 | 168.86 | 172.62 | 127.63 |
| jasper | 1 | 100 | 19.223 | 18.529 | 26.688 | 26.688 | 26.688 | 0.99963 |
| jasper | 8 | 100 | 30.539 | 30.664 | 34.128 | 40.741 | 44.225 | 7.995 |
| jasper | 16 | 100 | 48.232 | 48.636 | 53.826 | 56.712 | 57.008 | 15.984 |
| jasper | 32 | 100 | 65.746 | 66.766 | 73.168 | 73.828 | 75.138 | 31.958 |
| jasper | 48 | 100 | 80.328 | 82.805 | 93.649 | 94.126 | 97.669 | 47.92 |
| jasper | 64 | 100 | 106.6 | 93.019 | 202.1 | 277.21 | 285.15 | 63.881 |
| jasper | 96 | 100 | 148.85 | 167.47 | 182.81 | 184.51 | 187.58 | 95.739 |
| jasper | 128 | 100 | 200.59 | 197.58 | 256.19 | 278.65 | 291.72 | 127.52 |
| citrinet | 1 | 160 | 17.677 | 16.331 | 23.152 | 23.152 | 23.152 | 0.99967 |
| citrinet | 8 | 160 | 34.511 | 34.28 | 38.447 | 41.862 | 41.922 | 7.9944 |
| citrinet | 16 | 160 | 60.983 | 62.667 | 69.104 | 71.966 | 72.043 | 15.98 |
| citrinet | 32 | 160 | 106.95 | 107.01 | 111.47 | 111.86 | 115.34 | 31.932 |
| citrinet | 48 | 160 | 137.46 | 139.69 | 145.42 | 146.38 | 148.59 | 47.867 |
| citrinet | 64 | 160 | 159.28 | 157.12 | 180.18 | 191.43 | 192.25 | 63.79 |
| citrinet | 96 | 160 | 235.08 | 242.71 | 261.76 | 266.1 | 271.56 | 95.599 |
| citrinet | 128 | 160 | 361.5 | 340.03 | 480.78 | 490.2 | 511 | 127.21 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 800 | 15.743 | 14.814 | 25.57 | 25.57 | 25.57 | 0.99968 |
| quartznet | 64 | 800 | 81.046 | 86.341 | 97.161 | 98.439 | 101.32 | 63.886 |
| quartznet | 128 | 800 | 126.47 | 130.05 | 155.72 | 160.57 | 166.99 | 127.65 |
| quartznet | 256 | 800 | 197.65 | 206.16 | 265.12 | 282.63 | 314.48 | 254.71 |
| quartznet | 384 | 800 | 269.71 | 282.78 | 360.83 | 374.16 | 390.11 | 381.3 |
| quartznet | 512 | 800 | 282.5 | 302.8 | 386.91 | 418.78 | 448.37 | 508.07 |
| quartznet | 768 | 800 | 402.32 | 413.01 | 576.61 | 600.85 | 649.58 | 760.31 |
| quartznet | 1024 | 800 | 514.61 | 510.91 | 718.04 | 773.47 | 872.36 | 1009.9 |
| jasper | 1 | 800 | 21.802 | 21.758 | 31.508 | 31.508 | 31.508 | 0.99963 |
| jasper | 64 | 800 | 108.88 | 109.01 | 137.25 | 138.69 | 142.34 | 63.864 |
| jasper | 128 | 800 | 159.01 | 158.04 | 198.9 | 219.88 | 233.1 | 127.56 |
| jasper | 256 | 800 | 224.17 | 231.63 | 287.34 | 296.27 | 304.76 | 254.64 |
| jasper | 384 | 800 | 317.98 | 322.49 | 418.42 | 437.52 | 455.75 | 381.08 |
| jasper | 512 | 800 | 397.81 | 402.84 | 547.15 | 581.55 | 637.14 | 507.09 |
| jasper | 768 | 800 | 567.77 | 534.4 | 757.2 | 900.83 | 1100.6 | 757.87 |
| jasper | 1024 | 800 | 1000.6 | 986.46 | 1388.4 | 1423.5 | 1512.9 | 1004.9 |
| citrinet | 1 | 800 | 16.451 | 13.965 | 26.439 | 26.439 | 26.439 | 0.99971 |
| citrinet | 64 | 800 | 111.57 | 112.32 | 132.88 | 139.58 | 142.24 | 63.819 |
| citrinet | 128 | 800 | 170.44 | 180.26 | 202 | 209.28 | 212.95 | 127.49 |
| citrinet | 256 | 800 | 250.8 | 251.68 | 311.61 | 324.22 | 332.85 | 254.29 |
| citrinet | 384 | 800 | 347.34 | 371.71 | 429.55 | 437.91 | 448.47 | 380.63 |
| citrinet | 512 | 800 | 426.17 | 436.69 | 531.7 | 554.93 | 578.55 | 506.7 |
| citrinet | 768 | 800 | 912.08 | 802.33 | 1337.2 | 1355.9 | 1392.5 | 755.45 |
| citrinet | 1024 | 800 | 1561.7 | 1519 | 2266.9 | 2334.7 | 2655.5 | 998.55 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 1600 | 26.22 | 26.387 | 30.733 | 37.175 | 37.175 | 0.99957 |
| quartznet | 256 | 1600 | 310.96 | 313.52 | 446.33 | 466.73 | 534.91 | 254.35 |
| quartznet | 512 | 1600 | 461.33 | 479.58 | 632.47 | 684.36 | 766.31 | 506.68 |
| quartznet | 768 | 1600 | 595.36 | 617.58 | 838.34 | 869.68 | 912.17 | 757.45 |
| quartznet | 1024 | 1600 | 752.81 | 789.86 | 1053.1 | 1119.5 | 1212.9 | 1005.2 |
| quartznet | 1280 | 1600 | 951.09 | 931.12 | 1317.6 | 1385 | 1954.4 | 1251.6 |
| quartznet | 1512 | 1600 | 1189.7 | 1156.2 | 1801.4 | 1984.7 | 2323.6 | 1470.5 |
| jasper | 1 | 1600 | 28.477 | 29.909 | 31.968 | 41.738 | 41.738 | 0.99956 |
| jasper | 256 | 1600 | 390.12 | 385.22 | 507.51 | 556.41 | 592.78 | 253.94 |
| jasper | 512 | 1600 | 617.12 | 632.95 | 804.37 | 822.53 | 887 | 505.1 |
| jasper | 768 | 1600 | 840.32 | 841.29 | 1079.1 | 1112 | 1143.8 | 754.65 |
| jasper | 1024 | 1600 | 1107.4 | 1073.3 | 1425 | 1723.7 | 2098.4 | 998.81 |
| jasper | 1280 | 1600 | 1396 | 1344.9 | 2223.4 | 2288.1 | 2453.6 | 1241.4 |
| jasper | 1512 | 1600 | 1814.3 | 1791.7 | 2695.5 | 2767.7 | 2861.2 | 1452.8 |
| citrinet | 1 | 1600 | 16.525 | 14.823 | 23.402 | 23.402 | 23.402 | 0.9997 |
| citrinet | 256 | 1600 | 284.78 | 311.47 | 345.06 | 358.33 | 376.56 | 254.26 |
| citrinet | 512 | 1600 | 477.13 | 484.25 | 597.09 | 609.65 | 630.33 | 505.52 |
| citrinet | 768 | 1600 | 649.29 | 667.79 | 825.69 | 843.51 | 869.35 | 755.88 |
| citrinet | 1024 | 1600 | 827.25 | 871.89 | 1006.1 | 1062.2 | 1108.9 | 1002.9 |
| citrinet | 1280 | 1600 | 993.9 | 1067.4 | 1231.1 | 1263.1 | 1288 | 1247.8 |
| citrinet | 1512 | 1600 | 1217.5 | 1214.3 | 1553.1 | 1924.1 | 2475.2 | 1468.2 |

NVIDIA V100 GPU

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 100 | 16.219 | 15.743 | 18.59 | 18.991 | 18.991 | 0.99962 |
| quartznet | 8 | 100 | 28.638 | 27.698 | 40.32 | 41.184 | 43.96 | 7.9937 |
| quartznet | 16 | 100 | 37.713 | 38.838 | 48.903 | 49.144 | 52.002 | 15.985 |
| quartznet | 32 | 100 | 58.434 | 58.619 | 72.506 | 73.37 | 76.746 | 31.95 |
| quartznet | 48 | 100 | 72.319 | 74.779 | 91.255 | 92.098 | 95.413 | 47.911 |
| quartznet | 64 | 100 | 89.535 | 91.863 | 110.78 | 121.52 | 137.01 | 63.861 |
| quartznet | 96 | 100 | 148 | 163.32 | 192.74 | 197.29 | 206.67 | 95.697 |
| quartznet | 128 | 100 | 197.11 | 202.9 | 276.54 | 294.76 | 324.81 | 127.53 |
| jasper | 1 | 100 | 29.375 | 28.314 | 37.363 | 37.363 | 37.363 | 0.99945 |
| jasper | 8 | 100 | 41.848 | 42.036 | 55.031 | 61.506 | 61.754 | 7.993 |
| jasper | 16 | 100 | 61.287 | 62.791 | 68.992 | 70.869 | 73.396 | 15.982 |
| jasper | 32 | 100 | 106.26 | 98.485 | 156.7 | 157.2 | 159.56 | 31.931 |
| jasper | 48 | 100 | 157.06 | 179.89 | 192.93 | 198.2 | 264.84 | 47.875 |
| jasper | 64 | 100 | 319.66 | 304.95 | 396.29 | 400.8 | 478.1 | 63.649 |
| jasper | 96 | 100 | 19208 | 20795 | 28822 | 28881 | 29014 | 79.351 |
| jasper | 128 | 100 | 46510 | 49183 | 67122 | 67396 | 67557 | 66.463 |
| citrinet | 1 | 160 | 19.539 | 18.898 | 22.244 | 22.244 | 22.244 | 0.99961 |
| citrinet | 8 | 160 | 44.174 | 44.99 | 48.045 | 51.79 | 51.937 | 7.9926 |
| citrinet | 16 | 160 | 82.064 | 75.79 | 143.32 | 148.02 | 148.08 | 15.976 |
| citrinet | 32 | 160 | 130.32 | 128.85 | 140.86 | 142.08 | 145.73 | 31.919 |
| citrinet | 48 | 160 | 175.12 | 175.14 | 188.77 | 194.13 | 234.44 | 47.826 |
| citrinet | 64 | 160 | 250.29 | 264.11 | 280.87 | 285.19 | 285.73 | 63.716 |
| citrinet | 96 | 160 | 408.13 | 417.58 | 582.75 | 614.63 | 635.04 | 95.402 |
| citrinet | 128 | 160 | 723.41 | 783.02 | 1008.4 | 1015.8 | 1044.6 | 126.97 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 800 | 18.69 | 19.171 | 21.409 | 21.409 | 21.409 | 0.99962 |
| quartznet | 64 | 800 | 107.39 | 112.99 | 133.36 | 135.93 | 137.04 | 63.828 |
| quartznet | 128 | 800 | 155.8 | 162.67 | 202.49 | 212.42 | 225.76 | 127.55 |
| quartznet | 256 | 800 | 261.27 | 271.26 | 339.65 | 359.18 | 373.54 | 254.35 |
| quartznet | 384 | 800 | 345.09 | 355.63 | 448.95 | 465.11 | 506.95 | 380.59 |
| quartznet | 512 | 800 | 427.9 | 431.01 | 564.88 | 592.34 | 641.33 | 506.45 |
| quartznet | 768 | 800 | 682.67 | 679.89 | 932.53 | 978.61 | 1161.5 | 752.91 |
| quartznet | 1024 | 800 | 1333.1 | 1380.9 | 1841.3 | 1933.2 | 2108.6 | 998.68 |
| jasper | 1 | 800 | 31.131 | 33.332 | 37.633 | 37.633 | 37.633 | 0.99942 |
| jasper | 64 | 800 | 160.4 | 158.69 | 193.9 | 205.84 | 215.41 | 63.803 |
| jasper | 128 | 800 | 251.48 | 254.63 | 310.37 | 316.14 | 323.98 | 127.29 |
| jasper | 256 | 800 | 429.14 | 426.38 | 519.52 | 533.69 | 559.53 | 253.75 |
| jasper | 384 | 800 | 619.68 | 625.24 | 726.98 | 753.06 | 991.31 | 378.75 |
| jasper | 512 | 800 | 1202.3 | 1405.2 | 1534.3 | 1562.1 | 1904.4 | 501.98 |
| jasper | 768 | 800 | 13400 | 13849 | 20750 | 21581 | 21774 | 608.81 |
| jasper | 1024 | 800 | 48565 | 54184 | 70625 | 72035 | 72329 | 465.68 |
| citrinet | 1 | 800 | 17.44 | 17.189 | 19.707 | 19.707 | 19.707 | 0.99965 |
| citrinet | 64 | 800 | 125.23 | 122.22 | 150.43 | 153.12 | 156.7 | 63.813 |
| citrinet | 128 | 800 | 235.35 | 244.35 | 296.1 | 339.38 | 349.17 | 127.33 |
| citrinet | 256 | 800 | 391.62 | 399.09 | 472.78 | 485.66 | 504.49 | 253.44 |
| citrinet | 384 | 800 | 557.85 | 556.18 | 657.04 | 678.23 | 712.09 | 378.86 |
| citrinet | 512 | 800 | 1224 | 1381.5 | 1533.5 | 1569.2 | 1913.1 | 501.52 |
| citrinet | 768 | 800 | 5063.1 | 5236.3 | 6606.9 | 7179.4 | 7709.1 | 688.89 |
| citrinet | 1024 | 800 | 19368 | 19916 | 30321 | 30436 | 30529 | 654.76 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 1600 | 32.051 | 33.137 | 34.889 | 38 | 38 | 0.99944 |
| quartznet | 256 | 1600 | 418.92 | 436.08 | 559.46 | 577.89 | 601.12 | 253.64 |
| quartznet | 512 | 1600 | 719.14 | 736.58 | 999.11 | 1039.9 | 1130.8 | 503.17 |
| quartznet | 768 | 1600 | 988.4 | 950.99 | 1420.8 | 1513 | 1617 | 749.3 |
| quartznet | 1024 | 1600 | 1413.4 | 1466 | 1965.9 | 2235.4 | 2493.9 | 989.17 |
| quartznet | 1280 | 1600 | 1871.1 | 2000.3 | 2644.3 | 2982.2 | 3118.8 | 1220.6 |
| quartznet | 1512 | 1600 | 2206.1 | 2335.8 | 3166 | 3447.9 | 3695.4 | 1422.5 |
| jasper | 1 | 1600 | 39.439 | 41.926 | 43.788 | 43.935 | 43.935 | 0.99935 |
| jasper | 256 | 1600 | 734.73 | 738.24 | 884.44 | 927.17 | 987.07 | 252.21 |
| jasper | 512 | 1600 | 1475.3 | 1378.1 | 2372.8 | 2431.2 | 2501.6 | 498 |
| jasper | 768 | 1600 | 2427.6 | 2452.2 | 3414.7 | 3473.7 | 3575.9 | 729.88 |
| jasper | 1024 | 1600 | 3954.7 | 4184.9 | 5899.5 | 6005.3 | 6765.2 | 953.3 |
| jasper | 1280 | 1600 | 17678 | 17212 | 30763 | 35474 | 36963 | 996.97 |
| jasper | 1512 | 1600 | 26000 | 26015 | 43323 | 45986 | 48253 | 988.53 |
| citrinet | 1 | 1600 | 18.472 | 17.615 | 20.458 | 20.458 | 20.458 | 0.99963 |
| citrinet | 256 | 1600 | 451.06 | 478.22 | 527.48 | 532.38 | 539.79 | 253.26 |
| citrinet | 512 | 1600 | 807.31 | 866.29 | 984.91 | 1006.8 | 1030.9 | 501.78 |
| citrinet | 768 | 1600 | 1139.2 | 1217.3 | 1390.5 | 1430.3 | 1477.9 | 746.5 |
| citrinet | 1024 | 1600 | 2112.2 | 1914.5 | 3069 | 3186.8 | 3491.2 | 983.31 |
| citrinet | 1280 | 1600 | 6875.1 | 7291.8 | 9279.1 | 9948.4 | 10842 | 1115.3 |
| citrinet | 1512 | 1600 | 13901 | 13860 | 19986 | 20129 | 20849 | 1099.6 |

NVIDIA A30

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 100 | 23.012 | 21.484 | 26.527 | 35.573 | 35.573 | 0.99947 |
| quartznet | 8 | 100 | 40.112 | 39.849 | 52.976 | 54.077 | 55.768 | 7.9913 |
| quartznet | 16 | 100 | 53.306 | 55.101 | 63.018 | 63.96 | 66.982 | 15.979 |
| quartznet | 32 | 100 | 74.77 | 77.611 | 86.416 | 90.268 | 91.86 | 31.944 |
| quartznet | 48 | 100 | 92.743 | 95.436 | 111.61 | 116.51 | 124.5 | 47.895 |
| quartznet | 64 | 100 | 113.18 | 115.12 | 145.84 | 154.07 | 174.84 | 63.838 |
| quartznet | 96 | 100 | 141.52 | 148.03 | 176.71 | 180.35 | 195.07 | 95.706 |
| quartznet | 128 | 100 | 164.22 | 178.01 | 209.67 | 221.58 | 241.87 | 127.5 |
| jasper | 1 | 100 | 26.798 | 25.557 | 36.703 | 36.703 | 36.703 | 0.99947 |
| jasper | 8 | 100 | 48.764 | 50.231 | 56.828 | 64.799 | 65.521 | 7.9916 |
| jasper | 16 | 100 | 75.89 | 77.099 | 85.157 | 85.514 | 86.154 | 15.975 |
| jasper | 32 | 100 | 89.519 | 87.85 | 99.081 | 99.469 | 118.22 | 31.942 |
| jasper | 48 | 100 | 116.23 | 105.88 | 161.76 | 164.72 | 176.58 | 47.894 |
| jasper | 64 | 100 | 160.6 | 180.7 | 196.33 | 201.24 | 217.37 | 63.829 |
| jasper | 96 | 100 | 3475.8 | 3817.2 | 5894.9 | 5929.5 | 5994.8 | 93.12 |
| jasper | 128 | 100 | 38441 | 44791 | 56350 | 56433 | 56467 | 73.177 |
| citrinet | 1 | 160 | 27.388 | 25.202 | 36.378 | 36.378 | 36.378 | 0.9995 |
| citrinet | 8 | 160 | 60.85 | 63.105 | 68.724 | 70.739 | 70.963 | 7.9893 |
| citrinet | 16 | 160 | 99.186 | 102.57 | 106.95 | 108.31 | 111.2 | 15.967 |
| citrinet | 32 | 160 | 150 | 152.07 | 159.92 | 160.37 | 164.37 | 31.898 |
| citrinet | 48 | 160 | 177.62 | 176.17 | 216.78 | 218.33 | 230.31 | 47.827 |
| citrinet | 64 | 160 | 231.88 | 246.42 | 260.82 | 264.49 | 270.19 | 63.722 |
| citrinet | 96 | 160 | 385.64 | 412.34 | 477.62 | 486.61 | 500.4 | 95.411 |
| citrinet | 128 | 160 | 615.73 | 673.92 | 838.69 | 862.88 | 900.76 | 127 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 800 | 26.547 | 25.805 | 37.417 | 37.417 | 37.417 | 0.9995 |
| quartznet | 64 | 800 | 112.45 | 115.53 | 132.46 | 136.06 | 140.55 | 63.832 |
| quartznet | 128 | 800 | 175.8 | 182.18 | 210.73 | 223.33 | 245.22 | 127.45 |
| quartznet | 256 | 800 | 230.93 | 238.92 | 295.34 | 306.26 | 326.54 | 254.55 |
| quartznet | 384 | 800 | 292.9 | 301.39 | 379.25 | 390.45 | 408.39 | 381.06 |
| quartznet | 512 | 800 | 347.17 | 355.37 | 449.16 | 460.51 | 478.91 | 507.15 |
| quartznet | 768 | 800 | 490.32 | 498.81 | 634.44 | 652.14 | 812.12 | 757.79 |
| quartznet | 1024 | 800 | 831.5 | 781.33 | 1236.7 | 1274.9 | 1332 | 1005.3 |
| jasper | 1 | 800 | 32.656 | 33.29 | 53.944 | 53.944 | 53.944 | 0.99947 |
| jasper | 64 | 800 | 147.18 | 147.45 | 172.74 | 187.07 | 195.84 | 63.789 |
| jasper | 128 | 800 | 211.23 | 209.42 | 258.8 | 288.3 | 305.71 | 127.43 |
| jasper | 256 | 800 | 330 | 331.1 | 404.37 | 417.7 | 437.26 | 253.96 |
| jasper | 384 | 800 | 448.81 | 450.9 | 542.06 | 554.14 | 580.03 | 379.93 |
| jasper | 512 | 800 | 585.83 | 589.64 | 706.41 | 728.01 | 860.37 | 504.74 |
| jasper | 768 | 800 | 1904.1 | 1875.4 | 2894.1 | 3055.6 | 3211.4 | 750.04 |
| jasper | 1024 | 800 | 25507 | 35165 | 42603 | 44092 | 44443 | 601.03 |
| citrinet | 1 | 800 | 23.94 | 21.378 | 34.312 | 34.312 | 34.312 | 0.99955 |
| citrinet | 64 | 800 | 142.53 | 147.9 | 169.59 | 174.97 | 178.62 | 63.804 |
| citrinet | 128 | 800 | 199.64 | 212.55 | 240.21 | 243.44 | 247.29 | 127.37 |
| citrinet | 256 | 800 | 343.58 | 360.52 | 405.54 | 410.63 | 414.54 | 253.81 |
| citrinet | 384 | 800 | 481.4 | 495.56 | 559.23 | 571.46 | 587.83 | 379.51 |
| citrinet | 512 | 800 | 847.78 | 763.69 | 1280.3 | 1358.6 | 1378.5 | 503.55 |
| citrinet | 768 | 800 | 2214.9 | 2090.3 | 3187.2 | 3395.2 | 3617.1 | 742.31 |
| citrinet | 1024 | 800 | 14236 | 14517 | 22074 | 22161 | 22216 | 725.37 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 1600 | 44.49 | 45.29 | 46.942 | 60.447 | 60.447 | 0.99925 |
| quartznet | 256 | 1600 | 436.26 | 434.66 | 605.55 | 640.83 | 700.65 | 253.88 |
| quartznet | 512 | 1600 | 660.24 | 635.61 | 966.28 | 1003.4 | 1054.1 | 505.12 |
| quartznet | 768 | 1600 | 844.37 | 818.38 | 1239.9 | 1305.6 | 1418.8 | 753.56 |
| quartznet | 1024 | 1600 | 1128.2 | 1087.3 | 1642.5 | 1784 | 2041.5 | 998.23 |
| quartznet | 1280 | 1600 | 1428.9 | 1404.9 | 2032.5 | 2237.9 | 2363.1 | 1238.6 |
| quartznet | 1512 | 1600 | 1785.9 | 1888.8 | 2555.6 | 2677.8 | 2810.1 | 1448.5 |
| jasper | 1 | 1600 | 42.569 | 44.758 | 47.754 | 57.853 | 57.853 | 0.99932 |
| jasper | 256 | 1600 | 588.18 | 581.99 | 708.42 | 728.89 | 759.12 | 253.13 |
| jasper | 512 | 1600 | 1023.7 | 1033.8 | 1204.5 | 1236.5 | 1383.1 | 501.23 |
| jasper | 768 | 1600 | 1759.2 | 1663.7 | 2649.2 | 2708.3 | 2752.6 | 741.65 |
| jasper | 1024 | 1600 | 2446.9 | 2551.6 | 3506.5 | 3592.4 | 3712.4 | 971.2 |
| jasper | 1280 | 1600 | 3700.4 | 3693.1 | 5732.5 | 5804.8 | 6500.5 | 1192.7 |
| jasper | 1512 | 1600 | 12969 | 10839 | 24484 | 27178 | 30380 | 1288.9 |
| citrinet | 1 | 1600 | 24.544 | 22.226 | 33.908 | 33.908 | 33.908 | 0.99954 |
| citrinet | 256 | 1600 | 377.66 | 387.39 | 455.11 | 464.64 | 477.74 | 253.63 |
| citrinet | 512 | 1600 | 714.22 | 732.03 | 823.01 | 837.19 | 850.16 | 503.68 |
| citrinet | 768 | 1600 | 978.02 | 1051.6 | 1162.8 | 1178.3 | 1196.3 | 749.71 |
| citrinet | 1024 | 1600 | 1408.8 | 1395.2 | 1800.7 | 2050.7 | 2365.8 | 992.91 |
| citrinet | 1280 | 1600 | 3271.2 | 3044.7 | 5138.4 | 5403.9 | 5662.1 | 1223.4 |
| citrinet | 1512 | 1600 | 9137.7 | 9228.6 | 13005 | 13105 | 13319 | 1212.5 |

NVIDIA T4

Streaming, low-latency

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 100 | 34.801 | 33.282 | 41.633 | 43.414 | 43.414 | 0.99922 |
| quartznet | 8 | 100 | 59.16 | 58.793 | 72.267 | 79.687 | 81.207 | 7.9875 |
| quartznet | 16 | 100 | 99.245 | 94.272 | 138.86 | 143.52 | 177.08 | 15.957 |
| quartznet | 32 | 100 | 114.1 | 113.33 | 162.18 | 169.75 | 178.31 | 31.914 |
| quartznet | 48 | 100 | 156.18 | 157.62 | 208.25 | 220.52 | 238.8 | 47.844 |
| quartznet | 64 | 100 | 212.69 | 222.17 | 275.92 | 291.22 | 306.31 | 63.732 |
| quartznet | 96 | 100 | 3850.5 | 4131.3 | 5927.7 | 6124.6 | 6234.4 | 87.748 |
| quartznet | 128 | 100 | 14242 | 14185 | 24099 | 25082 | 25135 | 87.514 |
| jasper | 1 | 100 | 63.836 | 64.246 | 72.547 | 72.547 | 72.547 | 0.99877 |
| jasper | 8 | 100 | 88.517 | 90.074 | 101.94 | 135.44 | 135.95 | 7.9849 |
| jasper | 16 | 100 | 141.71 | 146.86 | 181.78 | 190.8 | 191.21 | 15.958 |
| jasper | 32 | 100 | 1550.5 | 1613.8 | 2251.7 | 2266.3 | 2320.6 | 31.771 |
| jasper | 48 | 100 | 24754 | 27473 | 37153 | 37214 | 37601 | 36.12 |
| jasper | 64 | 100 | 38824 | 46842 | 56015 | 56042 | 56223 | 37.552 |
| jasper | 96 | 100 | 54981 | 56534 | 67697 | 69204 | 72110 | 38.728 |
| jasper | 128 | 100 | 78679 | 85508 | 99784 | 101070 | 103300 | 39.162 |
| citrinet | 1 | 160 | 46.805 | 45.001 | 54.768 | 54.768 | 54.768 | 0.99913 |
| citrinet | 8 | 160 | 113.42 | 111.86 | 122.79 | 122.88 | 122.92 | 7.9819 |
| citrinet | 16 | 160 | 144.7 | 142.91 | 164.55 | 168.15 | 194.16 | 15.96 |
| citrinet | 32 | 160 | 270 | 286.08 | 299.6 | 299.85 | 305.03 | 31.852 |
| citrinet | 48 | 160 | 448.73 | 470.56 | 547.27 | 549.06 | 636.78 | 47.695 |
| citrinet | 64 | 160 | 4407.8 | 3847.9 | 8337.2 | 8342 | 8371.6 | 62.036 |
| citrinet | 96 | 160 | 18316 | 16287 | 25966 | 25984 | 25996 | 64.945 |
| citrinet | 128 | 160 | 35592 | 38581 | 53236 | 53258 | 53277 | 64.504 |

Streaming, high-throughput

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 800 | 39.787 | 38.593 | 48.83 | 48.83 | 48.83 | 0.99926 |
| quartznet | 64 | 800 | 218.91 | 220.37 | 261.84 | 282.89 | 298.67 | 63.704 |
| quartznet | 128 | 800 | 277.12 | 280.97 | 329.95 | 343.71 | 363.26 | 127.14 |
| quartznet | 256 | 800 | 408.6 | 413.78 | 499.96 | 512.58 | 541.7 | 253.38 |
| quartznet | 384 | 800 | 587.63 | 600.25 | 723.74 | 749.9 | 788.41 | 378.35 |
| quartznet | 512 | 800 | 951.12 | 830.73 | 1447.2 | 1475.7 | 1529.2 | 501.08 |
| quartznet | 768 | 800 | 5059.1 | 5104.1 | 6927.2 | 7345.6 | 7845.1 | 695.97 |
| quartznet | 1024 | 800 | 30971 | 43331 | 49598 | 50224 | 50660 | 563.6 |
| jasper | 1 | 800 | 86.674 | 89.077 | 97.608 | 97.608 | 97.608 | 0.99845 |
| jasper | 64 | 800 | 266.74 | 266.8 | 294.38 | 301.01 | 307.94 | 63.634 |
| jasper | 128 | 800 | 455.42 | 461.97 | 510.03 | 516.28 | 529.67 | 126.78 |
| jasper | 256 | 800 | 1552.4 | 1558.4 | 2347 | 2373.2 | 2458.4 | 250.4 |
| jasper | 384 | 800 | 13581 | 14258 | 18098 | 18924 | 21047 | 297.66 |
| jasper | 512 | 800 | 27412 | 28616 | 38468 | 39011 | 41410 | 298.23 |
| jasper | 768 | 800 | 52793 | 57476 | 73666 | 75289 | 77714 | 297.88 |
| jasper | 1024 | 800 | 83042 | 91887 | 116950 | 119320 | 120910 | 297.89 |
| citrinet | 1 | 800 | 37.433 | 35.331 | 46.439 | 46.439 | 46.439 | 0.9993 |
| citrinet | 64 | 800 | 241.68 | 247.81 | 271.41 | 274.04 | 275.09 | 63.657 |
| citrinet | 128 | 800 | 424.03 | 433.7 | 493.04 | 520.81 | 534.62 | 126.79 |
| citrinet | 256 | 800 | 1369.2 | 1497.7 | 1863.8 | 2114.7 | 2281.9 | 251.09 |
| citrinet | 384 | 800 | 10020 | 10609 | 13375 | 13609 | 14362 | 307.57 |
| citrinet | 512 | 800 | 23613 | 23310 | 36234 | 36344 | 36439 | 306.23 |
| citrinet | 768 | 800 | 52285 | 58787 | 75664 | 76529 | 77375 | 306.91 |
| citrinet | 1024 | 800 | 81081 | 92544 | 109610 | 115160 | 116940 | 307.48 |

Offline

| Acoustic model | # of streams | Chunk size (ms) | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|
| quartznet | 1 | 1600 | 60.939 | 61.636 | 64.237 | 74.117 | 74.117 | 0.99897 |
| quartznet | 256 | 1600 | 809.5 | 816.96 | 994.01 | 1020.4 | 1054.3 | 251.57 |
| quartznet | 512 | 1600 | 1431.8 | 1385.6 | 1748.1 | 2171.5 | 2459.6 | 494.84 |
| quartznet | 768 | 1600 | 2512.7 | 2608.9 | 3429.6 | 3595.7 | 3750.9 | 724.14 |
| quartznet | 1024 | 1600 | 4221.7 | 4241.3 | 6399.2 | 6762.4 | 7838.2 | 938.91 |
| quartznet | 1280 | 1600 | 19563 | 19149 | 34639 | 37018 | 40548 | 932.52 |
| quartznet | 1512 | 1600 | 27150 | 28335 | 46041 | 48533 | 53078 | 948.37 |
| jasper | 1 | 1600 | 100.59 | 103.02 | 107.48 | 115.31 | 115.31 | 0.99826 |
| jasper | 256 | 1600 | 1867.4 | 1687.9 | 2727.4 | 2755.7 | 2824 | 247.67 |
| jasper | 512 | 1600 | 12465 | 10921 | 21519 | 25644 | 27407 | 448.37 |
| jasper | 768 | 1600 | 32174 | 34305 | 52300 | 54010 | 58519 | 459.94 |
| jasper | 1024 | 1600 | 51569 | 57061 | 81640 | 83689 | 88960 | 458.96 |
| jasper | 1280 | 1600 | 71547 | 81632 | 111870 | 113440 | 120440 | 460.32 |
| jasper | 1512 | 1600 | 88812 | 105260 | 137660 | 139270 | 146870 | 462.58 |
| citrinet | 1 | 1600 | 38.012 | 35.651 | 47.481 | 47.481 | 47.481 | 0.9993 |
| citrinet | 256 | 1600 | 888.45 | 896.11 | 1034.9 | 1083.5 | 1121.9 | 250.51 |
| citrinet | 512 | 1600 | 4005.4 | 3846.1 | 6488.6 | 6636.4 | 7069.2 | 490.21 |
| citrinet | 768 | 1600 | 19784 | 19917 | 28363 | 28513 | 29171 | 502.52 |
| citrinet | 1024 | 1600 | 35765 | 35869 | 54278 | 55179 | 55298 | 504.51 |
| citrinet | 1280 | 1600 | 51602 | 57014 | 74890 | 75890 | 77607 | 506.95 |
| citrinet | 1512 | 1600 | 66732 | 74685 | 101050 | 102330 | 103250 | 506.53 |
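One way to read the ASR throughput column: RTFX is understood here as seconds of audio processed per second of wall-clock time. Because the client simulates real-time audio, each stream supplies about one second of audio per second, so a server that keeps up reports an RTFX close to the number of streams. In the rows where RTFX falls well short of that, latency grows to seconds or tens of seconds. The sketch below flags such rows; the function name and the 5% tolerance are assumptions chosen for illustration, and the sample values are copied from the "Streaming, low-latency" results above.

```python
def is_saturated(num_streams, rtfx, tolerance=0.05):
    """Return True when measured RTFX falls short of the offered load.

    Assumption: with real-time simulation, a server that keeps up shows
    an RTFX close to num_streams. The 5% tolerance is an arbitrary choice.
    """
    return rtfx < num_streams * (1.0 - tolerance)


# (GPU, acoustic model, # of streams, RTFX) from the low-latency results.
ROWS = [
    ("A100", "jasper", 128, 127.52),
    ("V100", "jasper", 96, 79.351),
    ("T4", "quartznet", 96, 87.748),
]

for gpu, model, streams, rtfx in ROWS:
    status = "saturated" if is_saturated(streams, rtfx) else "keeping up"
    print(f"{gpu} {model} @ {streams} streams: {status}")
```

A check like this can help pick the largest stream count a given GPU and model sustain before latency degrades.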

NLP

The performance of the Riva named entity recognition (NER) service (using a BERT-base model, sequence length of 128) and the Riva question answering (Q&A) service (using a BERT-large model, sequence length of 384) was measured. Both batch-size-1 latency and maximum throughput are reported.

NVIDIA A100 GPU

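| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 2.76 | 2.66 | 3.07 | 3.12 | 4.14 | 359.722 |
| NER | 256 | 79.2 | 79.2 | 103 | 115 | 120 | 3170.57 |
| Q&A | 1 | 7.56 | 7.39 | 7.78 | 8.99 | 9.21 | 131.922 |
| Q&A | 128 | 243 | 238 | 267 | 318 | 447 | 523.794 |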

NVIDIA V100 GPU

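| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 4.51 | 4.48 | 4.65 | 4.72 | 4.83 | 219.756 |
| NER | 256 | 133 | 138 | 144 | 146 | 150 | 1834.98 |
| Q&A | 1 | 10.1 | 10.1 | 10.2 | 10.2 | 10.3 | 98.253 |
| Q&A | 128 | 588 | 589 | 642 | 709 | 1010 | 217.025 |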

NVIDIA T4

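| Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
|---|---|---|---|---|---|---|---|
| NER | 1 | 6.32 | 5.69 | 8.74 | 8.84 | 9.17 | 156.82 |
| NER | 256 | 268 | 268 | 280 | 316 | 367 | 912.299 |
| Q&A | 1 | 16.6 | 16 | 16.4 | 19.3 | 31.3 | 60.2562 |
| Q&A | 128 | 1300 | 1300 | 1640 | 1730 | 2380 | 98.0043 |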
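The NLP latency and throughput columns can be cross-checked against each other with Little's law (throughput = requests in flight / average latency). The sketch below applies it to the A100 results above, assuming each stream keeps exactly one request in flight at a time. That assumption about the load generator is not stated in this section, so the estimate is a sanity check rather than a description of how the numbers were produced.

```python
def estimated_throughput(num_streams, avg_latency_ms):
    """Little's law estimate in sequences per second.

    Assumes num_streams requests are in flight at all times.
    """
    return num_streams / (avg_latency_ms / 1000.0)


# (task, # of streams, avg latency in ms, reported seq/s), A100 results.
A100_ROWS = [
    ("NER", 1, 2.76, 359.722),
    ("NER", 256, 79.2, 3170.57),
    ("Q&A", 1, 7.56, 131.922),
    ("Q&A", 128, 243, 523.794),
]

for task, streams, latency_ms, reported in A100_ROWS:
    estimate = estimated_throughput(streams, latency_ms)
    print(f"{task} x{streams}: estimated {estimate:.1f} seq/s, "
          f"reported {reported:.1f} seq/s")
```

For these rows the estimate lands within a few percent of the reported throughput, which suggests the two columns are mutually consistent under the stated assumption.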

TTS

Performance of the Riva text-to-speech (TTS) service was measured for different numbers of parallel streams. Each parallel stream performed 10 iterations over 10 input strings from the LJSpeech dataset. Latency to the first audio chunk, latency between successive audio chunks, and throughput were measured.
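The two latency metrics can be derived from per-chunk arrival timestamps recorded on the client side. The following is a minimal sketch of that bookkeeping, assuming the request send time and each chunk's arrival time are available in seconds; the function name and input format are hypothetical and do not reflect how the Riva client records them.

```python
def chunk_latencies(request_time_s, chunk_arrival_times_s):
    """Return (latency to first audio, list of gaps between chunks).

    request_time_s: time the synthesis request was sent.
    chunk_arrival_times_s: arrival times of audio chunks, in order.
    """
    first_audio = chunk_arrival_times_s[0] - request_time_s
    gaps = [
        later - earlier
        for earlier, later in zip(chunk_arrival_times_s,
                                  chunk_arrival_times_s[1:])
    ]
    return first_audio, gaps


if __name__ == "__main__":
    # Made-up timestamps, only to show the shape of the result.
    first, gaps = chunk_latencies(0.0, [0.25, 0.5, 1.0])
    print(first, gaps)
```

Aggregating the first value and the gaps across all requests gives the two groups of latency statistics reported in the tables.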

NVIDIA A100 GPU

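| Model | # of streams | Latency to first audio avg (s) | Latency to first audio p90 (s) | Latency to first audio p95 (s) | Latency to first audio p99 (s) | Latency between audio chunks avg (s) | Latency between audio chunks p90 (s) | Latency between audio chunks p95 (s) | Latency between audio chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| Fastpitch-Hifigan | 1 | 0.023 | 0.026 | 0.026 | 0.027 | 0.005 | 0.006 | 0.006 | 0.007 | 107.21 |
| Fastpitch-Hifigan | 4 | 0.047 | 0.060 | 0.064 | 0.093 | 0.011 | 0.013 | 0.013 | 0.014 | 216.39 |
| Fastpitch-Hifigan | 6 | 0.064 | 0.083 | 0.085 | 0.098 | 0.012 | 0.014 | 0.014 | 0.015 | 266.74 |
| Fastpitch-Hifigan | 8 | 0.086 | 0.114 | 0.142 | 0.191 | 0.016 | 0.017 | 0.054 | 0.057 | 261.71 |
| Fastpitch-Hifigan | 10 | 0.120 | 0.187 | 0.228 | 0.284 | 0.023 | 0.056 | 0.058 | 0.061 | 213.50 |
| Tacotron2-Waveglow | 1 | 0.042 | 0.044 | 0.045 | 0.047 | 0.022 | 0.023 | 0.023 | 0.025 | 33.227 |
| Tacotron2-Waveglow | 4 | 0.247 | 0.359 | 0.373 | 0.430 | 0.027 | 0.043 | 0.049 | 0.057 | 57.417 |
| Tacotron2-Waveglow | 6 | 0.345 | 0.460 | 0.479 | 0.533 | 0.034 | 0.058 | 0.065 | 0.076 | 64.378 |
| Tacotron2-Waveglow | 8 | 0.443 | 0.577 | 0.615 | 0.667 | 0.040 | 0.066 | 0.073 | 0.088 | 70.005 |
| Tacotron2-Waveglow | 10 | 0.534 | 0.667 | 0.716 | 0.900 | 0.041 | 0.070 | 0.080 | 0.093 | 73.183 |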

NVIDIA V100 GPU

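| Model | # of streams | Latency to first audio avg (s) | Latency to first audio p90 (s) | Latency to first audio p95 (s) | Latency to first audio p99 (s) | Latency between audio chunks avg (s) | Latency between audio chunks p90 (s) | Latency between audio chunks p95 (s) | Latency between audio chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| Fastpitch-Hifigan | 1 | 0.033 | 0.036 | 0.036 | 0.037 | 0.011 | 0.011 | 0.011 | 0.011 | 62.71 |
| Fastpitch-Hifigan | 4 | 0.116 | 0.152 | 0.160 | 0.173 | 0.020 | 0.024 | 0.025 | 0.025 | 102.31 |
| Fastpitch-Hifigan | 6 | 0.177 | 0.223 | 0.234 | 0.245 | 0.025 | 0.029 | 0.032 | 0.034 | 111.53 |
| Fastpitch-Hifigan | 8 | 0.240 | 0.302 | 0.318 | 0.347 | 0.029 | 0.035 | 0.036 | 0.039 | 117.56 |
| Fastpitch-Hifigan | 10 | 0.289 | 0.369 | 0.388 | 0.423 | 0.031 | 0.037 | 0.039 | 0.052 | 119.08 |
| Tacotron2-Waveglow | 1 | 0.058 | 0.060 | 0.060 | 0.064 | 0.031 | 0.033 | 0.033 | 0.034 | 23.84 |
| Tacotron2-Waveglow | 4 | 0.355 | 0.515 | 0.536 | 0.589 | 0.048 | 0.075 | 0.090 | 0.103 | 36.40 |
| Tacotron2-Waveglow | 6 | 0.510 | 0.689 | 0.740 | 0.823 | 0.062 | 0.105 | 0.117 | 0.132 | 40.24 |
| Tacotron2-Waveglow | 8 | 0.680 | 0.886 | 0.937 | 1.013 | 0.071 | 0.119 | 0.134 | 0.159 | 43.03 |
| Tacotron2-Waveglow | 10 | 0.819 | 1.037 | 1.098 | 1.323 | 0.075 | 0.126 | 0.139 | 0.165 | 44.21 |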

NVIDIA T4

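| Model | # of streams | Latency to first audio avg (s) | Latency to first audio p90 (s) | Latency to first audio p95 (s) | Latency to first audio p99 (s) | Latency between audio chunks avg (s) | Latency between audio chunks p90 (s) | Latency between audio chunks p95 (s) | Latency between audio chunks p99 (s) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|---|
| Fastpitch-Hifigan | 1 | 0.070 | 0.070 | 0.070 | 0.110 | 0.020 | 0.020 | 0.020 | 0.020 | 32.53 |
| Fastpitch-Hifigan | 4 | 0.450 | 0.590 | 0.610 | 0.660 | 0.030 | 0.040 | 0.040 | 0.050 | 38.49 |
| Fastpitch-Hifigan | 6 | 0.670 | 0.920 | 0.980 | 1.030 | 0.040 | 0.060 | 0.060 | 0.060 | 39.94 |
| Fastpitch-Hifigan | 8 | 0.900 | 1.230 | 1.300 | 1.440 | 0.050 | 0.070 | 0.080 | 0.080 | 40.73 |
| Fastpitch-Hifigan | 10 | 1.070 | 1.460 | 1.540 | 1.610 | 0.050 | 0.080 | 0.080 | 0.090 | 40.99 |
| Tacotron2-Waveglow | 1 | 0.110 | 0.110 | 0.110 | 0.130 | 0.050 | 0.050 | 0.050 | 0.050 | 14.57 |
| Tacotron2-Waveglow | 4 | 0.640 | 0.910 | 0.990 | 1.100 | 0.100 | 0.170 | 0.190 | 0.220 | 18.57 |
| Tacotron2-Waveglow | 6 | 0.970 | 1.330 | 1.410 | 1.540 | 0.140 | 0.230 | 0.270 | 0.320 | 19.26 |
| Tacotron2-Waveglow | 8 | 1.370 | 1.810 | 1.890 | 2.050 | 0.170 | 0.290 | 0.330 | 0.410 | 19.69 |
| Tacotron2-Waveglow | 10 | 1.700 | 2.200 | 2.320 | 2.570 | 0.180 | 0.300 | 0.340 | 0.420 | 19.97 |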

When the server is under high load, requests might time out, because the server does not start inference for a new request until a previous request has been completely generated and its inference slot has been freed. This is done to maximize throughput for the TTS service and allow for real-time interaction. NVIDIA does not recommend making more than 8-10 simultaneous requests with the models provided in Riva 1.0.0 beta.
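On the client side, one way to stay within this guidance is to cap the number of requests in flight. The sketch below does so with an asyncio semaphore. The fake_synthesize coroutine is a hypothetical stand-in for a real TTS call and is not a Riva API, and the limit of 8 is simply the lower end of the range mentioned above.

```python
import asyncio

MAX_IN_FLIGHT = 8  # lower end of the 8-10 simultaneous-request guidance


async def fake_synthesize(text):
    """Hypothetical stand-in for a real TTS request; not a Riva API."""
    await asyncio.sleep(0.01)
    return f"audio for {text!r}"


async def synthesize_all(texts, synthesize=fake_synthesize,
                         limit=MAX_IN_FLIGHT):
    """Run all requests, never allowing more than `limit` at once.

    Returns the results in input order plus the observed peak concurrency.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(text):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await synthesize(text)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(t) for t in texts))
    return results, peak


if __name__ == "__main__":
    outputs, peak_seen = asyncio.run(
        synthesize_all([f"sentence {i}" for i in range(20)]))
    print(len(outputs), "results, peak concurrency", peak_seen)
```

Requests beyond the limit wait on the client instead of queueing on the server, which may reduce the chance of server-side timeouts under load.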