Performance
Evaluation Process
This section shows the latency and throughput numbers for streaming and offline configurations of the Riva ASR service on different GPUs.
In streaming mode, the client and the server used audio chunks of the same duration. See the Results section for the chunk size value to use.
The Riva streaming client, riva_streaming_asr_client, provided in the Riva image, was used with the --simulate_realtime flag to simulate transcription from a microphone. Each stream performed three iterations over a sample audio file (1272-135031-0000.wav) from the LibriSpeech dev-clean dataset.
You can get the source code for riva_streaming_asr_client at Riva C++ Clients.
The following command was used to measure performance:
riva_streaming_asr_client \
--chunk_duration_ms=<chunk_duration> \
--simulate_realtime=true \
--automatic_punctuation=true \
--num_parallel_requests=<num_streams> \
--word_time_offsets=false \
--print_transcripts=false \
--interim_results=false \
--num_iterations=<3*num_streams> \
--audio_file=1272-135031-0000.wav \
--output_filename=/tmp/output.json
The riva_streaming_asr_client command returns the following latency measurements:
- intermediate latency: latency of responses returned with is_final == false
- final latency: latency of responses returned with is_final == true
- latency: the overall latency of all returned responses. This is the value tabulated in the following tables.
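As a rough illustration of how a set of per-response latencies maps onto the avg, p50, p90, p95, and p99 columns, the following Python sketch summarizes a list of samples. It assumes a nearest-rank percentile definition; the method that riva_streaming_asr_client actually uses is not stated here, and the function name summarize_latencies and the sample values are hypothetical.

```python
import math


def summarize_latencies(latencies_ms):
    """Summarize latency samples (ms) using nearest-rank percentiles (an assumption)."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest rank: the smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    return {
        "avg": sum(ordered) / n,
        "p50": percentile(50),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
    }


# Hypothetical samples: nine fast responses and one slow outlier.
samples = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 40.0]
stats = summarize_latencies(samples)
print(stats)
```

A single slow response moves p95 and p99 far more than p50, which is why the tail columns in the tables can be several times larger than the median.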
The following diagrams are a schematic representation of the different latencies measured by the Riva streaming ASR client.
The following command was used to measure maximum throughput in offline mode:
riva_asr_client \
--automatic_punctuation=true \
--num_parallel_requests=32 \
--word_time_offsets=false \
--print_transcripts=false \
--num_iterations=96 \
--audio_file=1272-135031-0000x5.wav \
--output_filename=/tmp/output.json
where 1272-135031-0000x5.wav is the 1272-135031-0000.wav audio file concatenated five times.
You can get the source code for riva_asr_client at Riva C++ Clients.
Results
Latency and throughput measurements for streaming and offline configurations are reported in the following tables. Throughput is measured in RTFX, defined as the duration of audio transcribed divided by the computation time.
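The RTFX definition above can be sketched in a few lines of Python. The function name rtfx and the numbers are hypothetical, and the sketch assumes that the audio duration is summed over all parallel streams while the computation time is the elapsed wall-clock time.

```python
def rtfx(audio_seconds, compute_seconds):
    """Real-time factor: seconds of audio transcribed per second of computation."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Hypothetical offline run: 32 parallel requests of 10 s each, finished in 0.125 s.
offline = rtfx(32 * 10.0, 0.125)

# Hypothetical real-time stream: 10 s of audio fed over 10 s of wall-clock time.
streaming = rtfx(10.0, 10.0)

print(offline, streaming)
```

Because the streaming measurements feed audio in real time (--simulate_realtime=true), each stream can contribute at most about 1.0 to the total. This matches the streaming tables, where throughput stays close to the number of streams.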
Note
The values in the tables are averages over three trials. Each value is rounded to the last significant digit as determined by the standard deviation calculated over those three trials. If the standard deviation is less than 0.001 of the average, the value is rounded as if the standard deviation were equal to 0.001 of the average.
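One possible reading of this rounding rule is sketched below in Python. This is an interpretation, not the procedure actually used to produce the tables: the function name round_by_std and the std_digits parameter (how many significant digits of the standard deviation are kept) are assumptions introduced for illustration.

```python
import math


def round_by_std(mean, std, std_digits):
    """Round mean to the decimal place implied by std (assumed interpretation).

    std is never allowed to fall below 0.001 of the mean, as the note describes.
    std_digits is a hypothetical knob: the number of significant digits of the
    standard deviation that determine the last digit kept in the mean.
    """
    effective_std = max(std, 0.001 * abs(mean))
    if effective_std == 0:
        return mean
    # Position of the leading significant digit of the standard deviation.
    exponent = math.floor(math.log10(effective_std))
    decimals = -exponent + (std_digits - 1)
    return round(mean, decimals)


# Hypothetical trial statistics.
a = round_by_std(12.43912, 0.05, std_digits=1)
b = round_by_std(3707.4321, 0.0, std_digits=2)
print(a, b)
```

In the second call the standard deviation is below the floor, so the value is rounded as if the standard deviation were 0.001 of the average.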
For specifications of the hardware on which these measurements were collected, see the On-Prem Hardware Specifications section.
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 12.439 | 10.388 | 11.524 | 12.54 | 30.242 | 0.99949 |
| 8 | 13.006 | 12.508 | 14.673 | 17.203 | 29.539 | 7.9929 |
| 16 | 18.138 | 17.06 | 24.885 | 27.922 | 49.825 | 15.975 |
| 32 | 23.093 | 20.141 | 29.905 | 30.991 | 76.264 | 31.915 |
| 48 | 28.666 | 29.63 | 33.027 | 34.111 | 101.76 | 47.834 |
| 64 | 32.012 | 32.449 | 35.855 | 37.779 | 136.06 | 63.719 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 14.088 | 12.616 | 17.525 | 20.743 | 51.894 | 0.9994 |
| 64 | 42.41 | 37.123 | 43.153 | 157.71 | 163.47 | 63.68 |
| 128 | 61.41 | 49.455 | 61.86 | 197.73 | 307.18 | 126.82 |
| 256 | 93.439 | 67.938 | 98.617 | 315.2 | 558.71 | 251.39 |
| 384 | 123.79 | 93.576 | 124.35 | 472.36 | 848.59 | 373.93 |
| 512 | 166.85 | 117.95 | 318.19 | 615.9 | 1141.7 | 494.12 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 353.62 |
| False | 32 | 3707.4 |
| True | 32 | 170 |
320
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 16.946 | 14.233 | 17.459 | 18.954 | 21.387 | 0.99936 |
| 8 | 19.451 | 18.632 | 22.736 | 26.336 | 32.004 | 7.9924 |
| 16 | 24.811 | 23.88 | 28.443 | 32.162 | 42.753 | 15.978 |
| 32 | 33.166 | 30.537 | 43.692 | 47.19 | 68.485 | 31.929 |
| 48 | 44.522 | 49.317 | 57.667 | 60.352 | 93.513 | 47.855 |
| 64 | 55.794 | 60.906 | 71.632 | 74.103 | 117.78 | 63.755 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 19.719 | 19.153 | 22.404 | 25.33 | 53.192 | 0.99929 |
| 64 | 67.47 | 73.823 | 84.609 | 89.695 | 91.372 | 63.803 |
| 128 | 116.36 | 122.78 | 146.72 | 152.17 | 173.31 | 127.31 |
| 256 | 174.21 | 179.51 | 223.23 | 242.4 | 270.84 | 253.54 |
| 384 | 225.73 | 208.54 | 317.68 | 323.05 | 345.72 | 379.42 |
| 512 | 281.13 | 299.92 | 406.08 | 416.05 | 517.49 | 503.1 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 335.83 |
| False | 32 | 3890.1 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 27.7 | 26.0 | 27.7 | 28.5 | 42.8 | 1.0 |
| 8 | 33.1 | 32.9 | 35.1 | 35.9 | 54.1 | 8.0 |
| 16 | 43.4 | 42.7 | 45.7 | 57.2 | 73.9 | 16.0 |
| 32 | 59.7 | 48.4 | 78.1 | 80.1 | 105.4 | 31.9 |
| 48 | 99.1 | 106.5 | 110.5 | 112.2 | 187.5 | 47.7 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 23.7 | 22.9 | 26.3 | 27.6 | 50.4 | 1.0 |
| 64 | 135.7 | 160.3 | 167.9 | 171.1 | 174.8 | 63.7 |
| 128 | 273.6 | 300.2 | 314.6 | 319.5 | 344.2 | 126.8 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 224.8 |
| False | 32 | 1223.7 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 34.3 | 33.8 | 35.5 | 36.1 | 61.2 | 1.0 |
| True | 1 | 38.1 | 34.8 | 48.5 | 50.0 | 93.0 | 1.0 |
| False | 8 | 41.2 | 40.7 | 43.1 | 43.9 | 76.4 | 8.0 |
| True | 8 | 53.6 | 41.4 | 89.8 | 96.9 | 165.4 | 8.0 |
| False | 16 | 52.4 | 51.1 | 53.9 | 70.7 | 104.1 | 15.9 |
| True | 16 | 70.9 | 51.6 | 114.8 | 129.8 | 257.8 | 15.9 |
| False | 32 | 78.4 | 64.4 | 102.5 | 105.8 | 145.8 | 31.8 |
| True | 32 | 115.5 | 102.1 | 201.6 | 217.2 | 394.5 | 31.6 |
| False | 48 | 105.5 | 124.7 | 132.7 | 136.8 | 174.0 | 47.6 |
| True | 48 | 169.7 | 141.4 | 258.0 | 287.1 | 518.0 | 47.3 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 62.0 | 61.3 | 64.7 | 65.6 | 92.2 | 1.0 |
| True | 1 | 93.9 | 76.6 | 100.4 | 101.0 | 538.6 | 1.0 |
| False | 64 | 230.0 | 269.0 | 275.3 | 278.4 | 280.1 | 63.4 |
| True | 64 | 388.3 | 425.9 | 495.3 | 510.8 | 525.9 | 63.2 |
| False | 128 | 366.7 | 398.8 | 416.5 | 429.6 | 446.6 | 126.2 |
| True | 128 | 600.0 | 621.0 | 644.7 | 900.0 | 957.3 | 124.4 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 21.0 |
| False | 32 | 357.2 |
| True | 32 | 270.1 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 23.02 | 23.176 | 28.694 | 29.829 | 44.012 | 0.99863 |
| 8 | 34.593 | 33.966 | 40.212 | 45.607 | 94.882 | 7.9646 |
| 16 | 42.333 | 41.683 | 50.64 | 58.927 | 93.028 | 15.953 |
| 32 | 55.452 | 51.111 | 76.248 | 82.525 | 129.15 | 31.828 |
| 48 | 72.80 | 175.236 | 94.222 | 106.1 | 223.45 | 47.592 |
| 64 | 97.943 | 100.13 | 116.06 | 126.09 | 240.68 | 63.512 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 29.309 | 29.149 | 36.643 | 41.075 | 47.067 | 0.99879 |
| 64 | 114.55 | 116.94 | 159.24 | 177.71 | 189.42 | 63.655 |
| 128 | 170.28 | 173.34 | 217 | 220.87 | 306.83 | 126.76 |
| 256 | 265.46 | 262.12 | 374.34 | 445.03 | 610.07 | 251.12 |
| 384 | 322.1 | 300.5 | 478.52 | 627.84 | 962.12 | 374.99 |
| 512 | 437.49 | 385.16 | 733.84 | 1084.8 | 1529.9 | 493.42 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 166.71 |
| False | 32 | 505.29 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 17.854 | 15.285 | 20.322 | 21.213 | 28.771 | 0.99935 |
| 8 | 24.344 | 22.916 | 30.133 | 38.193 | 53.096 | 7.9913 |
| 16 | 33.628 | 31.514 | 39.454 | 61.332 | 77.594 | 15.975 |
| 32 | 51.488 | 51.29 | 60.141 | 99.328 | 125.21 | 31.915 |
| 48 | 66.051 | 66.906 | 78.255 | 106.77 | 150.16 | 47.831 |
| 64 | 70.315 | 75.312 | 85.973 | 123.24 | 183.86 | 63.743 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 21.612 | 20.434 | 25.3 | 26.234 | 62.83 | 26.234 |
| 64 | 95.964 | 87.699 | 181.02 | 183.42 | 188.01 | 189.01 |
| 128 | 174.07 | 152.57 | 280.41 | 346.19 | 356.8 | 189.01 |
| 256 | 281.56 | 249.97 | 523.14 | 594.08 | 682.59 | 700.86 |
| 384 | 392.68 | 336.15 | 758.24 | 870.87 | 1002.9 | 1033.1 |
| 512 | 540.34 | 437.2 | 1118.3 | 1210.2 | 1351.9 | 1424.3 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 337.94 |
| False | 32 | 3229.5 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 15.8 | 13.4 | 14.7 | 16.6 | 30.2 | 1.0 |
| 8 | 16.7 | 15.8 | 17.9 | 25.9 | 35.6 | 8.0 |
| 16 | 21.5 | 18.8 | 26.3 | 43.8 | 51.6 | 16.0 |
| 32 | 32.7 | 27.4 | 43.8 | 46.7 | 92.5 | 31.9 |
| 48 | 41.1 | 42.5 | 46.4 | 51.3 | 126.7 | 47.8 |
| 64 | 44.9 | 45.6 | 50.0 | 57.0 | 158.4 | 63.7 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 17.8 | 16.0 | 22.0 | 24.0 | 74.1 | 1.0 |
| 64 | 57.7 | 54.7 | 68.1 | 168.0 | 170.5 | 63.7 |
| 128 | 83.3 | 74.1 | 86.5 | 222.3 | 308.5 | 126.8 |
| 256 | 130.7 | 113.9 | 137.1 | 380.9 | 582.8 | 251.4 |
| 384 | 174.0 | 131.5 | 196.4 | 554.3 | 881.9 | 373.1 |
| 512 | 229.5 | 179.3 | 434.6 | 609.5 | 1222.9 | 494.1 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 309.8 |
| 32 | 3008.8 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 25.1 | 24.2 | 25.2 | 25.5 | 46.1 | 1.0 |
| True | 1 | 28.5 | 25.0 | 39.0 | 39.6 | 80.2 | 1.0 |
| False | 8 | 30.3 | 29.4 | 30.9 | 33.8 | 70.4 | 8.0 |
| True | 8 | 41.6 | 29.9 | 79.5 | 84.7 | 146.5 | 8.0 |
| False | 16 | 36.1 | 34.0 | 36.4 | 59.4 | 96.7 | 16.0 |
| True | 16 | 52.8 | 35.2 | 98.2 | 107.8 | 235.9 | 15.9 |
| False | 32 | 56.2 | 61.4 | 63.7 | 65.0 | 155.9 | 31.8 |
| True | 32 | 70.2 | 62.2 | 143.4 | 159.7 | 324.9 | 31.7 |
| False | 48 | 60.9 | 69.2 | 74.3 | 77.4 | 142.8 | 47.7 |
| True | 48 | 95.6 | 74.2 | 184.8 | 196.4 | 426.6 | 47.3 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 62.7 | 60.6 | 65.6 | 71.0 | 124.3 | 1.0 |
| True | 1 | 92.4 | 74.9 | 95.4 | 103.6 | 516.6 | 1.0 |
| False | 64 | 167.4 | 183.9 | 191.6 | 281.4 | 293.9 | 63.4 |
| True | 64 | 268.4 | 293.6 | 351.6 | 467.3 | 515.0 | 63.0 |
| False | 128 | 224.3 | 226.0 | 236.8 | 381.3 | 472.8 | 126.2 |
| False | 256 | 349.9 | 342.1 | 378.2 | 618.6 | 855.6 | 249.3 |
| False | 384 | 498.1 | 470.4 | 748.8 | 1036.3 | 1407.7 | 367.8 |
| False | 512 | 694.0 | 579.4 | 1443.1 | 1465.2 | 2261.6 | 482.6 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 20.9 |
| True | 1 | 17.4 |
| False | 32 | 424.4 |
| True | 32 | 306.9 |

| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 208.9 | 272.85 |
| 32 | 2210.3 | 745.51 |
| 64 | 2601 | 810.1 |
320
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 59.60 |
| 8 | 7.9 | 122.40 |
| 16 | 15.8 | 151.56 |
| 32 | 31.6 | 193.95 |
| 64 | 63.0 | 235.30 |
1600
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 64.21 |
| 64 | 63.4 | 277.25 |
| 128 | 126.2 | 343.67 |
| 256 | 250.4 | 503.87 |
| 384 | 371.6 | 640.38 |
| 512 | 490.6 | 805.25 |
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 129.3 | 428.72 |
| 32 | 1403.7 | 1190.53 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 10 | 9.9 | 11.3 | 12 | 40 | 1 |
| 8 | 12.6 | 12 | 13.4 | 17 | 31 | 8 |
| 16 | 17 | 15 | 22 | 25 | 40 | 15.98 |
| 32 | 23 | 23 | 31 | 33 | 50 | 31.94 |
| 48 | 29 | 28 | 40 | 41 | 70 | 47.9 |
| 64 | 33.6 | 38 | 45 | 47 | 70 | 63.9 |
| 128 | 49 | 47 | 64 | 67 | 150 | 127.6 |
| 256 | 84 | 75 | 107 | 126 | 391 | 255 |
800
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 14 | 11 | 20 | 40 | 80 | 1 |
| 64 | 39 | 40 | 55 | 80 | 110 | 63.9 |
| 128 | 58 | 50 | 75 | 150 | 202 | 127.6 |
| 256 | 90 | 80 | 115 | 240 | 380 | 255 |
| 384 | 120 | 107 | 155 | 316 | 530 | 381.4 |
| 512 | 149 | 130 | 196 | 400 | 700 | 508 |
| 768 | 258 | 200 | 630 | 680 | 1280 | 756 |
| 1024 | 420 | 263 | 1280 | 1350 | 1900 | 992 |
n-gram
| # of streams | Throughput (RTFX) |
| 32 | 467 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 90 |
| False | 32 | 370 |
none
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 110 |
| False | 32 | 1255 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 17.79 | 15.812 | 22.171 | 22.66 | 24.527 | 0.99925 |
| 8 | 19.619 | 18.702 | 20.283 | 21.283 | 49.858 | 7.9866 |
| 16 | 24.347 | 22.816 | 24.601 | 30.805 | 83.174 | 15.958 |
| 32 | 32.883 | 30.65 | 40.314 | 40.992 | 129.39 | 31.856 |
| 48 | 43.084 | 44.219 | 50.994 | 56.952 | 210.66 | 47.689 |
| 64 | 53.643 | 53.416 | 61.031 | 97.948 | 264.43 | 63.476 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 20.624 | 19.301 | 26.012 | 28.626 | 50.408 | 0.99918 |
| 64 | 73.596 | 71.06 | 84.034 | 234.8 | 251.47 | 63.497 |
| 128 | 123.56 | 110.62 | 139.56 | 300.04 | 449.06 | 126.28 |
| 256 | 188.12 | 162.21 | 200.33 | 538 | 814.35 | 249.61 |
| 384 | 268.43 | 198.76 | 527.84 | 786.32 | 1372.3 | 369.42 |
| 512 | 405.24 | 287.28 | 1347.4 | 1439.1 | 2252.5 | 486.61 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 318.26 |
| False | 32 | 2085 |
| True | 32 | 125 |
320
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 25.742 | 25.178 | 29.273 | 30.679 | 40.96 | 0.99891 |
| 8 | 37.458 | 36.717 | 43.875 | 45.592 | 57.38 | 7.9865 |
| 16 | 46.788 | 45.738 | 51.555 | 60.965 | 75.74 | 15.963 |
| 32 | 64.08 | 57.471 | 84.993 | 89.653 | 128.29 | 31.873 |
| 48 | 85.545 | 96.194 | 111.54 | 117.86 | 176.5 | 47.714 |
| 64 | 93.02 | 104.95 | 116 | 124.89 | 195.03 | 63.61 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 21.451 | 20.791 | 23.836 | 24.61 | 53.358 | 0.99922 |
| 64 | 91.55 | 103.46 | 124.84 | 126.52 | 134.28 | 63.575 |
| 128 | 177.23 | 190.8 | 213.12 | 218.6 | 244.28 | 127.01 |
| 256 | 279.71 | 279.51 | 358.52 | 371.5 | 449.47 | 252.36 |
| 384 | 386.16 | 389.76 | 521.21 | 556.27 | 722.57 | 375.28 |
| 512 | 492.63 | 496.77 | 691.83 | 793.61 | 1101.9 | 494.73 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 338.78 |
| False | 32 | 3041.9 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 41.4 | 41.0 | 43.1 | 43.9 | 65.2 | 1.0 |
| 8 | 69.5 | 68.9 | 73.8 | 76.8 | 116.3 | 8.0 |
| 16 | 84.7 | 80.4 | 108.1 | 113.5 | 149.9 | 15.9 |
| 32 | 138.2 | 147.3 | 172.9 | 180.1 | 232.4 | 31.7 |
| 48 | 2610.6 | 2456.5 | 4743.8 | 4941.9 | 6120.2 | 41.5 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 32.0 | 32.0 | 33.2 | 34.9 | 35.6 | 1.0 |
| 64 | 263.5 | 310.4 | 324.9 | 326.4 | 342.2 | 63.4 |
| 128 | 562.0 | 591.7 | 646.0 | 829.8 | 835.5 | 124.8 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 151.5 |
| 32 | 599.4 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 51.9 | 51.2 | 53.3 | 54.1 | 91.2 | 1.0 |
| True | 1 | 57.0 | 51.5 | 74.3 | 75.6 | 134.2 | 1.0 |
| False | 8 | 78.6 | 77.6 | 82.8 | 84.0 | 144.3 | 8.0 |
| True | 8 | 92.7 | 80.3 | 127.8 | 131.8 | 246.6 | 7.9 |
| False | 16 | 85.0 | 83.8 | 86.4 | 87.3 | 165.2 | 15.9 |
| True | 16 | 107.9 | 85.5 | 161.3 | 164.5 | 350.9 | 15.8 |
| False | 32 | 147.0 | 149.3 | 176.1 | 184.5 | 295.1 | 31.7 |
| True | 32 | 273.1 | 241.1 | 415.1 | 505.6 | 817.4 | 31.2 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 74.0 | 73.7 | 75.3 | 77.5 | 94.4 | 1.0 |
| True | 1 | 108.0 | 96.6 | 100.2 | 111.7 | 473.1 | 1.0 |
| False | 64 | 366.4 | 427.7 | 438.1 | 447.2 | 456.1 | 63.1 |
| True | 64 | 541.1 | 604.0 | 658.0 | 803.4 | 833.1 | 62.4 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 18.7 |
| False | 32 | 252.4 |
| True | 32 | 148.7 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 30.948 | 31.181 | 34.543 | 36.056 | 47.975 | 0.99827 |
| 8 | 47.991 | 48.392 | 53.894 | 55.871 | 87.543 | 7.978 |
| 16 | 61.284 | 61.356 | 68.72 | 76.258 | 118.18 | 15.923 |
| 32 | 75.633 | 74.065 | 95.78 | 102.73 | 155.86 | 31.809 |
| 48 | 91.854 | 99.673 | 111.41 | 113.95 | 255.06 | 47.621 |
| 64 | 114.38 | 126.17 | 135.55 | 139.42 | 321.86 | 63.361 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 26.371 | 26.905 | 29.871 | 34.808 | 35.852 | 0.9987 |
| 64 | 116.95 | 132.38 | 156.75 | 165.34 | 170.12 | 63.681 |
| 128 | 227.25 | 232.36 | 279.57 | 295.28 | 372.04 | 126.56 |
| 256 | 351 | 363.49 | 448.89 | 506.45 | 769.04 | 249.55 |
| 384 | 451.4 | 451.33 | 622.64 | 676.79 | 935.07 | 372.85 |
| 512 | 579.83 | 578.14 | 838.03 | 1041.4 | 1447.4 | 489.7 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 213.96 |
| 32 | 1021 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 25.898 | 25.159 | 29.92 | 31.529 | 38.217 | 0.99898 |
| 8 | 39.652 | 38.118 | 48.426 | 56.406 | 75.964 | 7.986 |
| 16 | 51.214 | 48.284 | 58.617 | 78.85 | 103.47 | 15.963 |
| 32 | 62.971 | 65.896 | 79.542 | 102.56 | 123.49 | 31.882 |
| 48 | 82.938 | 89.665 | 102.51 | 151.8 | 176.93 | 47.767 |
| 64 | 107.35 | 113.37 | 128.69 | 197.71 | 246.97 | 63.598 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 23.143 | 22.251 | 26.383 | 26.517 | 62.112 | 0.9993 |
| 64 | 123.91 | 120.69 | 209.79 | 221.48 | 222.74 | 63.746 |
| 128 | 250.18 | 224.35 | 404.34 | 470.45 | 495.53 | 127 |
| 256 | 424.4 | 391.15 | 726.09 | 848.57 | 940.01 | 250.66 |
| 384 | 644.41 | 550.55 | 1274 | 1344.7 | 1493.9 | 369.39 |
| 512 | 882.82 | 803.01 | 1746.6 | 1874.4 | 1955.3 | 486.89 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 299.83 |
| 32 | 2790.2 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 24.0 | 22.5 | 27.8 | 29.0 | 38.9 | 1.0 |
| 8 | 30.5 | 29.0 | 30.3 | 50.9 | 70.2 | 8.0 |
| 16 | 37.8 | 35.0 | 38.0 | 54.7 | 104.2 | 15.9 |
| 32 | 48.1 | 51.4 | 61.6 | 71.8 | 141.3 | 31.8 |
| 48 | 63.8 | 69.2 | 77.3 | 104.0 | 205.1 | 47.6 |
| 64 | 85.9 | 85.0 | 100.8 | 147.2 | 313.0 | 63.4 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 26.0 | 24.0 | 30.5 | 34.0 | 74.3 | 1.0 |
| 64 | 103.5 | 101.6 | 125.2 | 269.1 | 296.7 | 63.5 |
| 128 | 179.6 | 175.2 | 196.0 | 383.6 | 513.2 | 126.0 |
| 256 | 306.6 | 308.1 | 367.3 | 724.0 | 988.7 | 248.3 |
| 384 | 535.5 | 393.4 | 1469.1 | 1642.0 | 2496.4 | 365.0 |
| 512 | 1126.3 | 551.7 | 3230.1 | 3967.6 | 4614.8 | 476.8 |
| 512 | 1134.3 | 571.6 | 3422.9 | 3841.8 | 4632.6 | 476.7 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 211.3 |
| 32 | 1395.8 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 29.5 | 29.1 | 30.2 | 30.5 | 56.1 | 1.0 |
| True | 1 | 36.6 | 30.6 | 53.6 | 54.4 | 109.9 | 1.0 |
| False | 8 | 39.5 | 38.3 | 40.8 | 42.2 | 96.1 | 8.0 |
| True | 8 | 52.3 | 39.3 | 92.5 | 94.6 | 180.2 | 8.0 |
| False | 16 | 51.8 | 40.7 | 72.4 | 74.5 | 118.5 | 15.9 |
| True | 16 | 67.3 | 47.9 | 114.3 | 116.3 | 301.0 | 15.9 |
| False | 32 | 64.0 | 49.5 | 84.4 | 86.1 | 161.2 | 31.8 |
| True | 32 | 105.6 | 90.6 | 208.2 | 212.1 | 487.5 | 31.5 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 66.1 | 65.6 | 66.9 | 72.3 | 73.0 | 1.0 |
| True | 1 | 92.3 | 91.5 | 92.7 | 104.1 | 104.9 | 1.0 |
| False | 64 | 207.2 | 227.4 | 242.7 | 387.8 | 401.4 | 63.2 |
| True | 64 | 363.7 | 397.5 | 435.2 | 653.1 | 670.7 | 62.8 |
| False | 128 | 294.3 | 299.9 | 312.1 | 525.3 | 658.5 | 125.5 |
| False | 256 | 518.9 | 504.9 | 724.3 | 1018.2 | 1668.4 | 245.6 |
| False | 384 | 867.2 | 683.3 | 2002.2 | 2262.0 | 3026.8 | 359.8 |
| False | 512 | 2194.6 | 2014.4 | 4142.6 | 4819.2 | 5894.0 | 443.7 |
| False | 512 | 2176.1 | 1993.4 | 4113.3 | 4797.2 | 5879.7 | 443.8 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 19.2 |
| True | 1 | 17.7 |
| False | 32 | 341.0 |
| True | 32 | 178.0 |

| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 128.5 | 433.0 |
| 32 | 1326.0 | 1268.77 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 13 | 11.8 | 12.8 | 14 | 40 | 1 |
| 8 | 17.6 | 16.8 | 18.5 | 22 | 39 | 8 |
| 16 | 22.5 | 21.3 | 25 | 31 | 60.3 | 15.98 |
| 32 | 32.4 | 35 | 42 | 46 | 70 | 31.93 |
| 48 | 41 | 40 | 58 | 59 | 100 | 47.9 |
| 64 | 46 | 50 | 64 | 66 | 100 | 63.8 |
| 128 | 73 | 66 | 94 | 97 | 220 | 127.5 |
800
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 16 | 13 | 20 | 40 | 80 | 1 |
| 64 | 60 | 60 | 80 | 110 | 180 | 63.8 |
| 128 | 90 | 80 | 110 | 230 | 300 | 127.5 |
| 256 | 133.3 | 120 | 174 | 340 | 530 | 254 |
| 384 | 183 | 166 | 245 | 430 | 800 | 380 |
| 512 | 260 | 223 | 510 | 600 | 1200 | 505 |
| 768 | 535 | 354 | 1500 | 1640 | 2150 | 739 |
| 1024 | 940 | 600 | 2300 | 2570 | 2930 | 960 |
n-gram
| # of streams | Throughput (RTFX) |
| 32 | 460 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 60 |
| False | 32 | 234 |
none
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 108.04 |
| False | 32 | 827.71 |
320
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 99.63 |
| 8 | 7.9 | 138.54 |
| 16 | 15.7 | 203.51 |
| 32 | 31.4 | 303.27 |
| 48 | 39.8 | 2991.17 |
| 64 | 50.8 | 3737.57 |
1600
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 102.40 |
| 64 | 62.9 | 490.66 |
| 128 | 124.5 | 682.94 |
| 256 | 244.3 | 1008.00 |
| 384 | 313.3 | 3766.07 |
| 512 | 318.4 | 9788.07 |
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 77.1 | 712.47 |
| 32 | 838.0 | 2027.50 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 11.518 | 10.501 | 11.753 | 12.329 | 29.332 | 0.99948 |
| 8 | 13.042 | 12.727 | 14.303 | 16.54 | 27.45 | 7.9934 |
| 16 | 17.579 | 16.357 | 25.071 | 26.493 | 42.529 | 15.974 |
| 32 | 21.415 | 18.903 | 27.705 | 28.62 | 65.338 | 31.924 |
| 48 | 32.285 | 32.166 | 34.611 | 35.804 | 102.55 | 47.839 |
| 64 | 33.933 | 36.076 | 39.682 | 41.26 | 120.46 | 63.75 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 14.345 | 12.899 | 18.496 | 21.621 | 49.489 | 0.99941 |
| 64 | 43.724 | 41.908 | 48.17 | 138.95 | 140.61 | 63.715 |
| 128 | 76.158 | 69.027 | 80.239 | 198.79 | 277.37 | 126.88 |
| 256 | 113.72 | 89.307 | 128.73 | 294.93 | 488.96 | 251.9 |
| 384 | 150.8 | 133.5 | 170.69 | 465.93 | 722.34 | 374.76 |
| 512 | 198.83 | 173.53 | 280.75 | 577.5 | 975.18 | 495.82 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 365.61 |
| False | 32 | 3638 |
| True | 32 | 101.5 |
320
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 13.377 | 12.014 | 14.8 | 15.721 | 17.616 | 0.99944 |
| 8 | 20.362 | 19.784 | 23.525 | 26.147 | 33.041 | 7.9919 |
| 16 | 28.97 | 28.08 | 34.588 | 37.939 | 52.757 | 15.97 |
| 32 | 42.96 | 38.11 | 55.592 | 57.928 | 94.558 | 31.904 |
| 48 | 58.84 | 67.281 | 75.958 | 77.311 | 136.62 | 47.794 |
| 64 | 79.065 | 88.762 | 99.511 | 109.32 | 181.85 | 63.634 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 17.358 | 16.19 | 19.135 | 23.812 | 52.552 | 0.99925 |
| 64 | 86.236 | 102.67 | 110.26 | 112.32 | 119.81 | 63.754 |
| 128 | 204.03 | 205.92 | 220.27 | 223.14 | 250.96 | 126.93 |
| 256 | 315.08 | 321.68 | 395.18 | 408.04 | 502.56 | 251.93 |
| 384 | 423.9 | 421.51 | 577.25 | 664 | 857.63 | 373.82 |
| 512 | 573 | 563.93 | 874.58 | 1039.3 | 1263.3 | 492.08 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 335.32 |
| False | 32 | 2876.4 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 21.6 | 21.1 | 22.8 | 23.4 | 32.5 | 1.0 |
| 8 | 31.5 | 31.2 | 33.8 | 35.2 | 51.5 | 8.0 |
| 16 | 45.5 | 45.4 | 47.8 | 53.5 | 79.2 | 16.0 |
| 32 | 67.4 | 61.8 | 89.0 | 90.6 | 119.4 | 31.8 |
| 48 | 98.9 | 116.6 | 127.1 | 134.2 | 182.2 | 47.6 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 21.1 | 20.3 | 23.0 | 24.7 | 49.2 | 1.0 |
| 64 | 161.7 | 197.5 | 204.8 | 208.3 | 212.7 | 63.6 |
| 128 | 369.0 | 396.0 | 432.2 | 450.6 | 455.8 | 126.2 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 264.5 |
| 32 | 882.6 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 51.9 | 51.2 | 53.3 | 54.1 | 91.2 | 1.0 |
| True | 1 | 57.0 | 51.5 | 74.3 | 75.6 | 134.2 | 1.0 |
| False | 8 | 78.6 | 77.6 | 82.8 | 84.0 | 144.3 | 8.0 |
| True | 8 | 92.7 | 80.3 | 127.8 | 131.8 | 246.6 | 7.9 |
| False | 16 | 85.0 | 83.8 | 86.4 | 87.3 | 165.2 | 15.9 |
| True | 16 | 107.9 | 85.5 | 161.3 | 164.5 | 350.9 | 15.8 |
| False | 32 | 147.0 | 149.3 | 176.1 | 184.5 | 295.1 | 31.7 |
| True | 32 | 273.1 | 241.1 | 415.1 | 505.6 | 817.4 | 31.2 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 59.8 | 59.2 | 61.8 | 63.2 | 92.3 | 1.0 |
| True | 1 | 85.1 | 72.2 | 76.2 | 83.5 | 514.8 | 1.0 |
| False | 64 | 255.4 | 304.4 | 310.3 | 313.2 | 315.0 | 63.4 |
| True | 64 | 372.3 | 422.1 | 463.4 | 469.2 | 471.6 | 63.1 |
| False | 128 | 478.7 | 513.0 | 528.6 | 666.1 | 686.7 | 125.3 |
| True | 128 | 687.7 | 695.9 | 776.2 | 1113.9 | 1620.6 | 123.7 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 20.7 |
| False | 32 | 336.4 |
| True | 32 | 260.5 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 19.671 | 20.104 | 21.685 | 21.96 | 38.539 | 0.99884 |
| 8 | 31.194 | 31.719 | 35.482 | 36.195 | 66.154 | 7.9835 |
| 16 | 45.007 | 46.339 | 50.019 | 51.456 | 92.321 | 15.953 |
| 32 | 61.018 | 56.473 | 77.184 | 79.764 | 136.4 | 31.801 |
| 48 | 79.726 | 87.697 | 98.868 | 100.64 | 172.36 | 47.647 |
| 64 | 102.58 | 117.05 | 125.58 | 130.69 | 271.66 | 63.453 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 24.776 | 24.027 | 30.257 | 32.001 | 63.361 | 0.99901 |
| 64 | 114.75 | 133.68 | 149.83 | 153.73 | 157.22 | 63.679 |
| 128 | 235.08 | 244.42 | 285.45 | 290.93 | 374.07 | 126.5 |
| 256 | 367.18 | 365.85 | 468.17 | 506.79 | 691.56 | 250.49 |
| 384 | 485.58 | 465.62 | 668.72 | 772.08 | 999.89 | 371.5 |
| 512 | 637.89 | 635.49 | 970.62 | 1132.4 | 1399.5 | 489.51 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 180.37 |
| 32 | 1037.8 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 14.202 | 13.052 | 17.88 | 19.466 | 25.956 | 0.99947 |
| 8 | 24.437 | 22.858 | 29.938 | 38.885 | 49.618 | 7.9913 |
| 16 | 36.536 | 34.467 | 44.059 | 54.708 | 81.838 | 15.973 |
| 32 | 48.218 | 52.974 | 64.388 | 81.382 | 99.757 | 31.904 |
| 48 | 69.337 | 78.074 | 89.509 | 110.77 | 143.67 | 47.784 |
| 64 | 92.235 | 101.77 | 114.86 | 181.18 | 204.97 | 63.62 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 19.475 | 17.506 | 22.395 | 23.533 | 78.759 | 0.99942 |
| 64 | 120.23 | 118.25 | 207.22 | 215.26 | 240.07 | 63.754 |
| 128 | 246.07 | 227.94 | 373.71 | 448.51 | 456.11 | 126.96 |
| 256 | 426 | 422.2 | 724.17 | 801.8 | 895.13 | 251.23 |
| 384 | 572.76 | 545.31 | 976.51 | 1096.7 | 1214.7 | 372.22 |
| 512 | 781.89 | 704.9 | 1342 | 1422.6 | 1613.6 | 491.52 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 349.89 |
| 32 | 2885.7 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 15.3 | 14.0 | 15.3 | 16.8 | 34.7 | 1.0 |
| 8 | 25.1 | 21.8 | 34.5 | 35.2 | 44.5 | 8.0 |
| 16 | 20.2 | 19.0 | 22.5 | 39.6 | 46.3 | 16.0 |
| 32 | 30.6 | 24.2 | 39.2 | 43.4 | 75.1 | 31.9 |
| 48 | 38.1 | 40.8 | 45.1 | 54.8 | 94.4 | 47.8 |
| 64 | 57.1 | 55.5 | 59.0 | 60.5 | 166.6 | 63.6 |
960
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 18.0 | 16.2 | 21.2 | 24.7 | 69.8 | 1.0 |
| 64 | 62.4 | 63.0 | 71.4 | 156.6 | 158.2 | 63.7 |
| 128 | 109.1 | 105.9 | 117.6 | 229.6 | 306.7 | 126.8 |
| 256 | 171.7 | 147.5 | 202.2 | 405.7 | 578.8 | 251.3 |
| 384 | 227.2 | 198.5 | 287.6 | 570.6 | 826.4 | 373.8 |
| 512 | 319.9 | 269.6 | 632.4 | 829.8 | 1471.6 | 492.6 |
n-gram
| # of streams | Throughput (RTFX) |
| 1 | 293.5 |
| 32 | 2602.0 |
160
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 21.8 | 21.2 | 22.1 | 22.4 | 39.6 | 1.0 |
| True | 1 | 25.6 | 22.5 | 35.0 | 35.5 | 71.3 | 1.0 |
| False | 8 | 27.5 | 26.9 | 28.2 | 29.7 | 61.9 | 8.0 |
| True | 8 | 34.7 | 28.2 | 48.4 | 50.4 | 120.4 | 8.0 |
| False | 16 | 36.9 | 35.1 | 36.6 | 57.6 | 97.4 | 15.9 |
| True | 16 | 55.2 | 56.0 | 82.7 | 84.6 | 193.7 | 15.9 |
| False | 32 | 51.6 | 39.9 | 65.7 | 68.2 | 131.3 | 31.8 |
| True | 32 | 71.6 | 64.6 | 146.0 | 150.3 | 303.5 | 31.7 |
| False | 48 | 68.0 | 76.7 | 85.8 | 92.2 | 168.5 | 47.7 |
| True | 48 | 101.9 | 83.0 | 178.4 | 189.3 | 479.8 | 47.3 |
960
n-gram
| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| False | 1 | 55.8 | 55.5 | 56.3 | 61.4 | 61.8 | 1.0 |
| True | 1 | 70.0 | 69.2 | 70.6 | 79.2 | 80.0 | 1.0 |
| False | 64 | 175.2 | 194.2 | 197.8 | 273.0 | 312.9 | 63.5 |
| True | 64 | 263.9 | 292.5 | 313.6 | 458.7 | 465.9 | 63.1 |
| False | 128 | 252.2 | 262.2 | 273.2 | 397.9 | 471.7 | 126.1 |
| False | 256 | 438.2 | 419.6 | 492.1 | 819.3 | 1027.5 | 248.0 |
| False | 384 | 759.5 | 626.9 | 1600.5 | 1968.1 | 2799.9 | 364.6 |
| False | 512 | 2054.4 | 1823.7 | 3943.4 | 4720.0 | 5667.5 | 456.1 |
| False | 512 | 2015.4 | 1795.2 | 3924.8 | 4561.5 | 5507.2 | 457.1 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 21.9 |
| True | 1 | 19.9 |
| False | 32 | 420.4 |
| True | 32 | 308.5 |

| Speaker Diarization | # of streams | Throughput (RTFX) | Average Latency (ms) |
| False | 1 | 158.3 | 352.24 |
| False | 32 | 1631.3 | 1018.84 |
160
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 11 | 10.3 | 11.2 | 12.4 | 30 | 1 |
| 8 | 20 | 19 | 26 | 30 | 42 | 7.99 |
| 16 | 28 | 26 | 35 | 40 | 56 | 15.97 |
| 32 | 35 | 35 | 48 | 52 | 73 | 31.9 |
| 64 | 50 | 55 | 66 | 70 | 100 | 63.8 |
800
n-gram
| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
| 1 | 14 | 11.5 | 20 | 30 | 60 | 1 |
| 64 | 70 | 70 | 90 | 100 | 170 | 63.8 |
| 128 | 88 | 84 | 110 | 190 | 250 | 127.4 |
| 256 | 128 | 117 | 164 | 300 | 460 | 254.4 |
n-gram
| # of streams | Throughput (RTFX) |
| 32 | 440 |
n-gram
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 70 |
| False | 32 | 193.5 |
none
| Speaker Diarization | # of streams | Throughput (RTFX) |
| False | 1 | 6.2 |
| False | 32 | 43.3 |
320
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 86.53 |
| 8 | 7.9 | 136.34 |
| 16 | 15.8 | 163.55 |
| 32 | 31.4 | 253.70 |
| 48 | 44.8 | 991.17 |
| 64 | 58.9 | 1180.73 |
1600
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 1.0 | 87.29 |
| 64 | 63.0 | 433.03 |
| 128 | 125.0 | 586.62 |
| 256 | 246.3 | 836.96 |
| 384 | 337.1 | 2274.47 |
| 512 | 342.7 | 7912.27 |
n-gram
| # of streams | Throughput (RTFX) | Average Latency (ms) |
| 1 | 85.2 | 642.87 |
| 32 | 1056.6 | 1606.57 |
On-Prem Hardware Specifications
| GPU | | NVIDIA DGX A100 40GB |
| CPU | Model | AMD EPYC 7742 64-Core Processor |
| | Thread(s) per core | 2 |
| | Socket(s) | 2 |
| | Core(s) per socket | 64 |
| | NUMA node(s) | 8 |
| | Frequency boost | enabled |
| | CPU max MHz | 2250 |
| | CPU min MHz | 1500 |
| RAM | Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
| | Configured Memory Speed | 2933 MT/s |
| | RAM Size | 32x64GB (2048GB Total) |

| GPU | | NVIDIA H100 80GB HBM3 |
| CPU | Model | Intel(R) Xeon(R) Platinum 8480CL |
| | Thread(s) per core | 2 |
| | Socket(s) | 2 |
| | Core(s) per socket | 56 |
| | NUMA node(s) | 2 |
| | CPU max MHz | 3800 |
| | CPU min MHz | 800 |
| RAM | Model | Micron DDR5 MTC40F2046S1RC48BA1 4800MHz |
| | Configured Memory Speed | 4400 MT/s |
| | RAM Size | 32x64GB (2048GB Total) |

| GPU | | NVIDIA L40 |
| CPU | Model | AMD EPYC 7763 64-Core Processor |
| | Thread(s) per core | 1 |
| | Socket(s) | 2 |
| | Core(s) per socket | 64 |
| | NUMA node(s) | 8 |
| | Frequency boost | enabled |
| | CPU max MHz | 3529 |
| | CPU min MHz | 1500 |
| RAM | Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
| | Configured Memory Speed | 3200 MT/s |
| | RAM Size | 16x32GB (512GB Total) |
Model Accuracy
Riva ASR models are evaluated using Word Error Rate (WER) for word-based languages such as English, Spanish, and French, and Character Error Rate (CER) for character-based languages such as Mandarin Chinese and Japanese. For diarization, Concatenated minimum-Permutation Word Error Rate (cpWER) is used.
WER measures the minimum number of word substitutions, insertions, and deletions required to transform the model's output into the reference transcript, divided by the total number of words in the reference. Similarly, CER is the minimum number of character edits needed, divided by the total number of characters in the reference. cpWER is calculated as follows:
1. Concatenate all utterances of each speaker for both the reference and hypothesis files.
2. Compute the WER between the reference and every possible speaker permutation of the hypothesis.
3. Pick the lowest WER among them (this is assumed to be the best permutation).
Lower WER, CER, and cpWER values indicate better accuracy, with 0% representing perfect transcription.
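The three metrics described above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used to produce the table below: the function names and sample transcripts are hypothetical, text normalization is omitted, and the cpWER variant shown sums the per-speaker word errors and divides by the total number of reference words, which is one common convention.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Minimum substitutions, insertions, and deletions turning hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate; ref must contain at least one word."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate; ref must be non-empty."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """cpWER over dicts that map a speaker label to a list of utterances."""
    # Step 1: concatenate all utterances of each speaker.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]
    # Pad with empty speakers so both sides have the same number of speakers.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    # Steps 2 and 3: score every speaker permutation and keep the lowest error.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


word_rate = wer("the cat sat on the mat", "the cat sit on mat")
char_rate = cer("abcd", "abxd")
reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk0": ["fine thanks"], "spk1": ["hello there how you"]}
diar_rate = cp_wer(reference, hypothesis)
print(word_rate, char_rate, diar_rate)
```

Trying every permutation grows factorially with the number of speakers, so this brute-force form is only practical for a handful of speakers.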
| Model Name | Language | Dataset | Best latency WER (%) ⬇️ | Best throughput WER (%) ⬇️ | Offline WER (%) ⬇️ |
| Parakeet 1.1b CTC | en-US | | 10.45 | 8.80 | 7.96 |
| | en-US | | 6.34 | 4.74 | 4.09 |
| | en-US | | 46.09 | 41.35 | 39.61 |
| | en-US (Silero VAD) | | 5.57 | 4.8 | 4.5 |
| | en-US (Telephony) | | 7.33 | 5.11 | 4.17 |
| | en-US (Telephony) | | 30.13 | 27.82 | 28.91 |
| | en-US (Telephony) + Sortformer Diarizer | | 28.43 (cpWER) | - | - |
| Parakeet 0.6b TDT | en-US | | - | - | 11.46 |
| | en-US | | - | - | 11.65 |
| | en-US | | - | - | 9.15 |
| | en-US | | - | - | 2.01 |
| | en-US | | - | - | 3.51 |
| | en-US | | - | - | 2.16 |
| | en-US | | - | - | 3.38 |
| | en-US | | - | - | 6.6 |
| Parakeet 1.1b RNNT | en-US | | 10.74 | 10.54 | 9.77 |
| | es-US | | 7.19 | 5.26 | 3.83 |
| | es-ES | | 16.15 | 14.42 | 11.51 |
| | fr-FR | | 11.41 | 9.10 | 6.36 |
| | de-DE | | 11.29 | 9.16 | 7.09 |
| | ru-RU | | 21.44 | 19.23 | 17.39 |
| Parakeet 0.6b CTC | en-US | | 10.57 | 8.87 | 8.45 |
| Parakeet 0.6b CTC | vi-VN | | 10 | 8.58 | 7.97 |
| Parakeet 0.6b CTC | zh-CN | | 5.81 | 5.84 | 6.09 |
| Parakeet 0.6b CTC | es-US | | 9.14 | 6.15 | 5.34 |
| Canary 1b | en-US | | Not supported | Not supported | 6.78 |
| | es-US | | Not supported | Not supported | 3.54 |
| | de-DE | | Not supported | Not supported | 5.18 |
| | fr-FR | | Not supported | Not supported | 4.21 |
| | ru-RU | | Not supported | Not supported | 10.33 |
| | es-ES | | Not supported | Not supported | 14.40 |
| | pt-BR | | Not supported | Not supported | 5.83 |
| Conformer 120m CTC | es-US | | 6.75 | 6.26 | 5.66 |