ASR NIM Performance#
This page provides latency and throughput benchmarks for the NVIDIA ASR NIM microservice in streaming and offline configurations across supported GPUs.
Evaluation Process#
Performance is evaluated in two inference modes: streaming and offline.
Streaming Mode#
Streaming benchmarks use the riva_streaming_asr_client with the --simulate_realtime flag to simulate real-time microphone transcription. The client and server process audio chunks of the same duration. Each stream performs three iterations over a sample audio file (1272-135031-0000.wav) from the LibriSpeech dev-clean dataset. Refer to the Results section for chunk size values.
The source code for riva_streaming_asr_client is available at Riva C++ Clients.
The following command measures streaming performance:
riva_streaming_asr_client \
--chunk_duration_ms=<chunk_duration> \
--simulate_realtime=true \
--automatic_punctuation=true \
--num_parallel_requests=<num_streams> \
--word_time_offsets=false \
--print_transcripts=false \
--interim_results=false \
--num_iterations=<3*num_streams> \
--audio_file=1272-135031-0000.wav \
--output_filename=/tmp/output.json
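The placeholders in the command above can be filled in programmatically when sweeping over stream counts. The following is a minimal sketch, not part of the Riva clients: the helper name `build_streaming_command` and its parameters are hypothetical, and it only assembles the flags already shown above, setting `--num_iterations` to three times the number of streams.

```python
import shlex


def build_streaming_command(num_streams, chunk_duration_ms,
                            audio_file="1272-135031-0000.wav",
                            iterations_per_stream=3):
    """Return the benchmark command as an argument list.

    Hypothetical helper: it mirrors the flags documented above and
    derives --num_iterations as iterations_per_stream * num_streams.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    return [
        "riva_streaming_asr_client",
        f"--chunk_duration_ms={chunk_duration_ms}",
        "--simulate_realtime=true",
        "--automatic_punctuation=true",
        f"--num_parallel_requests={num_streams}",
        "--word_time_offsets=false",
        "--print_transcripts=false",
        "--interim_results=false",
        f"--num_iterations={iterations_per_stream * num_streams}",
        f"--audio_file={audio_file}",
        "--output_filename=/tmp/output.json",
    ]


# Print a shell-ready command line for 64 streams with 960 ms chunks.
print(shlex.join(build_streaming_command(64, 960)))
```

The resulting list can be passed to `subprocess.run` on a machine where the client is installed.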
The riva_streaming_asr_client returns three latency measurements:
intermediate latency: Latency of responses with is_final == false.
final latency: Latency of responses with is_final == true.
latency: Overall latency of all responses. This value is reported in the results tables.
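To make the three measurements concrete, the sketch below groups per-response latencies by their `is_final` flag and computes the statistics that appear in the results tables (avg, p50, p90, p95, p99). It is an illustration only: the `(latency_ms, is_final)` input shape is hypothetical, and the nearest-rank percentile used here is an assumption, since the client's exact percentile method is not described on this page.

```python
import math


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Return avg and p50/p90/p95/p99 for a list of latencies in ms."""
    values = sorted(latencies_ms)
    stats = {"avg": sum(values) / len(values)}
    for q in (50, 90, 95, 99):
        stats[f"p{q}"] = percentile(values, q)
    return stats


def summarize_responses(responses):
    """Group (latency_ms, is_final) pairs into the three measurements.

    The pair format is a hypothetical stand-in for whatever the client
    records internally.
    """
    responses = list(responses)
    groups = {
        "intermediate latency": [l for l, final in responses if not final],
        "final latency": [l for l, final in responses if final],
        "latency": [l for l, _ in responses],
    }
    # Skip empty groups, e.g. when no intermediate responses were returned.
    return {name: summarize(vals) for name, vals in groups.items() if vals}
```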
[Diagram: how the intermediate, final, and overall latencies are measured]
Offline Mode#
The following command measures maximum throughput in offline mode:
riva_asr_client \
--automatic_punctuation=true \
--num_parallel_requests=32 \
--word_time_offsets=false \
--print_transcripts=false \
--num_iterations=96 \
--audio_file=1272-135031-0000x5.wav \
--output_filename=/tmp/output.json
The file 1272-135031-0000x5.wav is the 1272-135031-0000.wav audio file concatenated five times. The source code for riva_asr_client is available at Riva C++ Clients.
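A concatenated test file of this kind could be produced with Python's standard `wave` module. This is a sketch under the assumption that the source is an uncompressed PCM WAV file; the function name `repeat_wav` is hypothetical, and this page does not say which tool was actually used to build the benchmark file.

```python
import wave


def repeat_wav(src_path, dst_path, times=5):
    """Write dst_path containing the audio of src_path repeated `times` times."""
    if times < 1:
        raise ValueError("times must be at least 1")
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and sample rate of the source.
        dst.setparams(params)
        for _ in range(times):
            dst.writeframes(frames)
    # The frame count in the header is corrected when the file is closed.
```

For example, `repeat_wav("1272-135031-0000.wav", "1272-135031-0000x5.wav")` would write a file five times as long as the original. The offline command above runs 96 iterations across 32 parallel requests, which works out to three iterations per stream, the same ratio as in streaming mode.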
Results#
The following tables report latency and throughput for streaming and offline configurations. Throughput is measured in RTFX (duration of audio transcribed divided by computation time).
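As a worked illustration of the RTFX definition, the sketch below divides the total duration of audio transcribed by the time spent computing. The function names and the numbers in the example are hypothetical and are not taken from the benchmark runs.

```python
def total_audio_seconds(clip_seconds, num_iterations):
    """Total audio processed when one clip is transcribed num_iterations times."""
    return clip_seconds * num_iterations


def rtfx(audio_seconds, compute_seconds):
    """RTFX: duration of audio transcribed divided by computation time."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Hypothetical example: a 10 s clip transcribed 96 times in 8 s of compute.
print(rtfx(total_audio_seconds(10.0, 96), 8.0))  # 120.0
```

In the streaming tables the throughput stays close to the number of streams, which is what one would expect when each stream is fed audio at real-time speed by `--simulate_realtime`.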
Note
All values are averages over three trials, rounded to the last significant digit based on standard deviation. If a standard deviation is less than 0.001 of the average, the value is rounded as if the standard deviation equals 0.001 of the average.
For the hardware used in these measurements, refer to the Hardware Specifications section.
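The rounding rule in the note can be read as follows; this is one possible interpretation, not the script used to produce the tables. The mean is rounded at the decimal position of the leading digit of the standard deviation, and the standard deviation is never treated as smaller than 0.001 of the mean. The function name and the `extra_digits` parameter are hypothetical.

```python
import math


def round_to_std(mean, std, min_rel_std=0.001, extra_digits=0):
    """Round mean at the position of the leading digit of std.

    std is floored at min_rel_std * |mean|, as described in the note.
    extra_digits keeps additional digits beyond that position
    (an assumption, exposed so different conventions can be compared).
    """
    effective_std = max(abs(std), min_rel_std * abs(mean))
    if effective_std == 0:
        return mean
    exponent = math.floor(math.log10(effective_std))
    return round(mean, -exponent + extra_digits)


print(round_to_std(12.4391, 0.5))     # 12.4
print(round_to_std(12.4391, 0.0001))  # 12.44 (floor of 0.001 * mean applies)
```

Under this reading, values with five significant digits, such as 0.99949, would correspond to the floor case with `extra_digits=1`.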
Chunk size (ms): 160; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 12.439 | 10.388 | 11.524 | 12.54 | 30.242 | 0.99949 |
| 8 | 13.006 | 12.508 | 14.673 | 17.203 | 29.539 | 7.9929 |
| 16 | 18.138 | 17.06 | 24.885 | 27.922 | 49.825 | 15.975 |
| 32 | 23.093 | 20.141 | 29.905 | 30.991 | 76.264 | 31.915 |
| 48 | 28.666 | 29.63 | 33.027 | 34.111 | 101.76 | 47.834 |
| 64 | 32.012 | 32.449 | 35.855 | 37.779 | 136.06 | 63.719 |
Chunk size (ms): 960; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 14.088 | 12.616 | 17.525 | 20.743 | 51.894 | 0.9994 |
| 64 | 42.41 | 37.123 | 43.153 | 157.71 | 163.47 | 63.68 |
| 128 | 61.41 | 49.455 | 61.86 | 197.73 | 307.18 | 126.82 |
| 256 | 93.439 | 67.938 | 98.617 | 315.2 | 558.71 | 251.39 |
| 384 | 123.79 | 93.576 | 124.35 | 472.36 | 848.59 | 373.93 |
| 512 | 166.85 | 117.95 | 318.19 | 615.9 | 1141.7 | 494.12 |
Language model: n-gram

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 353.62 |
| False | 32 | 3707.4 |
| True | 32 | 170 |
Chunk size (ms): 320; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 16.946 | 14.233 | 17.459 | 18.954 | 21.387 | 0.99936 |
| 8 | 19.451 | 18.632 | 22.736 | 26.336 | 32.004 | 7.9924 |
| 16 | 24.811 | 23.88 | 28.443 | 32.162 | 42.753 | 15.978 |
| 32 | 33.166 | 30.537 | 43.692 | 47.19 | 68.485 | 31.929 |
| 48 | 44.522 | 49.317 | 57.667 | 60.352 | 93.513 | 47.855 |
| 64 | 55.794 | 60.906 | 71.632 | 74.103 | 117.78 | 63.755 |
Chunk size (ms): 960; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 19.719 | 19.153 | 22.404 | 25.33 | 53.192 | 0.99929 |
| 64 | 67.47 | 73.823 | 84.609 | 89.695 | 91.372 | 63.803 |
| 128 | 116.36 | 122.78 | 146.72 | 152.17 | 173.31 | 127.31 |
| 256 | 174.21 | 179.51 | 223.23 | 242.4 | 270.84 | 253.54 |
| 384 | 225.73 | 208.54 | 317.68 | 323.05 | 345.72 | 379.42 |
| 512 | 281.13 | 299.92 | 406.08 | 416.05 | 517.49 | 503.1 |
Language model: n-gram

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 335.83 |
| False | 32 | 3890.1 |
Chunk size (ms): 160; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 27.7 | 26.0 | 27.7 | 28.5 | 42.8 | 1.0 |
| 8 | 33.1 | 32.9 | 35.1 | 35.9 | 54.1 | 8.0 |
| 16 | 43.4 | 42.7 | 45.7 | 57.2 | 73.9 | 16.0 |
| 32 | 59.7 | 48.4 | 78.1 | 80.1 | 105.4 | 31.9 |
| 48 | 99.1 | 106.5 | 110.5 | 112.2 | 187.5 | 47.7 |
Chunk size (ms): 960; language model: n-gram

| # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 23.7 | 22.9 | 26.3 | 27.6 | 50.4 | 1.0 |
| 64 | 135.7 | 160.3 | 167.9 | 171.1 | 174.8 | 63.7 |
| 128 | 273.6 | 300.2 | 314.6 | 319.5 | 344.2 | 126.8 |
Language model: n-gram

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 224.8 |
| False | 32 | 1223.7 |
Chunk size (ms): 160; language model: n-gram

| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|
| False | 1 | 34.3 | 33.8 | 35.5 | 36.1 | 61.2 | 1.0 |
| True | 1 | 38.1 | 34.8 | 48.5 | 50.0 | 93.0 | 1.0 |
| False | 8 | 41.2 | 40.7 | 43.1 | 43.9 | 76.4 | 8.0 |
| True | 8 | 53.6 | 41.4 | 89.8 | 96.9 | 165.4 | 8.0 |
| False | 16 | 52.4 | 51.1 | 53.9 | 70.7 | 104.1 | 15.9 |
| True | 16 | 70.9 | 51.6 | 114.8 | 129.8 | 257.8 | 15.9 |
| False | 32 | 78.4 | 64.4 | 102.5 | 105.8 | 145.8 | 31.8 |
| True | 32 | 115.5 | 102.1 | 201.6 | 217.2 | 394.5 | 31.6 |
| False | 48 | 105.5 | 124.7 | 132.7 | 136.8 | 174.0 | 47.6 |
| True | 48 | 169.7 | 141.4 | 258.0 | 287.1 | 518.0 | 47.3 |
Chunk size (ms): 960; language model: n-gram

| Speaker Diarization | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|
| False | 1 | 62.0 | 61.3 | 64.7 | 65.6 | 92.2 | 1.0 |
| True | 1 | 93.9 | 76.6 | 100.4 | 101.0 | 538.6 | 1.0 |
| False | 64 | 230.0 | 269.0 | 275.3 | 278.4 | 280.1 | 63.4 |
| True | 64 | 388.3 | 425.9 | 495.3 | 510.8 | 525.9 | 63.2 |
| False | 128 | 366.7 | 398.8 | 416.5 | 429.6 | 446.6 | 126.2 |
| True | 128 | 600.0 | 621.0 | 644.7 | 900.0 | 957.3 | 124.4 |
Language model: n-gram

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 21.0 |
| False | 32 | 357.2 |
| True | 32 | 270.1 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
23.02 |
23.176 |
28.694 |
29.829 |
44.012 |
0.99863 |
8 |
34.593 |
33.966 |
40.212 |
45.607 |
94.882 |
7.9646 |
16 |
42.333 |
41.683 |
50.64 |
58.927 |
93.028 |
15.953 |
32 |
55.452 |
51.111 |
76.248 |
82.525 |
129.15 |
31.828 |
48 |
72.80 |
175.236 |
94.222 |
106.1 |
223.45 |
47.592 |
64 |
97.943 |
100.13 |
116.06 |
126.09 |
240.68 |
63.512 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
29.309 |
29.149 |
36.643 |
41.075 |
47.067 |
0.99879 |
64 |
114.55 |
116.94 |
159.24 |
177.71 |
189.42 |
63.655 |
128 |
170.28 |
173.34 |
217 |
220.87 |
306.83 |
126.76 |
256 |
265.46 |
262.12 |
374.34 |
445.03 |
610.07 |
251.12 |
384 |
322.1 |
300.5 |
478.52 |
627.84 |
962.12 |
374.99 |
512 |
437.49 |
385.16 |
733.84 |
1084.8 |
1529.9 |
493.42 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
166.71 |
False |
32 |
505.29 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
17.854 |
15.285 |
20.322 |
21.213 |
28.771 |
0.99935 |
8 |
24.344 |
22.916 |
30.133 |
38.193 |
53.096 |
7.9913 |
16 |
33.628 |
31.514 |
39.454 |
61.332 |
77.594 |
15.975 |
32 |
51.488 |
51.29 |
60.141 |
99.328 |
125.21 |
31.915 |
48 |
66.051 |
66.906 |
78.255 |
106.77 |
150.16 |
47.831 |
64 |
70.315 |
75.312 |
85.973 |
123.24 |
183.86 |
63.743 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
21.612 |
20.434 |
25.3 |
26.234 |
62.83 |
26.234 |
64 |
95.964 |
87.699 |
181.02 |
183.42 |
188.01 |
189.01 |
128 |
174.07 |
152.57 |
280.41 |
346.19 |
356.8 |
189.01 |
256 |
281.56 |
249.97 |
523.14 |
594.08 |
682.59 |
700.86 |
384 |
392.68 |
336.15 |
758.24 |
870.87 |
1002.9 |
1033.1 |
512 |
540.34 |
437.2 |
1118.3 |
1210.2 |
1351.9 |
1424.3 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
337.94 |
False |
32 |
3229.5 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
15.8 |
13.4 |
14.7 |
16.6 |
30.2 |
1.0 |
8 |
16.7 |
15.8 |
17.9 |
25.9 |
35.6 |
8.0 |
16 |
21.5 |
18.8 |
26.3 |
43.8 |
51.6 |
16.0 |
32 |
32.7 |
27.4 |
43.8 |
46.7 |
92.5 |
31.9 |
48 |
41.1 |
42.5 |
46.4 |
51.3 |
126.7 |
47.8 |
64 |
44.9 |
45.6 |
50.0 |
57.0 |
158.4 |
63.7 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
17.8 |
16.0 |
22.0 |
24.0 |
74.1 |
1.0 |
64 |
57.7 |
54.7 |
68.1 |
168.0 |
170.5 |
63.7 |
128 |
83.3 |
74.1 |
86.5 |
222.3 |
308.5 |
126.8 |
256 |
130.7 |
113.9 |
137.1 |
380.9 |
582.8 |
251.4 |
384 |
174.0 |
131.5 |
196.4 |
554.3 |
881.9 |
373.1 |
512 |
229.5 |
179.3 |
434.6 |
609.5 |
1222.9 |
494.1 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
309.8 |
32 |
3008.8 |
160n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
25.1 |
24.2 |
25.2 |
25.5 |
46.1 |
1.0 |
True |
1 |
28.5 |
25.0 |
39.0 |
39.6 |
80.2 |
1.0 |
False |
8 |
30.3 |
29.4 |
30.9 |
33.8 |
70.4 |
8.0 |
True |
8 |
41.6 |
29.9 |
79.5 |
84.7 |
146.5 |
8.0 |
False |
16 |
36.1 |
34.0 |
36.4 |
59.4 |
96.7 |
16.0 |
True |
16 |
52.8 |
35.2 |
98.2 |
107.8 |
235.9 |
15.9 |
False |
32 |
56.2 |
61.4 |
63.7 |
65.0 |
155.9 |
31.8 |
True |
32 |
70.2 |
62.2 |
143.4 |
159.7 |
324.9 |
31.7 |
False |
48 |
60.9 |
69.2 |
74.3 |
77.4 |
142.8 |
47.7 |
True |
48 |
95.6 |
74.2 |
184.8 |
196.4 |
426.6 |
47.3 |
960n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
62.7 |
60.6 |
65.6 |
71.0 |
124.3 |
1.0 |
True |
1 |
92.4 |
74.9 |
95.4 |
103.6 |
516.6 |
1.0 |
False |
64 |
167.4 |
183.9 |
191.6 |
281.4 |
293.9 |
63.4 |
True |
64 |
268.4 |
293.6 |
351.6 |
467.3 |
515.0 |
63.0 |
False |
128 |
224.3 |
226.0 |
236.8 |
381.3 |
472.8 |
126.2 |
False |
256 |
349.9 |
342.1 |
378.2 |
618.6 |
855.6 |
249.3 |
False |
384 |
498.1 |
470.4 |
748.8 |
1036.3 |
1407.7 |
367.8 |
False |
512 |
694.0 |
579.4 |
1443.1 |
1465.2 |
2261.6 |
482.6 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
20.9 |
True |
1 |
17.4 |
False |
32 |
424.4 |
True |
32 |
306.9 |
# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
208.9 |
272.85 |
32 |
2210.3 |
745.51 |
64 |
2601 |
810.1 |
320n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
1.0 |
59.60 |
8 |
7.9 |
122.40 |
16 |
15.8 |
151.56 |
32 |
31.6 |
193.95 |
64 |
63.0 |
235.30 |
1600n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
1.0 |
64.21 |
64 |
63.4 |
277.25 |
128 |
126.2 |
343.67 |
256 |
250.4 |
503.87 |
384 |
371.6 |
640.38 |
512 |
490.6 |
805.25 |
n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
129.3 |
428.72 |
32 |
1403.7 |
1190.53 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
10 |
9.9 |
11.3 |
12 |
40 |
1 |
8 |
12.6 |
12 |
13.4 |
17 |
31 |
8 |
16 |
17 |
15 |
22 |
25 |
40 |
15.98 |
32 |
23 |
23 |
31 |
33 |
50 |
31.94 |
48 |
29 |
28 |
40 |
41 |
70 |
47.9 |
64 |
33.6 |
38 |
45 |
47 |
70 |
63.9 |
128 |
49 |
47 |
64 |
67 |
150 |
127.6 |
256 |
84 |
75 |
107 |
126 |
391 |
255 |
800n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
14 |
11 |
20 |
40 |
80 |
1 |
64 |
39 |
40 |
55 |
80 |
110 |
63.9 |
128 |
58 |
50 |
75 |
150 |
202 |
127.6 |
256 |
90 |
80 |
115 |
240 |
380 |
255 |
384 |
120 |
107 |
155 |
316 |
530 |
381.4 |
512 |
149 |
130 |
196 |
400 |
700 |
508 |
768 |
258 |
200 |
630 |
680 |
1280 |
756 |
1024 |
420 |
263 |
1280 |
1350 |
1900 |
992 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
32 |
467 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
90 |
False |
32 |
370 |
noneSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
110 |
False |
32 |
1255 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
17.79 |
15.812 |
22.171 |
22.66 |
24.527 |
0.99925 |
8 |
19.619 |
18.702 |
20.283 |
21.283 |
49.858 |
7.9866 |
16 |
24.347 |
22.816 |
24.601 |
30.805 |
83.174 |
15.958 |
32 |
32.883 |
30.65 |
40.314 |
40.992 |
129.39 |
31.856 |
48 |
43.084 |
44.219 |
50.994 |
56.952 |
210.66 |
47.689 |
64 |
53.643 |
53.416 |
61.031 |
97.948 |
264.43 |
63.476 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
20.624 |
19.301 |
26.012 |
28.626 |
50.408 |
0.99918 |
64 |
73.596 |
71.06 |
84.034 |
234.8 |
251.47 |
63.497 |
128 |
123.56 |
110.62 |
139.56 |
300.04 |
449.06 |
126.28 |
256 |
188.12 |
162.21 |
200.33 |
538 |
814.35 |
249.61 |
384 |
268.43 |
198.76 |
527.84 |
786.32 |
1372.3 |
369.42 |
512 |
405.24 |
287.28 |
1347.4 |
1439.1 |
2252.5 |
486.61 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
318.26 |
False |
32 |
2085 |
True |
32 |
125 |
320n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
25.742 |
25.178 |
29.273 |
30.679 |
40.96 |
0.99891 |
8 |
37.458 |
36.717 |
43.875 |
45.592 |
57.38 |
7.9865 |
16 |
46.788 |
45.738 |
51.555 |
60.965 |
75.74 |
15.963 |
32 |
64.08 |
57.471 |
84.993 |
89.653 |
128.29 |
31.873 |
48 |
85.545 |
96.194 |
111.54 |
117.86 |
176.5 |
47.714 |
64 |
93.02 |
104.95 |
116 |
124.89 |
195.03 |
63.61 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
21.451 |
20.791 |
23.836 |
24.61 |
53.358 |
0.99922 |
64 |
91.55 |
103.46 |
124.84 |
126.52 |
134.28 |
63.575 |
128 |
177.23 |
190.8 |
213.12 |
218.6 |
244.28 |
127.01 |
256 |
279.71 |
279.51 |
358.52 |
371.5 |
449.47 |
252.36 |
384 |
386.16 |
389.76 |
521.21 |
556.27 |
722.57 |
375.28 |
512 |
492.63 |
496.77 |
691.83 |
793.61 |
1101.9 |
494.73 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
338.78 |
False |
32 |
3041.9 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
41.4 |
41.0 |
43.1 |
43.9 |
65.2 |
1.0 |
8 |
69.5 |
68.9 |
73.8 |
76.8 |
116.3 |
8.0 |
16 |
84.7 |
80.4 |
108.1 |
113.5 |
149.9 |
15.9 |
32 |
138.2 |
147.3 |
172.9 |
180.1 |
232.4 |
31.7 |
48 |
2610.6 |
2456.5 |
4743.8 |
4941.9 |
6120.2 |
41.5 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
32.0 |
32.0 |
33.2 |
34.9 |
35.6 |
1.0 |
64 |
263.5 |
310.4 |
324.9 |
326.4 |
342.2 |
63.4 |
128 |
562.0 |
591.7 |
646.0 |
829.8 |
835.5 |
124.8 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
151.5 |
32 |
599.4 |
160n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
51.9 |
51.2 |
53.3 |
54.1 |
91.2 |
1.0 |
True |
1 |
57.0 |
51.5 |
74.3 |
75.6 |
134.2 |
1.0 |
False |
8 |
78.6 |
77.6 |
82.8 |
84.0 |
144.3 |
8.0 |
True |
8 |
92.7 |
80.3 |
127.8 |
131.8 |
246.6 |
7.9 |
False |
16 |
85.0 |
83.8 |
86.4 |
87.3 |
165.2 |
15.9 |
True |
16 |
107.9 |
85.5 |
161.3 |
164.5 |
350.9 |
15.8 |
False |
32 |
147.0 |
149.3 |
176.1 |
184.5 |
295.1 |
31.7 |
True |
32 |
273.1 |
241.1 |
415.1 |
505.6 |
817.4 |
31.2 |
960n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
74.0 |
73.7 |
75.3 |
77.5 |
94.4 |
1.0 |
True |
1 |
108.0 |
96.6 |
100.2 |
111.7 |
473.1 |
1.0 |
False |
64 |
366.4 |
427.7 |
438.1 |
447.2 |
456.1 |
63.1 |
True |
64 |
541.1 |
604.0 |
658.0 |
803.4 |
833.1 |
62.4 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
18.7 |
False |
32 |
252.4 |
True |
32 |
148.7 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
30.948 |
31.181 |
34.543 |
36.056 |
47.975 |
0.99827 |
8 |
47.991 |
48.392 |
53.894 |
55.871 |
87.543 |
7.978 |
16 |
61.284 |
61.356 |
68.72 |
76.258 |
118.18 |
15.923 |
32 |
75.633 |
74.065 |
95.78 |
102.73 |
155.86 |
31.809 |
48 |
91.854 |
99.673 |
111.41 |
113.95 |
255.06 |
47.621 |
64 |
114.38 |
126.17 |
135.55 |
139.42 |
321.86 |
63.361 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
26.371 |
26.905 |
29.871 |
34.808 |
35.852 |
0.9987 |
64 |
116.95 |
132.38 |
156.75 |
165.34 |
170.12 |
63.681 |
128 |
227.25 |
232.36 |
279.57 |
295.28 |
372.04 |
126.56 |
256 |
351 |
363.49 |
448.89 |
506.45 |
769.04 |
249.55 |
384 |
451.4 |
451.33 |
622.64 |
676.79 |
935.07 |
372.85 |
512 |
579.83 |
578.14 |
838.03 |
1041.4 |
1447.4 |
489.7 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
213.96 |
32 |
1021 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
25.898 |
25.159 |
29.92 |
31.529 |
38.217 |
0.99898 |
8 |
39.652 |
38.118 |
48.426 |
56.406 |
75.964 |
7.986 |
16 |
51.214 |
48.284 |
58.617 |
78.85 |
103.47 |
15.963 |
32 |
62.971 |
65.896 |
79.542 |
102.56 |
123.49 |
31.882 |
48 |
82.938 |
89.665 |
102.51 |
151.8 |
176.93 |
47.767 |
64 |
107.35 |
113.37 |
128.69 |
197.71 |
246.97 |
63.598 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
23.143 |
22.251 |
26.383 |
26.517 |
62.112 |
0.9993 |
64 |
123.91 |
120.69 |
209.79 |
221.48 |
222.74 |
63.746 |
128 |
250.18 |
224.35 |
404.34 |
470.45 |
495.53 |
127 |
256 |
424.4 |
391.15 |
726.09 |
848.57 |
940.01 |
250.66 |
384 |
644.41 |
550.55 |
1274 |
1344.7 |
1493.9 |
369.39 |
512 |
882.82 |
803.01 |
1746.6 |
1874.4 |
1955.3 |
486.89 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
299.83 |
32 |
2790.2 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
24.0 |
22.5 |
27.8 |
29.0 |
38.9 |
1.0 |
8 |
30.5 |
29.0 |
30.3 |
50.9 |
70.2 |
8.0 |
16 |
37.8 |
35.0 |
38.0 |
54.7 |
104.2 |
15.9 |
32 |
48.1 |
51.4 |
61.6 |
71.8 |
141.3 |
31.8 |
48 |
63.8 |
69.2 |
77.3 |
104.0 |
205.1 |
47.6 |
64 |
85.9 |
85.0 |
100.8 |
147.2 |
313.0 |
63.4 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
26.0 |
24.0 |
30.5 |
34.0 |
74.3 |
1.0 |
64 |
103.5 |
101.6 |
125.2 |
269.1 |
296.7 |
63.5 |
128 |
179.6 |
175.2 |
196.0 |
383.6 |
513.2 |
126.0 |
256 |
306.6 |
308.1 |
367.3 |
724.0 |
988.7 |
248.3 |
384 |
535.5 |
393.4 |
1469.1 |
1642.0 |
2496.4 |
365.0 |
512 |
1126.3 |
551.7 |
3230.1 |
3967.6 |
4614.8 |
476.8 |
512 |
1134.3 |
571.6 |
3422.9 |
3841.8 |
4632.6 |
476.7 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
211.3 |
32 |
1395.8 |
160n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
29.5 |
29.1 |
30.2 |
30.5 |
56.1 |
1.0 |
True |
1 |
36.6 |
30.6 |
53.6 |
54.4 |
109.9 |
1.0 |
False |
8 |
39.5 |
38.3 |
40.8 |
42.2 |
96.1 |
8.0 |
True |
8 |
52.3 |
39.3 |
92.5 |
94.6 |
180.2 |
8.0 |
False |
16 |
51.8 |
40.7 |
72.4 |
74.5 |
118.5 |
15.9 |
True |
16 |
67.3 |
47.9 |
114.3 |
116.3 |
301.0 |
15.9 |
False |
32 |
64.0 |
49.5 |
84.4 |
86.1 |
161.2 |
31.8 |
True |
32 |
105.6 |
90.6 |
208.2 |
212.1 |
487.5 |
31.5 |
960n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
66.1 |
65.6 |
66.9 |
72.3 |
73.0 |
1.0 |
True |
1 |
92.3 |
91.5 |
92.7 |
104.1 |
104.9 |
1.0 |
False |
64 |
207.2 |
227.4 |
242.7 |
387.8 |
401.4 |
63.2 |
True |
64 |
363.7 |
397.5 |
435.2 |
653.1 |
670.7 |
62.8 |
False |
128 |
294.3 |
299.9 |
312.1 |
525.3 |
658.5 |
125.5 |
False |
256 |
518.9 |
504.9 |
724.3 |
1018.2 |
1668.4 |
245.6 |
False |
384 |
867.2 |
683.3 |
2002.2 |
2262.0 |
3026.8 |
359.8 |
False |
512 |
2194.6 |
2014.4 |
4142.6 |
4819.2 |
5894.0 |
443.7 |
False |
512 |
2176.1 |
1993.4 |
4113.3 |
4797.2 |
5879.7 |
443.8 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
19.2 |
True |
1 |
17.7 |
False |
32 |
341.0 |
True |
32 |
178.0 |
# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
128.5 |
433.0 |
32 |
1326.0 |
1268.77 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
13 |
11.8 |
12.8 |
14 |
40 |
1 |
8 |
17.6 |
16.8 |
18.5 |
22 |
39 |
8 |
16 |
22.5 |
21.3 |
25 |
31 |
60.3 |
15.98 |
32 |
32.4 |
35 |
42 |
46 |
70 |
31.93 |
48 |
41 |
40 |
58 |
59 |
100 |
47.9 |
64 |
46 |
50 |
64 |
66 |
100 |
63.8 |
128 |
73 |
66 |
94 |
97 |
220 |
127.5 |
800n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
16 |
13 |
20 |
40 |
80 |
1 |
64 |
60 |
60 |
80 |
110 |
180 |
63.8 |
128 |
90 |
80 |
110 |
230 |
300 |
127.5 |
256 |
133.3 |
120 |
174 |
340 |
530 |
254 |
384 |
183 |
166 |
245 |
430 |
800 |
380 |
512 |
260 |
223 |
510 |
600 |
1200 |
505 |
768 |
535 |
354 |
1500 |
1640 |
2150 |
739 |
1024 |
940 |
600 |
2300 |
2570 |
2930 |
960 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
32 |
460 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
60 |
False |
32 |
234 |
noneSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
108.04 |
False |
32 |
827.71 |
320n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
1.0 |
99.63 |
8 |
7.9 |
138.54 |
16 |
15.7 |
203.51 |
32 |
31.4 |
303.27 |
48 |
39.8 |
2991.17 |
64 |
50.8 |
3737.57 |
1600n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
1.0 |
102.40 |
64 |
62.9 |
490.66 |
128 |
124.5 |
682.94 |
256 |
244.3 |
1008.00 |
384 |
313.3 |
3766.07 |
512 |
318.4 |
9788.07 |
n-gram# of streams |
Throughput (RTFX) |
Average Latency (ms) |
|---|---|---|
1 |
77.1 |
712.47 |
32 |
838.0 |
2027.50 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
11.518 |
10.501 |
11.753 |
12.329 |
29.332 |
0.99948 |
8 |
13.042 |
12.727 |
14.303 |
16.54 |
27.45 |
7.9934 |
16 |
17.579 |
16.357 |
25.071 |
26.493 |
42.529 |
15.974 |
32 |
21.415 |
18.903 |
27.705 |
28.62 |
65.338 |
31.924 |
48 |
32.285 |
32.166 |
34.611 |
35.804 |
102.55 |
47.839 |
64 |
33.933 |
36.076 |
39.682 |
41.26 |
120.46 |
63.75 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
14.345 |
12.899 |
18.496 |
21.621 |
49.489 |
0.99941 |
64 |
43.724 |
41.908 |
48.17 |
138.95 |
140.61 |
63.715 |
128 |
76.158 |
69.027 |
80.239 |
198.79 |
277.37 |
126.88 |
256 |
113.72 |
89.307 |
128.73 |
294.93 |
488.96 |
251.9 |
384 |
150.8 |
133.5 |
170.69 |
465.93 |
722.34 |
374.76 |
512 |
198.83 |
173.53 |
280.75 |
577.5 |
975.18 |
495.82 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
365.61 |
False |
32 |
3638 |
True |
32 |
101.5 |
320n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
13.377 |
12.014 |
14.8 |
15.721 |
17.616 |
0.99944 |
8 |
20.362 |
19.784 |
23.525 |
26.147 |
33.041 |
7.9919 |
16 |
28.97 |
28.08 |
34.588 |
37.939 |
52.757 |
15.97 |
32 |
42.96 |
38.11 |
55.592 |
57.928 |
94.558 |
31.904 |
48 |
58.84 |
67.281 |
75.958 |
77.311 |
136.62 |
47.794 |
64 |
79.065 |
88.762 |
99.511 |
109.32 |
181.85 |
63.634 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
17.358 |
16.19 |
19.135 |
23.812 |
52.552 |
0.99925 |
64 |
86.236 |
102.67 |
110.26 |
112.32 |
119.81 |
63.754 |
128 |
204.03 |
205.92 |
220.27 |
223.14 |
250.96 |
126.93 |
256 |
315.08 |
321.68 |
395.18 |
408.04 |
502.56 |
251.93 |
384 |
423.9 |
421.51 |
577.25 |
664 |
857.63 |
373.82 |
512 |
573 |
563.93 |
874.58 |
1039.3 |
1263.3 |
492.08 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
335.32 |
False |
32 |
2876.4 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
21.6 |
21.1 |
22.8 |
23.4 |
32.5 |
1.0 |
8 |
31.5 |
31.2 |
33.8 |
35.2 |
51.5 |
8.0 |
16 |
45.5 |
45.4 |
47.8 |
53.5 |
79.2 |
16.0 |
32 |
67.4 |
61.8 |
89.0 |
90.6 |
119.4 |
31.8 |
48 |
98.9 |
116.6 |
127.1 |
134.2 |
182.2 |
47.6 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
21.1 |
20.3 |
23.0 |
24.7 |
49.2 |
1.0 |
64 |
161.7 |
197.5 |
204.8 |
208.3 |
212.7 |
63.6 |
128 |
369.0 |
396.0 |
432.2 |
450.6 |
455.8 |
126.2 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
264.5 |
32 |
882.6 |
160n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
51.9 |
51.2 |
53.3 |
54.1 |
91.2 |
1.0 |
True |
1 |
57.0 |
51.5 |
74.3 |
75.6 |
134.2 |
1.0 |
False |
8 |
78.6 |
77.6 |
82.8 |
84.0 |
144.3 |
8.0 |
True |
8 |
92.7 |
80.3 |
127.8 |
131.8 |
246.6 |
7.9 |
False |
16 |
85.0 |
83.8 |
86.4 |
87.3 |
165.2 |
15.9 |
True |
16 |
107.9 |
85.5 |
161.3 |
164.5 |
350.9 |
15.8 |
False |
32 |
147.0 |
149.3 |
176.1 |
184.5 |
295.1 |
31.7 |
True |
32 |
273.1 |
241.1 |
415.1 |
505.6 |
817.4 |
31.2 |
960n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
59.8 |
59.2 |
61.8 |
63.2 |
92.3 |
1.0 |
True |
1 |
85.1 |
72.2 |
76.2 |
83.5 |
514.8 |
1.0 |
False |
64 |
255.4 |
304.4 |
310.3 |
313.2 |
315.0 |
63.4 |
True |
64 |
372.3 |
422.1 |
463.4 |
469.2 |
471.6 |
63.1 |
False |
128 |
478.7 |
513.0 |
528.6 |
666.1 |
686.7 |
125.3 |
True |
128 |
687.7 |
695.9 |
776.2 |
1113.9 |
1620.6 |
123.7 |
n-gramSpeaker Diarization |
# of streams |
Throughput (RTFX) |
|---|---|---|
False |
1 |
20.7 |
False |
32 |
336.4 |
True |
32 |
260.5 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
19.671 |
20.104 |
21.685 |
21.96 |
38.539 |
0.99884 |
8 |
31.194 |
31.719 |
35.482 |
36.195 |
66.154 |
7.9835 |
16 |
45.007 |
46.339 |
50.019 |
51.456 |
92.321 |
15.953 |
32 |
61.018 |
56.473 |
77.184 |
79.764 |
136.4 |
31.801 |
48 |
79.726 |
87.697 |
98.868 |
100.64 |
172.36 |
47.647 |
64 |
102.58 |
117.05 |
125.58 |
130.69 |
271.66 |
63.453 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
24.776 |
24.027 |
30.257 |
32.001 |
63.361 |
0.99901 |
64 |
114.75 |
133.68 |
149.83 |
153.73 |
157.22 |
63.679 |
128 |
235.08 |
244.42 |
285.45 |
290.93 |
374.07 |
126.5 |
256 |
367.18 |
365.85 |
468.17 |
506.79 |
691.56 |
250.49 |
384 |
485.58 |
465.62 |
668.72 |
772.08 |
999.89 |
371.5 |
512 |
637.89 |
635.49 |
970.62 |
1132.4 |
1399.5 |
489.51 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
180.37 |
32 |
1037.8 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
14.202 |
13.052 |
17.88 |
19.466 |
25.956 |
0.99947 |
8 |
24.437 |
22.858 |
29.938 |
38.885 |
49.618 |
7.9913 |
16 |
36.536 |
34.467 |
44.059 |
54.708 |
81.838 |
15.973 |
32 |
48.218 |
52.974 |
64.388 |
81.382 |
99.757 |
31.904 |
48 |
69.337 |
78.074 |
89.509 |
110.77 |
143.67 |
47.784 |
64 |
92.235 |
101.77 |
114.86 |
181.18 |
204.97 |
63.62 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
19.475 |
17.506 |
22.395 |
23.533 |
78.759 |
0.99942 |
64 |
120.23 |
118.25 |
207.22 |
215.26 |
240.07 |
63.754 |
128 |
246.07 |
227.94 |
373.71 |
448.51 |
456.11 |
126.96 |
256 |
426 |
422.2 |
724.17 |
801.8 |
895.13 |
251.23 |
384 |
572.76 |
545.31 |
976.51 |
1096.7 |
1214.7 |
372.22 |
512 |
781.89 |
704.9 |
1342 |
1422.6 |
1613.6 |
491.52 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
349.89 |
32 |
2885.7 |
160n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
15.3 |
14.0 |
15.3 |
16.8 |
34.7 |
1.0 |
8 |
25.1 |
21.8 |
34.5 |
35.2 |
44.5 |
8.0 |
16 |
20.2 |
19.0 |
22.5 |
39.6 |
46.3 |
16.0 |
32 |
30.6 |
24.2 |
39.2 |
43.4 |
75.1 |
31.9 |
48 |
38.1 |
40.8 |
45.1 |
54.8 |
94.4 |
47.8 |
64 |
57.1 |
55.5 |
59.0 |
60.5 |
166.6 |
63.6 |
960n-gram# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
||
1 |
18.0 |
16.2 |
21.2 |
24.7 |
69.8 |
1.0 |
64 |
62.4 |
63.0 |
71.4 |
156.6 |
158.2 |
63.7 |
128 |
109.1 |
105.9 |
117.6 |
229.6 |
306.7 |
126.8 |
256 |
171.7 |
147.5 |
202.2 |
405.7 |
578.8 |
251.3 |
384 |
227.2 |
198.5 |
287.6 |
570.6 |
826.4 |
373.8 |
512 |
319.9 |
269.6 |
632.4 |
829.8 |
1471.6 |
492.6 |
n-gram# of streams |
Throughput (RTFX) |
|---|---|
1 |
293.5 |
32 |
2602.0 |
160n-gramSpeaker Diarization |
# of streams |
Latency (ms) |
Throughput (RTFX) |
||||
|---|---|---|---|---|---|---|---|
avg |
p50 |
p90 |
p95 |
p99 |
|||
False |
1 |
21.8 |
21.2 |
22.1 |
22.4 |
39.6 |
1.0 |
True |
1 |
25.6 |
22.5 |
35.0 |
35.5 |
71.3 |
1.0 |
False |
8 |
27.5 |
26.9 |
28.2 |
29.7 |
61.9 |
8.0 |
True |
8 |
34.7 |
28.2 |
48.4 |
50.4 |
120.4 |
8.0 |
False |
16 |
36.9 |
35.1 |
36.6 |
57.6 |
97.4 |
15.9 |
True |
16 |
55.2 |
56.0 |
82.7 |
84.6 |
193.7 |
15.9 |
False |
32 |
51.6 |
39.9 |
65.7 |
68.2 |
131.3 |
31.8 |
True |
32 |
71.6 |
64.6 |
146.0 |
150.3 |
303.5 |
31.7 |
False |
48 |
68.0 |
76.7 |
85.8 |
92.2 |
168.5 |
47.7 |
True |
48 |
101.9 |
83.0 |
178.4 |
189.3 |
479.8 |
47.3 |
Chunk size (ms): 960. Language model: n-gram.

| Speaker Diarization | # of streams | Avg latency (ms) | p50 latency (ms) | p90 latency (ms) | p95 latency (ms) | p99 latency (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|
| False | 1 | 55.8 | 55.5 | 56.3 | 61.4 | 61.8 | 1.0 |
| True | 1 | 70.0 | 69.2 | 70.6 | 79.2 | 80.0 | 1.0 |
| False | 64 | 175.2 | 194.2 | 197.8 | 273.0 | 312.9 | 63.5 |
| True | 64 | 263.9 | 292.5 | 313.6 | 458.7 | 465.9 | 63.1 |
| False | 128 | 252.2 | 262.2 | 273.2 | 397.9 | 471.7 | 126.1 |
| False | 256 | 438.2 | 419.6 | 492.1 | 819.3 | 1027.5 | 248.0 |
| False | 384 | 759.5 | 626.9 | 1600.5 | 1968.1 | 2799.9 | 364.6 |
| False | 512 | 2054.4 | 1823.7 | 3943.4 | 4720.0 | 5667.5 | 456.1 |
| False | 512 | 2015.4 | 1795.2 | 3924.8 | 4561.5 | 5507.2 | 457.1 |

Language model: n-gram.

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 21.9 |
| True | 1 | 19.9 |
| False | 32 | 420.4 |
| True | 32 | 308.5 |

| Speaker Diarization | # of streams | Throughput (RTFX) | Average Latency (ms) |
|---|---|---|---|
| False | 1 | 158.3 | 352.24 |
| False | 32 | 1631.3 | 1018.84 |

Chunk size (ms): 160. Language model: n-gram.

| # of streams | Avg latency (ms) | p50 latency (ms) | p90 latency (ms) | p95 latency (ms) | p99 latency (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 11 | 10.3 | 11.2 | 12.4 | 30 | 1 |
| 8 | 20 | 19 | 26 | 30 | 42 | 7.99 |
| 16 | 28 | 26 | 35 | 40 | 56 | 15.97 |
| 32 | 35 | 35 | 48 | 52 | 73 | 31.9 |
| 64 | 50 | 55 | 66 | 70 | 100 | 63.8 |

Chunk size (ms): 800. Language model: n-gram.

| # of streams | Avg latency (ms) | p50 latency (ms) | p90 latency (ms) | p95 latency (ms) | p99 latency (ms) | Throughput (RTFX) |
|---|---|---|---|---|---|---|
| 1 | 14 | 11.5 | 20 | 30 | 60 | 1 |
| 64 | 70 | 70 | 90 | 100 | 170 | 63.8 |
| 128 | 88 | 84 | 110 | 190 | 250 | 127.4 |
| 256 | 128 | 117 | 164 | 300 | 460 | 254.4 |

Language model: n-gram.

| # of streams | Throughput (RTFX) |
|---|---|
| 32 | 440 |

Language model: n-gram.

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 70 |
| False | 32 | 193.5 |

Language model: none.

| Speaker Diarization | # of streams | Throughput (RTFX) |
|---|---|---|
| False | 1 | 6.2 |
| False | 32 | 43.3 |

Chunk size (ms): 320. Language model: n-gram.

| # of streams | Throughput (RTFX) | Average Latency (ms) |
|---|---|---|
| 1 | 1.0 | 86.53 |
| 8 | 7.9 | 136.34 |
| 16 | 15.8 | 163.55 |
| 32 | 31.4 | 253.70 |
| 48 | 44.8 | 991.17 |
| 64 | 58.9 | 1180.73 |

Chunk size (ms): 1600. Language model: n-gram.

| # of streams | Throughput (RTFX) | Average Latency (ms) |
|---|---|---|
| 1 | 1.0 | 87.29 |
| 64 | 63.0 | 433.03 |
| 128 | 125.0 | 586.62 |
| 256 | 246.3 | 836.96 |
| 384 | 337.1 | 2274.47 |
| 512 | 342.7 | 7912.27 |

Language model: n-gram.

| # of streams | Throughput (RTFX) | Average Latency (ms) |
|---|---|---|
| 1 | 85.2 | 642.87 |
| 32 | 1056.6 | 1606.57 |

On-Prem Hardware Specifications#
| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA DGX A100 40GB |
| CPU | Model | AMD EPYC 7742 64-Core Processor |
| CPU | Thread(s) per core | 2 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 64 |
| CPU | NUMA node(s) | 8 |
| CPU | Frequency boost | enabled |
| CPU | CPU max MHz | 2250 |
| CPU | CPU min MHz | 1500 |
| RAM | Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
| RAM | Configured Memory Speed | 2933 MT/s |
| RAM | RAM Size | 32x64GB (2048GB Total) |

| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA H100 80GB HBM3 |
| CPU | Model | Intel(R) Xeon(R) Platinum 8480CL |
| CPU | Thread(s) per core | 2 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 56 |
| CPU | NUMA node(s) | 2 |
| CPU | CPU max MHz | 3800 |
| CPU | CPU min MHz | 800 |
| RAM | Model | Micron DDR5 MTC40F2046S1RC48BA1 4800MHz |
| RAM | Configured Memory Speed | 4400 MT/s |
| RAM | RAM Size | 32x64GB (2048GB Total) |

| Component | Property | Value |
|---|---|---|
| GPU | Model | NVIDIA L40 |
| CPU | Model | AMD EPYC 7763 64-Core Processor |
| CPU | Thread(s) per core | 1 |
| CPU | Socket(s) | 2 |
| CPU | Core(s) per socket | 64 |
| CPU | NUMA node(s) | 8 |
| CPU | Frequency boost | enabled |
| CPU | CPU max MHz | 3529 |
| CPU | CPU min MHz | 1500 |
| RAM | Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
| RAM | Configured Memory Speed | 3200 MT/s |
| RAM | RAM Size | 16x32GB (512GB Total) |

Model Accuracy#
ASR models are evaluated with the following metrics. Lower values indicate better accuracy, with 0% representing perfect transcription.
Word Error Rate (WER): Used for word-based languages (English, Spanish, and French). Measures the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference transcript, divided by the total number of reference words.
Character Error Rate (CER): Used for character-based languages (Chinese, Japanese, and Mandarin). Measures the minimum number of character edits needed to match the reference transcript, divided by the total number of reference characters.
Concatenated minimum-Permutation Word Error Rate (cpWER): Used for speaker diarization. It is calculated in three steps:
1. Concatenate all utterances per speaker for both the reference and the hypothesis.
2. Compute the WER for every possible permutation of the hypothesis speakers.
3. Select the lowest WER (the best permutation).
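The three metrics above can be illustrated with a short, self-contained Python sketch. It is not the evaluation code used to produce the numbers on this page; the function names (`edit_distance`, `wer`, `cer`, `cpwer`) are hypothetical, text normalization (casing, punctuation) is omitted, and the reference is assumed to be non-empty. The cpWER function tries every speaker permutation, so it is only practical for a small number of speakers.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character edits divided by the number of reference characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def cpwer(ref_by_speaker, hyp_by_speaker):
    """cpWER sketch: each argument is a list of utterance lists, one per speaker."""
    # Step 1: concatenate all utterances per speaker.
    refs = [" ".join(utts).split() for utts in ref_by_speaker]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker]
    # Pad with empty speakers so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    total_ref_words = sum(len(r) for r in refs)
    # Steps 2 and 3: score every speaker permutation and keep the best one.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abcd", "abed"))
    print(cpwer([["hello there", "how are you"], ["fine thanks"]],
                [["fine thank"], ["hello there how are you"]]))
```

Multiplying the returned fractions by 100 gives percentages comparable in form to the values reported in the accuracy table.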
| Model Name | Language | Dataset | Best latency WER (%) ⬇️ | Best throughput WER (%) ⬇️ | Offline WER (%) ⬇️ |
|---|---|---|---|---|---|
| Parakeet 1.1b CTC | en-US | | 10.45 | 8.80 | 7.96 |
| Parakeet 1.1b CTC | en-US | | 6.34 | 4.74 | 4.09 |
| Parakeet 1.1b CTC | en-US | | 46.09 | 41.35 | 39.61 |
| Parakeet 1.1b CTC | en-US (Silero VAD) | | 5.57 | 4.8 | 4.5 |
| Parakeet 1.1b CTC | en-US (Telephony) | | 7.33 | 5.11 | 4.17 |
| Parakeet 1.1b CTC | en-US (Telephony) | | 30.13 | 27.82 | 28.91 |
| Parakeet 1.1b CTC | en-US (Telephony) + Sortformer Diarizer | | 28.43 (cpWER) | - | - |
| Parakeet 0.6b TDT | en-US | | - | - | 11.46 |
| Parakeet 0.6b TDT | en-US | | - | - | 11.65 |
| Parakeet 0.6b TDT | en-US | | - | - | 9.15 |
| Parakeet 0.6b TDT | en-US | | - | - | 2.01 |
| Parakeet 0.6b TDT | en-US | | - | - | 3.51 |
| Parakeet 0.6b TDT | en-US | | - | - | 2.16 |
| Parakeet 0.6b TDT | en-US | | - | - | 3.38 |
| Parakeet 0.6b TDT | en-US | | - | - | 6.6 |
| Parakeet 1.1b RNNT | en-US | | 10.74 | 10.54 | 9.77 |
| Parakeet 1.1b RNNT | es-US | | 7.19 | 5.26 | 3.83 |
| Parakeet 1.1b RNNT | es-ES | | 16.15 | 14.42 | 11.51 |
| Parakeet 1.1b RNNT | fr-FR | | 11.41 | 9.10 | 6.36 |
| Parakeet 1.1b RNNT | de-DE | | 11.29 | 9.16 | 7.09 |
| Parakeet 1.1b RNNT | ru-RU | | 21.44 | 19.23 | 17.39 |
| Parakeet 0.6b CTC | en-US | | 10.57 | 8.87 | 8.45 |
| Parakeet 0.6b CTC | vi-VN | | 10 | 8.58 | 7.97 |
| Parakeet 0.6b CTC | zh-CN | | 5.81 | 5.84 | 6.09 |
| Parakeet 0.6b CTC | es-US | | 9.14 | 6.15 | 5.34 |
| Canary 1b | en-US | | Not supported | Not supported | 6.78 |
| Canary 1b | es-US | | Not supported | Not supported | 3.54 |
| Canary 1b | de-DE | | Not supported | Not supported | 5.18 |
| Canary 1b | fr-FR | | Not supported | Not supported | 4.21 |
| Canary 1b | ru-RU | | Not supported | Not supported | 10.33 |
| Canary 1b | es-ES | | Not supported | Not supported | 14.40 |
| Canary 1b | pt-BR | | Not supported | Not supported | 5.83 |
Conformer 120m CTC |
es-US |
6.75 |
6.26 |
5.66 |