Performance¶
Evaluation Process¶
This section reports latency and throughput numbers for streaming and offline configurations of the Riva ASR service on different GPUs. These numbers were captured after deploying the pre-configured ASR pipelines from our Quick Start scripts. The Jasper, QuartzNet, Conformer, and Citrinet-1024 acoustic models were tested.
In streaming mode, the client and the server used audio chunks of the same duration (100 ms, 160 ms, or 800 ms, depending on the server configuration). The Riva streaming client riva_streaming_asr_client, provided in the Riva client image, was used with the --simulate_realtime flag to simulate transcription from a microphone, with each stream performing 3 iterations over a sample audio file from the LibriSpeech dataset (1272-135031-0000.wav).
The command used to measure performance was:
riva_streaming_asr_client \
--chunk_duration_ms=<chunk_duration> \
--simulate_realtime=true \
--automatic_punctuation=true \
--num_parallel_requests=<num_streams> \
--word_time_offsets=true \
--print_transcripts=false \
--interim_results=false \
--num_iterations=<3*num_streams> \
--audio_file=1272-135031-0000.wav \
--output_filename=/tmp/output.json
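When sweeping over several stream counts, the argument list for the command above could be assembled programmatically. The following Python sketch reuses only the flags shown above; the helper name `streaming_client_args` is hypothetical, and the relation `num_iterations = 3 * num_streams` simply encodes the 3 iterations per stream described earlier.

```python
from typing import List


def streaming_client_args(chunk_duration_ms: int,
                          num_streams: int,
                          audio_file: str = "1272-135031-0000.wav",
                          iterations_per_stream: int = 3) -> List[str]:
    """Build the riva_streaming_asr_client argument list used in this section.

    Hypothetical helper: it only formats the flags listed above and does not
    launch the client.
    """
    return [
        "riva_streaming_asr_client",
        f"--chunk_duration_ms={chunk_duration_ms}",
        "--simulate_realtime=true",
        "--automatic_punctuation=true",
        f"--num_parallel_requests={num_streams}",
        "--word_time_offsets=true",
        "--print_transcripts=false",
        "--interim_results=false",
        # Each stream makes 3 passes over the audio file.
        f"--num_iterations={iterations_per_stream * num_streams}",
        f"--audio_file={audio_file}",
        "--output_filename=/tmp/output.json",
    ]


# Example: a 160 ms chunk configuration with 64 parallel streams.
print(" \\\n".join(streaming_client_args(chunk_duration_ms=160, num_streams=64)))
```

The resulting list can be passed to a process launcher of your choice inside the Riva client container.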
The riva_streaming_asr_client returns the following latency measurements:
intermediate latency: latency of responses returned with is_final == false
final latency: latency of responses returned with is_final == true
latency: the overall latency of all returned responses. This is the value reported in the tables below.
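The client computes these statistics itself. Purely to illustrate what the avg, p50, p90, p95, and p99 columns in the tables mean, the sketch below summarizes a list of per-response latencies. It assumes the nearest-rank percentile method, which may differ from what the client uses, and the sample values are made up rather than measured.

```python
import math
from statistics import mean
from typing import Dict, List


def nearest_rank(sorted_values: List[float], q: int) -> float:
    """Return the q-th percentile (0 < q <= 100) with the nearest-rank method."""
    rank = math.ceil(q * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[max(rank, 1) - 1]


def summarize_latencies(latencies_ms: List[float]) -> Dict[str, float]:
    """Compute the average and the p50/p90/p95/p99 latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    summary = {"avg": mean(ordered)}
    for q in (50, 90, 95, 99):
        summary[f"p{q}"] = nearest_rank(ordered, q)
    return summary


# Made-up per-response latencies in milliseconds, not measured data.
samples = [10.2, 10.4, 10.3, 11.0, 10.1, 17.5, 10.6, 10.3, 10.9, 12.4]
print(summarize_latencies(samples))
```

A single slow response can move p99 far more than avg or p50, which is why several tables below show p99 values well above the other columns.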
In offline mode, the command used to measure maximum throughput was:
riva_asr_client \
--automatic_punctuation=true \
--num_parallel_requests=32 \
--word_time_offsets=true \
--print_transcripts=false \
--num_iterations=96 \
--audio_file=5x_1272-135031-0000.wav \
--output_filename=/tmp/output.json
Results¶
Latency and throughput measurements for streaming and offline configurations are reported in the following tables. Throughput is measured in RTFX (duration of audio transcribed / computation time).
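As a small worked illustration of the RTFX definition, the sketch below divides total audio duration by wall-clock processing time. The function name `rtfx`, the 10-second clip length, and the timing values are assumptions chosen for the example, not measurements from this section.

```python
def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Real-time factor: seconds of audio processed per second of computation."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Hypothetical streaming run: 64 parallel streams, 3 passes each over a clip.
# The 10 s clip length is a placeholder, not the real file's duration.
num_streams, passes, clip_seconds = 64, 3, 10.0
total_audio = num_streams * passes * clip_seconds

# With real-time simulation, each stream needs at least passes * clip_seconds
# of wall-clock time; 0.1 s of extra overhead is assumed here.
wall_clock = passes * clip_seconds + 0.1

print(round(rtfx(total_audio, wall_clock), 3))
```

Because --simulate_realtime feeds audio at its natural pace, streaming RTFX cannot exceed the number of streams, which is consistent with the streaming tables reporting values just under the stream count. The offline tables are not paced this way, so their RTFX values are much larger.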
Note
If the language model is none, inference is performed with a greedy decoder. If the language model is n-gram, a beam decoder is used.
For specifications of the hardware on which these measurements were collected, refer to the Hardware Specifications section.
160
229
273
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 10.68 | 10.34 | 10.77 | 12.4 | 17.4 | 0.999430
n-gram | 8 | 14.70 | 14.28 | 14.96 | 16.4 | 28.8 | 7.9921
n-gram | 16 | 27.1 | 25.5 | 30.3 | 32.1 | 59.3 | 15.9657
n-gram | 32 | 42.0 | 41.44 | 44.0 | 45.5 | 87.1 | 31.9023
n-gram | 48 | 50.8 | 50.6 | 55.5 | 57.6 | 108 | 47.814
n-gram | 64 | 57.3 | 56.5 | 64.7 | 67.7 | 120 | 63.725
none | 1 | 9.97 | 9.74 | 9.97 | 11.6 | 14.70 | 0.999513
none | 8 | 14.43 | 14.1 | 14.93 | 15.34 | 27.1 | 7.99203
none | 16 | 27.0 | 26.8 | 28.3 | 29.7 | 55.3 | 15.967
none | 32 | 37.4 | 36.9 | 38.7 | 40.2 | 76.7 | 31.914
none | 48 | 45.5 | 45.0 | 51.6 | 53.51 | 94.0 | 47.843
none | 64 | 48.7 | 49.7 | 58.7 | 61.1 | 103.1 | 63.766
800
1118
1245
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 11.54 | 11.01 | 14.1 | 15.1 | 15.22 | 1.000
n-gram | 64 | 62.1 | 63.6 | 70.9 | 82 | 97.8 | 63.791
n-gram | 128 | 93.0 | 94.0 | 113 | 117.7 | 140 | 127.35
n-gram | 256 | 153 | 160 | 185 | 187.1 | 254 | 253.76
n-gram | 384 | 205.5 | 229 | 255.3 | 280 | 346 | 379.48
n-gram | 512 | 269 | 299 | 339 | 365 | 469 | 503.94
n-gram | 768 | 375 | 421 | 477 | 547 | 675 | 750.1
none | 1 | 10.38 | 10.02 | 11.1 | 12.76 | 13.04 | 0.999560
none | 64 | 55.0 | 58.4 | 67 | 72.0 | 85.2 | 63.812
none | 128 | 83.1 | 90 | 105.5 | 116 | 142 | 127.40
none | 256 | 128 | 143.8 | 164 | 180 | 220 | 254.07
none | 384 | 181.3 | 210 | 226.8 | 243 | 306.1 | 379.91
none | 512 | 233 | 272 | 297 | 315 | 406 | 504.91
none | 768 | 337 | 397 | 435 | 464.4 | 595 | 752.50
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 4270
none | 32 | 5420
160
n-gram
261
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 10.31 | 10.20 | 10.59 | 11.4 | 13.6 | 0.999623
8 | 14.39 | 14.255 | 15.03 | 15.8 | 18.8 | 7.99577
16 | 26.1 | 27.3 | 28.50 | 28.81 | 32.4 | 15.9837
32 | 39.874 | 39.6 | 43.0 | 43.8 | 46.8 | 31.9520
48 | 48.4 | 48.7 | 53.6 | 55.0 | 60 | 47.901
64 | 50.1 | 51.2 | 59.6 | 61.4 | 65.1 | 63.8603
800
n-gram
1132
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 10.718 | 10.79 | 11.7 | 12.4 | 12.57 | 0.99963
64 | 65.6 | 66.5 | 73.2 | 75 | 76.44 | 63.824
128 | 97.6 | 98.4 | 113.7 | 117.0 | 123.2 | 127.50
256 | 154 | 162.6 | 190 | 198 | 212 | 254.30
384 | 211 | 233.9 | 260.8 | 268.4 | 285 | 380.36
512 | 272 | 306.1 | 344 | 346.4 | 371 | 505.80
768 | 408 | 452 | 517 | 545 | 668 | 752.0
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 1410
160
n-gram
198
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 10.40 | 10.26 | 10.61 | 10.9 | 11.63 | 0.999590
8 | 17.08 | 16.82 | 17.84 | 18.8 | 19.30 | 7.99450
16 | 30.6 | 31.3 | 34.20 | 34.47 | 37.1 | 15.9787
32 | 48.1 | 49.4 | 51.2 | 52.5 | 55.6 | 31.933
48 | 55.8 | 56.6 | 64.4 | 66.1 | 70.8 | 47.881
64 | 60.0 | 62.1 | 71.8 | 73.8 | 79 | 63.812
800
n-gram
920
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 10.5930 | 10.56 | 11.4 | 12.0 | 13.3 | 0.999593
64 | 71 | 73 | 88.3 | 90 | 95.00 | 63.787
128 | 110 | 111 | 124.2 | 128.6 | 192 | 127.24
256 | 188 | 200.0 | 295 | 312 | 340 | 252.68
384 | 268 | 289 | 440 | 493 | 518 | 376.2
512 | 353 | 372.5 | 570 | 696 | 699 | 497.6
768 | 552 | 546 | 1003 | 1110 | 1515 | 736.2
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 1190
160
n-gram
215
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 11.67 | 11.21 | 12.31 | 13.2 | 34.9 | 0.998857
8 | 16.8 | 16.0 | 18.2 | 19.3 | 55.4 | 7.982
16 | 28.7 | 28.94 | 31.7 | 33.4 | 97.1 | 15.946
32 | 44.1 | 42.5 | 46.2 | 47.8 | 136.1 | 31.843
48 | 56.6 | 54.0 | 60.12 | 63.2 | 195.9 | 47.66
64 | 59.1 | 58.0 | 67.3 | 70.1 | 200 | 63.44
800
n-gram
780
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 16.34 | 15.5 | 18.74 | 32.2 | 36.9 | 0.99885
64 | 89.4 | 81.8 | 107.2 | 206 | 228 | 63.53
128 | 139.8 | 124.0 | 191 | 287 | 349 | 126.38
256 | 224 | 180 | 352 | 560 | 930 | 248.4
384 | 295 | 249.5 | 481 | 780 | 1190 | 369.2
512 | 409 | 329 | 774 | 1098 | 1596 | 486.0
768 | 685 | 526 | 1422 | 2019 | 2460 | 712
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 630
160
n-gram
140
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 13.00 | 12.16 | 15.3 | 24.20 | 29.21 | 0.999553
8 | 24.98 | 23.56 | 28.8 | 36.8 | 61 | 7.99263
16 | 40.3 | 36.1 | 48.9 | 56.4 | 71.5 | 15.97
32 | 70 | 70.3 | 76.8 | 87.5 | 103 | 31.9103
48 | 85.4 | 84.1 | 96.4 | 107 | 129.8 | 47.840
64 | 100 | 99 | 113.4 | 124.1 | 161 | 63.760
800
n-gram
566
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 19.82 | 17.18 | 30.8 | 31.8 | 32.71 | 0.999550
64 | 127 | 123 | 177 | 183 | 192 | 63.70
128 | 195 | 191 | 254 | 272 | 299 | 127.05
256 | 331.1 | 334.9 | 428 | 462 | 502 | 252.36
384 | 486 | 489 | 692 | 749 | 930 | 373.3
512 | 668 | 650 | 1040 | 1093 | 1324 | 490.9
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 520
160
83
89
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 18.43 | 18.07 | 18.7 | 19.9 | 30.65 | 0.999023
n-gram | 8 | 34.27 | 33.664 | 34.51 | 35.234 | 66.8 | 7.9810
n-gram | 16 | 55.8 | 50.2 | 59.7 | 64.9 | 468 | 15.88
n-gram | 32 | 94 | 81.3 | 84.97 | 142.9 | 691 | 31.7
n-gram | 48 | 122.0 | 115.2 | 119.5 | 211.6 | 666 | 47.664
n-gram | 64 | 141.7 | 144.0 | 149.8 | 193.930 | 302.7 | 63.4310
none | 1 | 17.470 | 17.13 | 17.54 | 18.88 | 27.81 | 0.999107
none | 8 | 32.71 | 32.11 | 32.72 | 33.6 | 61.95 | 7.98
none | 16 | 53.3 | 46.5 | 50.2 | 59.9 | 272 | 15.88
none | 32 | 86.5 | 75.5 | 77.8 | 157.5 | 700 | 31.7
none | 48 | 109.5 | 106 | 109.8 | 170.4 | 587 | 47.695
none | 64 | 126.6 | 133.07 | 135.5 | 159.0 | 276 | 63.4810
800
397
429
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 18.80 | 18.49 | 19.5 | 24.5 | 25.4 | 0.999233
n-gram | 64 | 164.7 | 149.54 | 175.7 | 195.4 | 1190 | 63.607
n-gram | 128 | 232.4 | 248 | 262 | 274 | 307 | 126.733
n-gram | 256 | 389 | 398 | 502 | 500 | 579 | 251.25
n-gram | 384 | 564.8 | 592.3 | 815 | 949 | 1122 | 370.2
none | 1 | 16.81 | 16.47 | 16.97 | 19.170 | 19.99 | 0.999380
none | 64 | 172.2 | 133.6 | 168.9 | 398 | 1173 | 63.64
none | 128 | 204.1 | 213 | 240.3 | 259 | 281 | 126.853
none | 256 | 344 | 331 | 454.5 | 456 | 529 | 251.67
none | 384 | 479.6 | 517 | 670 | 785.0 | 1016 | 372.2
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 811
none | 32 | 855.8
160
n-gram
72
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 17.81 | 17.86 | 18.29 | 18.43 | 20.3 | 0.999313
8 | 36.5 | 36.0 | 37.3 | 38.4 | 43.2 | 7.98877
16 | 47.50 | 46.8 | 54.3 | 57.3 | 61.9 | 15.965
32 | 75.5 | 85.2 | 88.23 | 90.3 | 91.1 | 31.8917
48 | 110.2 | 120.6 | 125.9 | 128.9 | 138.4 | 47.726
64 | 242.8 | 177 | 604 | 740 | 840 | 63.42
800
n-gram
327
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 18.27 | 18.43 | 19.7 | 19.81 | 20.3 | 0.999317
64 | 259 | 156 | 609 | 1260 | 1228 | 61.6
128 | 298 | 285.2 | 460 | 464 | 487 | 125.663
256 | 526 | 551 | 810 | 912 | 1212 | 245.9
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 580
100
394
274
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 6.53 | 6.368 | 6.54 | 7.0 | 9.62 | 0.999677
none | 8 | 8.31 | 8.04 | 8.42 | 8.87 | 14.6 | 7.99540
none | 16 | 12.5 | 13.1 | 14.2 | 15.1 | 23.2 | 15.986
none | 32 | 16.0 | 16.37 | 18.4 | 19.6 | 31.9 | 31.9637
none | 48 | 20.5 | 20.7 | 24.4 | 25.4 | 41.7 | 47.9247
none | 64 | 25.3 | 26.6 | 30.4 | 31.1 | 50.3 | 63.880
n-gram | 1 | 7.17 | 6.801 | 8.02 | 9.15 | 13.0 | 0.999490
n-gram | 8 | 9.55 | 9.07 | 10.72 | 12.3 | 22.7 | 7.9932
n-gram | 16 | 15.02 | 14.4 | 17.2 | 19.4 | 32.5 | 15.9797
n-gram | 32 | 18.4 | 17.8 | 21.4 | 23.1 | 42.0 | 31.946
n-gram | 48 | 24.0 | 23.3 | 27.8 | 30.4 | 57 | 47.894
n-gram | 64 | 28.8 | 29.0 | 34.1 | 37.5 | 63.1 | 63.826
800
2460
1553
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 7.99 | 7.47 | 10.34 | 10.9 | 11.4 | 0.999637
none | 64 | 28.5 | 26.5 | 43.5 | 50.2 | 54.5 | 63.869
none | 128 | 41.44 | 40.9 | 66.5 | 80.0 | 90.6 | 127.59
none | 256 | 62 | 57.3 | 107.7 | 138 | 158 | 254.56
none | 384 | 74.1 | 72 | 127.9 | 164 | 196.8 | 381.31
none | 512 | 98 | 93.7 | 158 | 209 | 245 | 507.50
none | 768 | 123.3 | 116.7 | 231.04 | 302 | 355 | 757.70
n-gram | 1 | 12.6 | 12.36 | 16.15 | 16.7 | 16.75 | 0.99946
n-gram | 64 | 45.9 | 45.1 | 66 | 70.4 | 76 | 63.847
n-gram | 128 | 64.4 | 61.4 | 94 | 104 | 115 | 127.53
n-gram | 256 | 93 | 90 | 142.9 | 161.6 | 188 | 254.39
n-gram | 384 | 124 | 121 | 183 | 202 | 228 | 380.7
n-gram | 512 | 147 | 142 | 229 | 246.4 | 295 | 506.6
n-gram | 768 | 197 | 195 | 308 | 352 | 404 | 756.77
Language model | # of streams | Throughput (RTFX)
---|---|---
none | 32 | 7270
n-gram | 32 | 1380
100
n-gram
183
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 13.17 | 12.856 | 14.05 | 15.2 | 19.25 | 0.999
8 | 14.89 | 14.18 | 16.2 | 17.9 | 26.1 | 7.9918
16 | 24.37 | 24.11 | 29.1 | 30.8 | 43.1 | 15.9763
32 | 38.4 | 38.1 | 41.7 | 43.7 | 60 | 31.927
48 | 39.8 | 39.6 | 47.1 | 51.5 | 65.5 | 47.888
64 | 45.1 | 46.6 | 54.8 | 58.3 | 74.8 | 63.824
800
n-gram
1155
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 17.34 | 16.68 | 22.0 | 23.2 | 27.0 | 0.999383
64 | 69.0 | 67.9 | 88.0 | 98.7 | 105.2 | 63.787
128 | 95.3 | 97.6 | 127.7 | 136.7 | 152.1 | 127.393
256 | 139 | 145 | 189 | 208 | 233 | 254.13
384 | 190.7 | 201.8 | 257.9 | 283.1 | 319 | 380.12
512 | 235.9 | 251.9 | 322 | 361 | 407 | 505.10
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 1350
160
157
157
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 15.53 | 15.26 | 15.53 | 15.674 | 27.8 | 0.999093
n-gram | 8 | 24.6 | 23.73 | 25.1 | 26.2 | 49 | 7.9845
n-gram | 16 | 43.3 | 43.4 | 46.3 | 47.3 | 84.8 | 15.9540
n-gram | 32 | 59.43 | 59.12 | 65.7 | 68.2 | 112 | 31.860
n-gram | 48 | 63.5 | 63.2 | 68.1 | 70.4 | 133.8 | 47.770
n-gram | 64 | 76.7 | 76.6 | 83.6 | 86.8 | 167.2 | 63.620
none | 1 | 14.2 | 14.0 | 14.2 | 14.3 | 23.2 | 0.99922
none | 8 | 24.1 | 23.69 | 24.57 | 25.1 | 43.5 | 7.9866
none | 16 | 44.8 | 44.9 | 46.8 | 48.4 | 82 | 15.953
none | 32 | 61.2 | 61.03 | 67.7 | 69.1 | 106.5 | 31.873
none | 48 | 68.8 | 69.9 | 74.7 | 77.0 | 128 | 47.785
none | 64 | 81.7 | 84.0 | 93.1 | 95.6 | 166 | 63.634
800
715
724
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 17.22 | 16.71 | 17.3 | 24.13 | 24.49 | 0.999193
n-gram | 64 | 93 | 89.1 | 117 | 124 | 155 | 63.701
n-gram | 128 | 130 | 137 | 153.17 | 167 | 198.0 | 127.147
n-gram | 256 | 228 | 248 | 272 | 298 | 358.8 | 252.89
n-gram | 384 | 324.1 | 360.7 | 392 | 430 | 520.5 | 377.17
n-gram | 512 | 421 | 470 | 528 | 634 | 729 | 499.93
none | 1 | 15.53 | 15.07 | 15.69 | 20.72 | 21.28 | 0.999283
none | 64 | 89 | 91 | 116 | 120 | 136 | 63.70
none | 128 | 132 | 141 | 169 | 178 | 214 | 127.09
none | 256 | 224.5 | 249.7 | 283 | 302 | 368 | 252.79
none | 384 | 328 | 379 | 430 | 447 | 535.8 | 376.9
none | 512 | 450 | 502 | 571 | 700 | 802 | 498.92
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 2940
none | 32 | 3420
160
n-gram
143
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 15.46 | 15.56 | 16.1 | 16.3 | 19.4 | 0.999450
8 | 26.2 | 26.49 | 28.35 | 28.7 | 35.3 | 7.9911
16 | 48.1 | 48.7 | 51.2 | 51.62 | 56.3 | 15.9717
32 | 64.6 | 64.6 | 70.42 | 71.7 | 75.3 | 31.9233
48 | 71.46 | 72.23 | 78.1 | 80.0 | 89 | 47.868
64 | 86.8 | 87.7 | 96.3 | 99.3 | 105.2 | 63.792
800
n-gram
635
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 16.73 | 17.04 | 18.1 | 19.18 | 19.20 | 0.999
64 | 89 | 88 | 117 | 129 | 144 | 63.75
128 | 145 | 154 | 177 | 186 | 201 | 127.21
256 | 245 | 266.8 | 298 | 306 | 328 | 253.21
384 | 372 | 405 | 455 | 469 | 523 | 377.6
512 | 503 | 551 | 680 | 701 | 995 | 499.0
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 1080
160
n-gram
133
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 14.8 | 14.8 | 15.2 | 15.4 | 17.378 | 0.999407
8 | 26.4 | 27.0 | 27.9 | 29.6 | 35.6 | 7.9912
16 | 47.6 | 48.1 | 49.7 | 50.2 | 53.9 | 15.9693
32 | 61.59 | 61.5 | 67.81 | 69.3 | 71.6 | 31.922
48 | 72.6 | 73.0 | 76.9 | 78.3 | 82.5 | 47.853
64 | 85.2 | 87.9 | 95.0 | 97.1 | 105 | 63.756
800
n-gram
589
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 15.43 | 15.8 | 16.39 | 17.9 | 17.67 | 0.999400
64 | 98.5 | 98 | 126 | 133 | 143.6 | 63.68
128 | 162.0 | 165.2 | 229 | 238 | 269 | 126.78
256 | 290.0 | 294 | 481 | 511 | 555 | 250.64
384 | 421 | 425 | 730 | 835 | 956.8 | 371.7
512 | 618 | 567 | 1114 | 1402 | 1622 | 490.10
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 936
160
n-gram
139
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 17.4 | 17.0 | 18.7 | 19.3 | 43.7 | 0.99855
8 | 27.9 | 26.7 | 31.9 | 34.2 | 56.2 | 7.979
16 | 47.6 | 47.1 | 52.4 | 53.9 | 126 | 15.930
32 | 67.7 | 65.5 | 73.5 | 76.6 | 188 | 31.783
48 | 74.1 | 69.7 | 78.9 | 83.9 | 263 | 47.571
64 | 82.9 | 82.2 | 93.7 | 99.1 | 242 | 63.43
800
n-gram
533
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 25.3 | 25.6 | 29.9 | 34.6 | 42.1 | 0.99874
64 | 139 | 125 | 159.6 | 226.0 | 306 | 63.15
128 | 185 | 165 | 235.7 | 459 | 573 | 125.1
256 | 308 | 276.5 | 434 | 710 | 950 | 247.7
384 | 452 | 398.7 | 810 | 976 | 1450 | 366.7
512 | 600 | 524 | 1230 | 1324 | 1940 | 484.9
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 613
160
n-gram
88
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 19.8 | 17.38 | 24.7 | 32.47 | 40.2 | 0.999317
8 | 41.0 | 39.3 | 52 | 58.8 | 82 | 7.9871
16 | 62.3 | 61.1 | 71.6 | 81.1 | 99.7 | 15.9620
32 | 81.1 | 79.1 | 91.5 | 105 | 123 | 31.899
48 | 98 | 99 | 115 | 128 | 168.7 | 47.792
64 | 122.3 | 120.1 | 151 | 163 | 194.6 | 63.65
800
n-gram
346
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 27.9 | 29.1 | 40.8 | 41.8 | 42.0 | 0.999303
64 | 145 | 144 | 189 | 195 | 206.2 | 63.64
128 | 241.4 | 241.0 | 305 | 328 | 346 | 126.83
256 | 448 | 439 | 630 | 709 | 860 | 249.47
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 473
160
44
48
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 22.86 | 22.25 | 25.27 | 26.3 | 39.866 | 0.99873
n-gram | 8 | 59.0 | 61.0 | 65.3 | 66.6 | 110.4 | 7.93
n-gram | 16 | 86.6 | 74 | 88 | 95.4 | 659 | 15.76
n-gram | 32 | 187.7 | 137.5 | 277 | 791 | 1154 | 31.5
none | 1 | 21.4 | 20.57 | 24.5 | 25.6 | 34.31 | 0.99886
none | 8 | 52.1 | 45.6 | 61 | 63.3 | 104.6 | 7.94
none | 16 | 66.5 | 64.7 | 75.5 | 77.6 | 129.7 | 15.84
none | 32 | 187.9 | 123.6 | 395 | 814 | 1140 | 31.3
800
209
232
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 24.9 | 24.3 | 28.2 | 33.0 | 35.2 | 0.99890
n-gram | 64 | 247.6 | 260.1 | 272 | 279 | 323 | 62.7
n-gram | 128 | 444 | 471 | 497.2 | 503.3 | 555 | 125.76
none | 1 | 25.1 | 25.3 | 26.9 | 27.5 | 31.1 | 0.99909
none | 64 | 223.5 | 230.2 | 239.4 | 255 | 267 | 62.9
none | 128 | 384 | 354 | 439 | 470 | 482 | 124.6
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 453
none | 32 | 470
160
n-gram
34
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 27.050 | 27.10 | 27.72 | 27.92 | 31.6 | 0.998950
8 | 52.3 | 52.7 | 61.00 | 63.0 | 68.4 | 7.9817
16 | 79.1 | 84 | 94 | 96 | 101 | 15.9440
32 | 980 | 970 | 1241 | 1340 | 1440 | 31.1
800
n-gram
156
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 28.35 | 28.50 | 30.9 | 32.07 | 32.1 | 0.998970
64 | 485 | 318 | 1243 | 1780 | 2156 | 62.76
128 | 616 | 579 | 900 | 1030 | 1280.3 | 122.977
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 426
100
255
210
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 9.50 | 9.36 | 9.49 | 9.60 | 14.47 | 0.999510
none | 8 | 13.8 | 13.3 | 14.1 | 16.11 | 30.3 | 7.9909
none | 16 | 23.4 | 23.4 | 25.33 | 25.91 | 39.5 | 15.9763
none | 32 | 30.0 | 30.6 | 34.3 | 35.4 | 55 | 31.939
none | 48 | 36.2 | 36.9 | 44.5 | 45.9 | 71.1 | 47.875
none | 64 | 41.6 | 42.6 | 52.6 | 55.3 | 81.7 | 63.805
n-gram | 1 | 10.71 | 10.2 | 12.2 | 13.48 | 22.4 | 0.99913
n-gram | 8 | 16.36 | 15.15 | 20.08 | 23.9 | 40.2 | 7.9877
n-gram | 16 | 25.0 | 24.4 | 28.7 | 31.5 | 54.1 | 15.968
n-gram | 32 | 32.7 | 32.1 | 38.5 | 42.1 | 72.9 | 31.906
n-gram | 48 | 41.8 | 41.4 | 50.9 | 55.5 | 92.1 | 47.824
n-gram | 64 | 43.5 | 43.1 | 55.5 | 61.1 | 99 | 63.758
800
1590
1169
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 11.55 | 10.72 | 16.15 | 16.47 | 16.8 | 0.999440
none | 64 | 52.7 | 50.7 | 77.0 | 89.1 | 96.9 | 63.792
none | 128 | 68.3 | 66.2 | 110 | 143 | 161 | 127.300
none | 256 | 89.5 | 85.1 | 142 | 187 | 214.6 | 254.10
none | 384 | 113.7 | 111 | 196 | 273 | 302.2 | 379.75
none | 512 | 139 | 129 | 246 | 350 | 385.8 | 504.94
none | 768 | 198.2 | 190 | 358 | 515 | 583 | 752.3
n-gram | 1 | 19.4 | 19.85 | 25.6 | 26.5 | 28.3 | 0.99927
n-gram | 64 | 70 | 65.1 | 97 | 105 | 112.5 | 63.760
n-gram | 128 | 93.2 | 92.9 | 135.9 | 152.4 | 168 | 127.277
n-gram | 256 | 123 | 111.2 | 184 | 206 | 264 | 253.99
n-gram | 384 | 159 | 156 | 238 | 270 | 305 | 379.83
n-gram | 512 | 193 | 187.7 | 296.5 | 346 | 392 | 504.88
n-gram | 768 | 269 | 263 | 421 | 519 | 594 | 751.8
Language model | # of streams | Throughput (RTFX)
---|---|---
none | 32 | 4490
n-gram | 32 | 1240
100
n-gram
101
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 19.80 | 19.3 | 21.1 | 23.2 | 31.2 | 0.99902
8 | 29.5 | 25.4 | 40.6 | 42.0 | 54 | 7.985
16 | 41.8 | 41.9 | 47.2 | 50.3 | 66 | 15.961
32 | 49.8 | 49.8 | 57.6 | 61.3 | 78.6 | 31.914
48 | 62.6 | 63.0 | 72.9 | 77.0 | 99.0 | 47.829
64 | 75.5 | 78.0 | 89.5 | 94.0 | 129.5 | 63.725
800
n-gram
713
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 27.5 | 25.8 | 34.4 | 36.16 | 40.0 | 0.99895
64 | 111 | 109 | 146 | 158 | 183 | 63.685
128 | 151 | 154 | 182.6 | 201 | 218 | 127.14
256 | 242.7 | 254.9 | 307 | 332 | 361 | 252.96
384 | 333 | 354 | 431 | 469 | 507.6 | 377.61
512 | 416 | 446 | 556 | 611 | 656 | 500.8
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 1120
160
143
155
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 13.47 | 13.233 | 13.46 | 13.61 | 23.29 | 0.999217
n-gram | 8 | 20.58 | 20.6 | 21.10 | 21.30 | 39.8 | 7.9864
n-gram | 16 | 36.0 | 36.0 | 38.6 | 40.3 | 78 | 15.9557
n-gram | 32 | 53.3 | 52.09 | 57.1 | 59.7 | 112.1 | 31.869
n-gram | 48 | 66.7 | 66.5 | 72.6 | 74.1 | 148 | 47.745
n-gram | 64 | 75.9 | 74.2 | 88.4 | 90.1 | 184 | 63.580
none | 1 | 12.5 | 12.4 | 12.7 | 12.9 | 18.54 | 0.999313
none | 8 | 19.5 | 19.14 | 19.7 | 19.9 | 39.4 | 7.98830
none | 16 | 34.3 | 34.6 | 37.9 | 38.3 | 62.2 | 15.960
none | 32 | 50.8 | 50.2 | 53.1 | 53.7 | 109.4 | 31.882
none | 48 | 60.8 | 59.8 | 70.2 | 71.9 | 137.4 | 47.771
none | 64 | 72.1 | 67.5 | 86.6 | 87.93 | 171 | 63.616
800
648
706
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 14.17 | 13.74 | 14.27 | 19.33 | 19.72 | 0.999357
n-gram | 64 | 85 | 85.2 | 94 | 117.1 | 134.4 | 63.709
n-gram | 128 | 146.87 | 147.5 | 158.9 | 187.1 | 227 | 127.03
n-gram | 256 | 255.0 | 270.0 | 292.3 | 324 | 408.7 | 252.48
n-gram | 384 | 360 | 393.5 | 433.9 | 476.8 | 601 | 376.27
n-gram | 512 | 489 | 516.2 | 636 | 810 | 1090 | 497.2
none | 1 | 12.96 | 12.57 | 13.0 | 17.20 | 17.45 | 0.999417
none | 64 | 73.18 | 70.5 | 86.6 | 106.8 | 122.4 | 63.739
none | 128 | 123.0 | 135.5 | 146.8 | 175.1 | 211 | 127.093
none | 256 | 219 | 249.6 | 273 | 312.1 | 391 | 252.66
none | 384 | 329 | 373 | 406 | 450.7 | 575.9 | 376.62
none | 512 | 452 | 495 | 580 | 768 | 938 | 498.1
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 2460
none | 32 | 2830
160
n-gram
150
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 12.91 | 13.00 | 13.28 | 13.35 | 14.8 | 0.999520
8 | 20.7 | 20.0 | 21.1 | 20.89 | 24.63 | 7.99387
16 | 35.4 | 36.1 | 38.4 | 39.1 | 40.54 | 15.9783
32 | 50.2 | 50.3 | 53.2 | 54.3 | 56.1 | 31.9403
48 | 64.5 | 65.3 | 70.0 | 71.63 | 74.4 | 47.879
64 | 74.4 | 75.5 | 85.5 | 86.9 | 90.2 | 63.802
800
n-gram
665
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 14.39 | 14.53 | 15.3 | 15.80 | 15.88 | 0.999497
64 | 81 | 83 | 91.7 | 94.1 | 97.9 | 63.798
128 | 140.2 | 143.70 | 154.1 | 158.6 | 168 | 127.290
256 | 240.5 | 257.4 | 281.9 | 288 | 307 | 253.42
384 | 338.9 | 380 | 417 | 429 | 452 | 378.21
512 | 466 | 508 | 600 | 646 | 919 | 500.3
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 895
160
n-gram
126
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 13.97 | 13.96 | 14.29 | 14.40 | 15.23 | 0.999417
8 | 24.6 | 24.19 | 25.6 | 25.26 | 27.4 | 7.99203
16 | 38.5 | 42.6 | 48.8 | 49.3 | 50.8 | 15.9713
32 | 63 | 62.7 | 62.9 | 68 | 67.8 | 31.915
48 | 80.1 | 82.4 | 93.6 | 95.54 | 98.6 | 47.830
64 | 94.8 | 95.2 | 114.8 | 116.9 | 126 | 63.684
800
n-gram
542
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 13.856 | 14.00 | 14.27 | 14.94 | 15.10 | 0.999443
64 | 102.2 | 107.9 | 121.0 | 129 | 139 | 63.685
128 | 188.8 | 185.7 | 267 | 275 | 295.5 | 126.61
256 | 331 | 334.5 | 550 | 600 | 590 | 250.0
384 | 498 | 494 | 740 | 1000 | 1360 | 370.39
512 | 865 | 664 | 1760 | 2160 | 2875 | 487.3
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 644
160
n-gram
126
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 14.76 | 14.39 | 15.39 | 15.95 | 38.7 | 0.99871
8 | 23.8 | 22.40 | 25.9 | 26.9 | 61.7 | 7.9715
16 | 40.66 | 39.4 | 43.4 | 45.0 | 142 | 15.919
32 | 59.2 | 57.1 | 62.3 | 65.8 | 188.2 | 31.778
48 | 76.8 | 73.4 | 81.7 | 91.5 | 273 | 47.53
64 | 90.4 | 86.7 | 95.9 | 104.8 | 318 | 63.28
800
n-gram
434
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 20.73 | 19.9 | 23.1 | 37.1 | 40.3 | 0.99876
64 | 121.8 | 109.2 | 137.3 | 203.6 | 328 | 63.33
128 | 205 | 177 | 281 | 475 | 710 | 124.8
256 | 373 | 306 | 559 | 1030 | 1379 | 245.3
384 | 580 | 452 | 1210 | 1440 | 2140 | 361.7
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 446
160
n-gram
79
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 16.0 | 15.1 | 18.0 | 27.5 | 32.4 | 0.999407
8 | 31.7 | 30.08 | 36.0 | 44.6 | 68 | 7.99033
16 | 57.3 | 58.3 | 63.7 | 73 | 95.1 | 15.96
32 | 89.5 | 90.2 | 99.5 | 109 | 134 | 31.885
48 | 124.5 | 126.0 | 138.1 | 150.1 | 184 | 47.752
64 | 166 | 164.4 | 197 | 220 | 299 | 63.536
800
n-gram
367
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 22.9 | 20.1 | 33.1 | 34.16 | 34.7 | 0.999403
64 | 178 | 168 | 236 | 250 | 270 | 63.601
128 | 291.8 | 284 | 388 | 411 | 438 | 126.46
256 | 580 | 554 | 800 | 950 | 1150 | 247.0
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 389
160
60
63
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 24.62 | 20.72 | 25.35 | 26.3 | 43.66 | 0.99866
n-gram | 8 | 38.5 | 36.4 | 48.3 | 50.3 | 76.3 | 7.9800
n-gram | 16 | 50.8 | 45.0 | 61.2 | 62.5 | 108.1 | 15.75
n-gram | 32 | 118 | 93.8 | 100.3 | 263 | 841.4 | 31.4
n-gram | 48 | 154.6 | 136.7 | 212 | 365 | 697.1 | 47.5723
none | 1 | 20.82 | 20.45 | 20.93 | 21.7 | 34.7 | 0.998870
none | 8 | 106.1 | 32.6 | 43.6 | 838 | 1630 | 7.98307
none | 16 | 54.8 | 48.1 | 59.7 | 64 | 434 | 15.88
none | 32 | 117.2 | 88.2 | 176.8 | 442 | 815 | 31.804
none | 48 | 148 | 131.3 | 183.8 | 306 | 678 | 47.579
800
289
300
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
n-gram | 1 | 25.7 | 22.02 | 27.2 | 31.3 | 32.9 | 0.99900
n-gram | 64 | 170.5 | 183.4 | 192.1 | 226.9 | 244.8 | 63.504
n-gram | 128 | 316 | 323 | 364 | 366 | 435 | 126.23
n-gram | 256 | 900 | 680 | 1790 | 2040 | 2370 | 244
none | 1 | 21.11 | 20.73 | 21.90 | 24.82 | 27.4 | 0.999173
none | 64 | 164.3 | 180 | 182.9 | 216.2 | 231.1 | 63.525
none | 128 | 281.42 | 275 | 328.72 | 364 | 410.3 | 126.333
none | 256 | 504 | 550 | 648 | 871 | 1553 | 243
Language model | # of streams | Throughput (RTFX)
---|---|---
n-gram | 32 | 750
none | 32 | 778.62
160
n-gram
37
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 21.5 | 18.98 | 22.648 | 23.72 | 25.7 | 0.99917
8 | 70.1 | 49.1 | 52 | 55.9 | 747 | 7.982
16 | 70.3 | 68 | 85.0 | 85.8 | 87.5 | 15.85
32 | 199.9 | 149.5 | 529 | 726 | 922 | 31.4
800
n-gram
143
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---
1 | 19.53 | 19.69 | 20.9 | 21.39 | 21.46 | 0.999247
64 | 287 | 289 | 471 | 477 | 533 | 62.857
128 | 568 | 549.4 | 836.0 | 985 | 1190 | 120
n-gram
# of streams | Throughput (RTFX)
---|---
32 | 409
100
278
218
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 8.28 | 8.16 | 8.34 | 8.40 | 12.4 | 0.999563
none | 8 | 10.68 | 10.25 | 10.74 | 10.91 | 24.0 | 7.9923
none | 16 | 15.6 | 14.4 | 18.8 | 19.5 | 33.7 | 15.98
none | 32 | 19.9 | 20.6 | 23.6 | 24.6 | 44.2 | 31.9483
none | 48 | 26.5 | 27.84 | 30.9 | 32.39 | 59.7 | 47.8957
none | 64 | 31.6 | 32.8 | 37.7 | 39.1 | 71.4 | 63.8310
n-gram | 1 | 8.98 | 8.60 | 9.87 | 10.61 | 17.9 | 0.999327
n-gram | 8 | 11.82 | 11.06 | 12.77 | 14.20 | 28.8 | 7.9905
n-gram | 16 | 17.21 | 15.95 | 20.1 | 22.2 | 42.1 | 15.97
n-gram | 32 | 23.3 | 24.3 | 27.8 | 30.4 | 58.2 | 31.926
n-gram | 48 | 28.4 | 29.1 | 33.2 | 36.0 | 73.9 | 47.863
n-gram | 64 | 34.9 | 35.1 | 41.8 | 46.7 | 89.7 | 63.762
800
1657
1115
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX)
---|---|---|---|---|---|---|---
none | 1 | 8.87 | 8.23 | 12.4 | 12.60 | 13.2 | 0.99956
none | 64 | 37.0 | 35.3 | 62.6 | 73.8 | 81.3 | 63.825
none | 128 | 51.9 | 49.3 | 83 | 106 | 123 | 127.447
none | 256 | 78.3 | 72.1 | 137.0 | 182.0 | 216.3 | 254.040
none | 384 | 103.0 | 100 | 188.9 | 257.3 | 309.8 | 379.79
none | 512 | 130 | 125 | 245 | 335 | 406 | 504.61
none | 768 | 177.9 | 162 | 352 | 493.3 | 598 | 751.69
n-gram | 1 | 14.52 | 13.92 | 20.04 | 20.40 | 21.3 | 0.999383
n-gram | 64 | 55.3 | 53.3 | 85 | 94.9 | 105 | 63.794
n-gram | 128 | 79.8 | 74.5 | 123.4 | 138 | 155 | 127.35
n-gram | 256 | 120.5 | 111.0 | 191 | 215.0 | 245 | 253.86
n-gram | 384 | 158.1 | 146.8 | 254 | 291 | 336.9 | 379.37
n-gram | 512 | 194.5 | 183.4 | 314 | 368.9 | 433 | 504.15
n-gram | 768 | 261 | 243 | 450 | 538 | 636 | 750.97
Language model | # of streams | Throughput (RTFX) |
---|---|---|
none | 32 | 4680 |
n-gram | 32 | 988 |
100
n-gram
75
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 22.1 | 21.95 | 23.01 | 24.2 | 30.46 | 0.999023 |
8 | 24.9 | 24.1 | 26.7 | 27.7 | 42.9 | 7.9880 |
16 | 42.8 | 42.44 | 46.2 | 49.0 | 65 | 15.9630 |
32 | 66.5 | 65.5 | 79.2 | 82.4 | 99 | 31.887 |
48 | 79.0 | 83.0 | 94.5 | 100 | 176 | 47.797 |
64 | 159 | 147 | 216 | 229 | 409 | 63.650 |
800
n-gram
550
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 27.17 | 26.52 | 33.17 | 35.48 | 36.0 | 0.999040 |
64 | 117.3 | 117.0 | 147.5 | 158.3 | 178 | 63.681 |
128 | 184.5 | 192.2 | 235 | 252 | 269.8 | 126.953 |
256 | 327.3 | 349.8 | 421 | 443.9 | 480 | 252.09 |
384 | 464 | 497 | 605 | 659 | 710 | 375.25 |
512 | 682 | 660 | 970 | 1490 | 2260 | 494.9 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 883 |
160
62
65
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 24.10 | 23.64 | 23.92 | 24.09 | 45.9 | 0.99854 |
n-gram | 8 | 49.8 | 48.7 | 51.4 | 52.0 | 115.2 | 7.9688 |
n-gram | 16 | 61.6 | 60.6 | 67.1 | 68.7 | 119 | 15.934 |
n-gram | 32 | 92.6 | 89.6 | 94.2 | 96.2 | 211 | 31.778 |
n-gram | 48 | 131 | 126 | 139 | 184 | 332 | 47.519 |
none | 1 | 22.08 | 21.79 | 21.91 | 21.96 | 40.79 | 0.998693 |
none | 8 | 46.8 | 46.2 | 46.89 | 47.34 | 102 | 7.97220 |
none | 16 | 57.59 | 56.72 | 62.7 | 66 | 107.7 | 15.9373 |
none | 32 | 87.3 | 84.6 | 87.2 | 89.1 | 192 | 31.795 |
none | 48 | 114 | 100.8 | 132 | 140.6 | 309 | 47.549 |
none | 64 | 341 | 298 | 595 | 671 | 787 | 63.03 |
800
293
307
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 27.1 | 26.4 | 27.3 | 38.00 | 38.2 | 0.998777 |
n-gram | 64 | 156 | 161 | 177 | 214 | 248 | 63.505 |
n-gram | 128 | 291.7 | 316 | 338.1 | 379 | 449 | 126.133 |
n-gram | 256 | 624 | 618.2 | 810 | 1060 | 1730 | 247.67 |
none | 1 | 22.45 | 21.85 | 22.14 | 29.33 | 29.55 | 0.999040 |
none | 64 | 138 | 124 | 170 | 197.9 | 226 | 63.526 |
none | 128 | 250 | 291 | 315.2 | 361 | 432 | 126.193 |
none | 256 | 551 | 595.0 | 732 | 910 | 1554 | 248.02 |
Language model | # of streams | Throughput (RTFX) |
---|---|---|
n-gram | 32 | 1313 |
none | 32 | 1410 |
160
n-gram
63
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 23.12 | 23.3 | 23.97 | 24.07 | 28.76 | 0.999193 |
8 | 48.88 | 48.67 | 51.4 | 51.7 | 54.9 | 7.98647 |
16 | 60.2 | 59.6 | 65.8 | 67.6 | 74 | 15.9663 |
32 | 88.3 | 88.4 | 93.1 | 94.7 | 105.7 | 31.8963 |
48 | 128.9 | 130.8 | 138.2 | 140.3 | 151.9 | 47.747 |
800
n-gram
296
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 24.412 | 25.1 | 25.9 | 28.63 | 28.71 | 0.999 |
64 | 157.0 | 163.1 | 176 | 182 | 198 | 63.614 |
128 | 280 | 311.4 | 331.3 | 341 | 357 | 126.547 |
256 | 580 | 617 | 760 | 870 | 1230 | 248.70 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 700 |
160
n-gram
59
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 22.93 | 23.05 | 23.48 | 23.59 | 26.89 | 0.999103 |
8 | 50.75 | 50.47 | 52.64 | 53.04 | 55.91 | 7.98427 |
16 | 63.1 | 62.7 | 68.01 | 70.3 | 75.5 | 15.9597 |
32 | 101.8 | 101.9 | 106.2 | 108.0 | 117 | 31.8610 |
48 | 148 | 149.7 | 160.6 | 168 | 212.7 | 47.613 |
800
n-gram
262
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 24.1 | 24.33 | 25.1 | 28.0 | 27.40 | 0.999100 |
64 | 188 | 172 | 275.2 | 291 | 358 | 63.301 |
128 | 349 | 350.4 | 554 | 639 | 701 | 124.913 |
256 | 867 | 686 | 1630 | 1860 | 2146 | 243.5 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 664 |
160
n-gram
56
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 26.81 | 26.4 | 28.93 | 29.7 | 66.93 | 0.997803 |
8 | 55.4 | 53.4 | 61.3 | 63.2 | 105.3 | 7.9583 |
16 | 68.7 | 67.1 | 76.7 | 80.0 | 202 | 15.890 |
32 | 102.7 | 98.9 | 108.00 | 114.3 | 290 | 31.669 |
48 | 156 | 135.4 | 153.0 | 319 | 587 | 47.12 |
800
n-gram
232
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 39.0 | 40.4 | 44.08 | 63.8 | 64.9 | 0.99788 |
64 | 247 | 202 | 377 | 615 | 730 | 62.43 |
128 | 473 | 364 | 930 | 1250 | 1810 | 121.0 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 479 |
160
n-gram
49
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 29.88 | 28.56 | 37.5 | 45.0 | 61.5 | 0.99902 |
8 | 66.8 | 62.5 | 79.3 | 93.4 | 130 | 7.98103 |
16 | 82.8 | 78.4 | 99.1 | 119.0 | 154 | 15.9517 |
32 | 136.0 | 133.56 | 154.7 | 176 | 208.7 | 31.824 |
48 | 313 | 321 | 396 | 407 | 507 | 47.44 |
800
n-gram
211
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 42.9 | 42.8 | 62.2 | 63.3 | 65.9 | 0.99904 |
64 | 261 | 260 | 328 | 341 | 363 | 63.37 |
128 | 494 | 481 | 658 | 710 | 871 | 124.9 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 386 |
160
18
18
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 38.373 | 38.1 | 39.3565 | 39.83 | 70.8 | 0.99776 |
n-gram | 8 | 67.3 | 65.7 | 68.2 | 70.4 | 140.4 | 7.9623 |
n-gram | 16 | 113.6 | 101.4 | 126.3 | 147.6 | 298 | 15.0 |
none | 1 | 36.210 | 35.8 | 36.822 | 37.6 | 62.0 | 0.998050 |
none | 8 | 63.27 | 61.9 | 63.7 | 65.5 | 128.1 | 7.9663 |
none | 16 | 1610 | 1690 | 2296 | 2420 | 2600 | 15.4 |
800
87
90
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 39.0 | 39.2 | 42.3 | 44.2 | 51.4 | 0.99830 |
n-gram | 64 | 1395 | 521 | 2560 | 2680 | 2910 | 59.4 |
none | 1 | 36.1 | 35.5 | 36.7 | 42.0 | 44.29 | 0.99864 |
none | 64 | 597 | 485 | 1011 | 1737 | 2286 | 61.5 |
Language model | # of streams | Throughput (RTFX) |
---|---|---|
n-gram | 32 | 188 |
none | 32 | 190 |
160
n-gram
13
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 48.90 | 49.08 | 49.909 | 50.13 | 56.14 | 0.998160 |
8 | 95.0 | 95.2 | 97.7 | 98.5 | 102.9 | 7.9697 |
800
n-gram
63
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 50.16 | 50.6 | 53.4 | 56.6 | 56.7 | 0.998157 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 247.0 |
100
132
119
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
none | 1 | 14.97 | 14.74 | 14.90 | 14.96 | 22.8 | 0.999243 |
none | 8 | 22.46 | 21.82 | 22.19 | 22.37 | 50.9 | 7.9867 |
none | 16 | 41.7 | 41.4 | 47.61 | 48.30 | 80.1 | 15.9550 |
none | 32 | 41.1 | 42.2 | 52.8 | 55.8 | 91.0 | 31.891 |
none | 48 | 46.6 | 47.9 | 57.6 | 62.0 | 102.0 | 47.834 |
none | 64 | 56.5 | 56.4 | 70.1 | 74.4 | 119 | 63.716 |
n-gram | 1 | 16.78 | 15.92 | 19.19 | 21.1 | 34.9 | 0.99867 |
n-gram | 8 | 23.31 | 21.86 | 26.2 | 29.1 | 62.2 | 7.9811 |
n-gram | 16 | 46.6 | 44.9 | 55.0 | 61.4 | 105 | 15.936 |
n-gram | 32 | 48.1 | 47.0 | 61.6 | 69 | 116 | 31.866 |
n-gram | 48 | 51.4 | 50.9 | 64.8 | 73.9 | 130 | 47.769 |
n-gram | 64 | 61.1 | 59.6 | 76.1 | 86.6 | 151 | 63.647 |
800
680
554
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
none | 1 | 25.38 | 23.958 | 32.21 | 32.60 | 35.0 | 0.998960 |
none | 64 | 113 | 103.5 | 144.5 | 178.4 | 215.9 | 63.61 |
none | 128 | 140 | 133.7 | 209 | 255 | 267 | 126.82 |
none | 256 | 237 | 222 | 344 | 429 | 488 | 251.78 |
none | 384 | 315 | 305.5 | 493 | 624 | 716 | 374.87 |
none | 512 | 412 | 387 | 649 | 822 | 1060 | 495.72 |
n-gram | 1 | 39.6 | 40.1 | 48.7 | 49.9 | 52.0 | 0.99845 |
n-gram | 64 | 152 | 141.6 | 201 | 213 | 221.5 | 63.50 |
n-gram | 128 | 190 | 185 | 244 | 281 | 307 | 126.71 |
n-gram | 256 | 310.2 | 300.8 | 395 | 450.5 | 509 | 251.60 |
n-gram | 384 | 408.9 | 403.1 | 550 | 643 | 740 | 374.53 |
n-gram | 512 | 562 | 534 | 772 | 895 | 1410 | 495.0 |
Language model | # of streams | Throughput (RTFX) |
---|---|---|
none | 32 | 1730 |
n-gram | 32 | 1041 |
100
n-gram
37
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 47.49 | 47.20 | 49.83 | 54.1 | 66.0 | 0.99802 |
8 | 47.0 | 47.5 | 52.9 | 56.5 | 80 | 7.9780 |
16 | 73.3 | 71.1 | 78.4 | 85.3 | 131 | 15.938 |
32 | 261 | 254 | 359 | 377 | 570 | 31.74 |
800
n-gram
270
# of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|
1 | 69.27 | 69.870 | 78.8 | 84.2 | 86 | 0.997613 |
64 | 215.9 | 221.2 | 260 | 278.9 | 295 | 63.443 |
128 | 363 | 382 | 453 | 478 | 521 | 126.02 |
256 | 791 | 729 | 1128 | 1570 | 2377 | 247.01 |
n-gram
# of streams | Throughput (RTFX) |
---|---|
32 | 707 |
160
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 23.96 | 24.47 | 25.14 | 25.95 | 42.33 | 0.99936 |
160
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 27.96 | 28.53 | 30.54 | 30.92 | 48.13 | 0.99916 |
160
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 25.28 | 24.21 | 29.97 | 30.21 | 40.50 | 0.99914 |
160
Language model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
n-gram | 1 | 28.84 | 27.91 | 31.32 | 31.95 | 47.64 | 0.99901 |
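When reading the streaming tables, it can help to relate the throughput column to the stream count. Because the streaming client was run with `--simulate_realtime`, each stream feeds audio at real-time speed, so streaming throughput should stay at or just below the number of streams; a value that falls noticeably short suggests the server is no longer keeping up. The following is a minimal sketch of that reading, not part of the Riva client: the function names `rtfx` and `per_stream_fraction` are hypothetical, and the sample values are copied from rows in the tables above.

```python
def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Throughput as defined on this page: audio duration / computation time."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def per_stream_fraction(throughput_rtfx: float, num_streams: int) -> float:
    """Fraction of real-time speed sustained per stream (hypothetical helper).

    With simulated real-time input, a value close to 1.0 means every stream
    is being served at roughly the rate the audio arrives.
    """
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    return throughput_rtfx / num_streams


if __name__ == "__main__":
    # (num_streams, throughput) pairs taken from rows in the tables above.
    rows = [(1, 0.99936), (64, 63.03), (16, 15.0)]
    for streams, throughput in rows:
        fraction = per_stream_fraction(throughput, streams)
        print(f"{streams:>4} streams: {fraction:.4f} of real time per stream")
```

Rows where this fraction drops well below 1.0 tend to be the same rows where the tail latencies (p95, p99) grow sharply, so the two columns are best read together.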
Hardware Specifications¶
GPU | NVIDIA DGX A100 40 GB |
---|---|
CPU | |
Model | AMD EPYC 7742 64-Core Processor |
Thread(s) per core | 2 |
Socket(s) | 2 |
Core(s) per socket | 64 |
NUMA node(s) | 8 |
Frequency boost | enabled |
CPU max MHz | 2250 |
CPU min MHz | 1500 |
RAM | |
Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
Configured Memory Speed | 2933 MT/s |
RAM Size | 32x64GB (2048GB Total) |
GPU | NVIDIA A30 |
---|---|
CPU | |
Model | AMD EPYC 7742 64-Core Processor |
Thread(s) per core | 1 |
Socket(s) | 2 |
Core(s) per socket | 64 |
NUMA node(s) | 2 |
Frequency boost | disabled |
CPU max MHz | 2250 |
CPU min MHz | 1500 |
RAM | |
Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
Configured Memory Speed | 3200 MT/s |
RAM Size | 32x64GB (2048GB Total) |
GPU | NVIDIA V100 SXM2 16 GB |
---|---|
CPU | |
Model | Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz |
Thread(s) per core | 2 |
Socket(s) | 2 |
Core(s) per socket | 20 |
NUMA node(s) | 2 |
CPU max MHz | 3600 |
CPU min MHz | 1200 |
RAM | |
Model | Micron DDR4 36ASF4G72PZ-2G6D1 2667MHz |
Configured Memory Speed | 2133 MT/s |
RAM Size | 16x32GB (512GB Total) |
GPU | NVIDIA T4 |
---|---|
CPU | |
Model | Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz |
Thread(s) per core | 2 |
Socket(s) | 2 |
Core(s) per socket | 18 |
NUMA node(s) | 2 |
CPU max MHz | 3900 |
CPU min MHz | 1000 |
RAM | |
Model | Samsung DDR4 M393A2K43BB1-CTD 2666MHz |
Configured Memory Speed | 2666 MT/s |
RAM Size | 24x16GB (384GB Total) |