Performance¶
Below is the measured performance of the Jarvis ASR, NLP, and TTS services on NVIDIA T4, NVIDIA V100 SXM2 16GB, and NVIDIA A100 SXM4 40GB GPUs. CPU specifications for each system can be found here:
ASR¶
The latency numbers below were measured in streaming recognition mode with the BERT-based punctuation model enabled, a 4-gram language model, a decoder beam width of 128, and timestamps enabled. The client and the server used audio chunks of the same duration (100 ms, 800 ms, or 3200 ms, depending on the server configuration). The Jarvis streaming client jarvis_streaming_asr_client, provided in the Jarvis client image, was used with the --simulate_realtime flag to simulate transcription from a microphone; each stream performed 5 iterations over a sample audio file from the LibriSpeech dataset (1272-135031-0000.wav). The command used was:
jarvis_streaming_asr_client --chunk_duration_ms=<chunk_duration> --simulate_realtime=true --automatic_punctuation=true --num_parallel_requests=<num_streams> --word_time_offsets=true --print_transcripts=false --interim_results=false --num_iterations=<5*num_streams> --audio_file=1272-135031-0000.wav --output_filename=/tmp/output.json;
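The tables that follow report average and percentile latencies along with throughput in RTFX. As a minimal sketch of how such figures can be derived from raw measurements, the Python below computes nearest-rank percentiles and a real-time-factor throughput. It assumes that RTFX means seconds of audio processed per second of wall-clock time, which this page does not define explicitly; the function names and the sample values are hypothetical and are not taken from the Jarvis client or its output file.

```python
import statistics


def latency_summary(latencies_ms):
    """Return avg and p50/p90/p95/p99 for a list of latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile using integer arithmetic:
        # rank = ceil(p * n / 100), clamped to at least 1.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "avg": statistics.fmean(ordered),
        "p50": percentile(50),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
    }


def rtfx(total_audio_seconds, wall_clock_seconds):
    """Assumed definition: audio seconds processed per wall-clock second."""
    return total_audio_seconds / wall_clock_seconds


if __name__ == "__main__":
    # Hypothetical per-chunk latencies (ms) collected by a client.
    samples = [9.1, 9.3, 9.4, 9.6, 10.2, 10.4, 11.4, 16.7]
    print(latency_summary(samples))
    # Hypothetical run: 64 streams, each fed 10 s of audio, finished in 10.03 s.
    print(rtfx(64 * 10.0, 10.03))
```

Under this definition, a client that feeds audio at real-time speed cannot exceed an RTFX equal to the number of streams, which matches the pattern in the tables: throughput tracks the stream count until the server saturates and latency grows sharply.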
NVIDIA A100 GPU¶
100ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 9.6 | 9.3 | 10.4 | 11.4 | 16.7 | 1 |
quartznet | 8 | 15.3 | 15.1 | 17.7 | 19.1 | 30.8 | 8 |
quartznet | 16 | 25.9 | 25.8 | 30.1 | 33.4 | 48.7 | 16 |
quartznet | 32 | 40.8 | 41.5 | 47.4 | 50.1 | 68.5 | 32 |
quartznet | 48 | 54.4 | 53.8 | 64.2 | 67.9 | 90.2 | 47.9 |
quartznet | 64 | 63.3 | 64.2 | 80.5 | 84.8 | 107.4 | 63.8 |
quartznet | 96 | 86.2 | 93.4 | 108.5 | 115.6 | 160.7 | 95.7 |
quartznet | 128 | 132.4 | 135.9 | 176 | 185.5 | 212.6 | 127.5 |
jasper | 1 | 13.4 | 13.1 | 14.3 | 15.2 | 20.5 | 1 |
jasper | 8 | 17.8 | 17.6 | 20.5 | 22.3 | 34.3 | 8 |
jasper | 16 | 26.3 | 24.3 | 34.8 | 36.6 | 47 | 16 |
jasper | 32 | 49.9 | 49.6 | 57.4 | 61.8 | 81.1 | 31.9 |
jasper | 48 | 60.8 | 61 | 72.3 | 75.5 | 87.6 | 47.9 |
jasper | 64 | 72.3 | 75.9 | 87.8 | 90.9 | 118.1 | 63.9 |
jasper | 96 | 114.5 | 117.7 | 155.3 | 173.1 | 190.4 | 95.7 |
jasper | 128 | 258.9 | 240 | 338.2 | 353.2 | 385 | 127.4 |
800ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 14.4 | 14 | 18.1 | 18.5 | 19.2 | 1 |
quartznet | 64 | 82.8 | 81.4 | 109 | 114.8 | 124.3 | 63.9 |
quartznet | 128 | 143.4 | 148.4 | 187.6 | 199.4 | 211.5 | 127.5 |
quartznet | 256 | 228.9 | 238.4 | 322.9 | 339.9 | 364.8 | 254.3 |
quartznet | 384 | 298.4 | 313 | 406.2 | 444 | 471.3 | 380.6 |
quartznet | 512 | 351.2 | 359.2 | 482.7 | 513.5 | 550.2 | 506.4 |
quartznet | 768 | 467.3 | 472.9 | 645.6 | 684.8 | 732.1 | 757.2 |
quartznet | 1024 | 630.8 | 607.2 | 961.1 | 1115.1 | 1318.1 | 1005.3 |
jasper | 1 | 17.6 | 16.8 | 21.6 | 23.8 | 26.8 | 1 |
jasper | 64 | 92.8 | 92.3 | 118.3 | 125.9 | 145.4 | 63.8 |
jasper | 128 | 156.8 | 160.9 | 205.7 | 223.7 | 243.1 | 127.5 |
jasper | 256 | 244.9 | 254.1 | 324.8 | 356.2 | 378.1 | 254.1 |
jasper | 384 | 311.1 | 315.7 | 411.7 | 435.9 | 474.4 | 380.7 |
jasper | 512 | 381 | 387.2 | 510.8 | 537.8 | 614.4 | 506.6 |
jasper | 768 | 512.6 | 510.3 | 689.4 | 734.8 | 1110.5 | 757 |
jasper | 1024 | 749.3 | 696.7 | 1228.9 | 1430.7 | 1579 | 1004 |
3200ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 28.1 | 28.8 | 32.4 | 32.5 | 32.5 | 1 |
quartznet | 256 | 356.7 | 397.7 | 478.3 | 493.1 | 518.9 | 253.8 |
quartznet | 512 | 566.5 | 591.5 | 780.1 | 803.4 | 841.8 | 505.2 |
quartznet | 768 | 729.1 | 721.9 | 990.8 | 1030.3 | 1074.4 | 753.4 |
quartznet | 1024 | 899.3 | 937.7 | 1226 | 1315.2 | 1514 | 1000.1 |
quartznet | 1280 | 1052.1 | 1037.9 | 1537.7 | 1793.6 | 2100 | 1244.9 |
quartznet | 1512 | 1303.8 | 1301.7 | 1847.9 | 2149.6 | 2464.6 | 1460.2 |
jasper | 1 | 31 | 33.4 | 35 | 35.3 | 35.3 | 1 |
jasper | 256 | 422.1 | 451.1 | 548.4 | 568.1 | 583.5 | 253.6 |
jasper | 512 | 667.5 | 697.5 | 864.8 | 890.7 | 926.3 | 504.1 |
jasper | 768 | 865.4 | 898.6 | 1106.3 | 1143.5 | 1225.6 | 752.3 |
jasper | 1024 | 1089 | 1083.8 | 1480.4 | 1617.3 | 2038.3 | 997.2 |
jasper | 1280 | 1382.5 | 1386.3 | 2041.7 | 2380.1 | 2559.1 | 1237.2 |
jasper | 1512 | 1753.8 | 1735 | 2629.3 | 2779.8 | 2970.5 | 1448.8 |
NVIDIA V100 GPU¶
100ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 8.8 | 8.3 | 9.7 | 11.3 | 21.7 | 1 |
quartznet | 8 | 15 | 14 | 17 | 20.2 | 43 | 8 |
quartznet | 16 | 22.4 | 21.4 | 25.8 | 27.6 | 57.6 | 16 |
quartznet | 32 | 36.1 | 36.2 | 41.8 | 44.4 | 72.9 | 31.9 |
quartznet | 48 | 44.6 | 44.8 | 53 | 55.7 | 85.4 | 47.9 |
quartznet | 64 | 54.9 | 55.1 | 67 | 73.1 | 102.5 | 63.8 |
quartznet | 96 | 81.2 | 84.3 | 99.2 | 111.8 | 179.2 | 95.7 |
quartznet | 128 | 114.7 | 109.3 | 157.3 | 181.5 | 228.2 | 127.4 |
jasper | 1 | 21.5 | 21 | 22.2 | 24 | 31.2 | 1 |
jasper | 8 | 27.6 | 26.5 | 29.7 | 34.7 | 53.4 | 8 |
jasper | 16 | 36.9 | 34 | 49 | 51.3 | 58.8 | 16 |
jasper | 32 | 74.5 | 72.5 | 88.1 | 91.6 | 126.3 | 31.9 |
jasper | 48 | 117.5 | 101.1 | 175.4 | 186.6 | 224.5 | 47.9 |
jasper | 64 | 406.4 | 365.7 | 645.5 | 695.1 | 806.5 | 63.6 |
jasper | 96 | 14378 | 13737 | 25542 | 27829 | 32182 | 72.8 |
jasper | 128 | 28826 | 28125 | 53029 | 56965 | 63537 | 66.2 |
800ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 14.4 | 13.6 | 20.4 | 20.6 | 20.7 | 1.0 |
quartznet | 64 | 79.3 | 77.2 | 111.3 | 120.2 | 130.1 | 63.8 |
quartznet | 128 | 135.1 | 128.9 | 195.7 | 204.9 | 219.0 | 127.4 |
quartznet | 256 | 222.2 | 218.7 | 315.2 | 339.2 | 362.2 | 254.3 |
quartznet | 384 | 310.9 | 304.9 | 443.8 | 479.9 | 520.5 | 380.3 |
quartznet | 512 | 385.2 | 374.5 | 569.0 | 589.6 | 626.2 | 505.4 |
quartznet | 768 | 574.5 | 527.0 | 937.3 | 1226.6 | 1347.8 | 751.9 |
quartznet | 1024 | 1088.1 | 946.2 | 1752.3 | 2116.6 | 2544.2 | 981.6 |
jasper | 1 | 26.8 | 25.9 | 32.8 | 35.3 | 56.6 | 1.0 |
jasper | 64 | 138.3 | 134.0 | 170.8 | 181.5 | 203.3 | 63.8 |
jasper | 128 | 239.4 | 234.9 | 294.9 | 310.2 | 342.8 | 127.2 |
jasper | 256 | 416.0 | 416.8 | 509.2 | 556.0 | 588.2 | 253.3 |
jasper | 384 | 613.6 | 597.9 | 766.6 | 919.4 | 1271.1 | 378.0 |
jasper | 512 | 969.7 | 858.2 | 1503.9 | 1860.3 | 2297.8 | 499.7 |
jasper | 768 | 9170.1 | 9241.0 | 15868.0 | 16618.0 | 18224.0 | 591.1 |
jasper | 1024 | 22837.0 | 23248.0 | 37553.0 | 40249.0 | 42696.0 | 579.8 |
3200ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 32.933 | 35.423 | 37.712 | 38.012 | 38.012 | 0.9994 |
quartznet | 256 | 461.44 | 488.88 | 630.67 | 653.84 | 684.75 | 253.1 |
quartznet | 512 | 784.73 | 843.69 | 1069.8 | 1105.7 | 1154.2 | 501.66 |
quartznet | 768 | 1121.6 | 1114.7 | 1601.7 | 1971.7 | 2138.5 | 747.45 |
quartznet | 1024 | 1551.5 | 1592.9 | 2258.9 | 2463.8 | 2608.1 | 985.6 |
quartznet | 1280 | 1982.2 | 2080.8 | 2910.2 | 3062.1 | 3279.6 | 1211.7 |
quartznet | 1512 | 2305.8 | 2241.4 | 3625.4 | 4190.5 | 4989.9 | 1413.3 |
jasper | 1 | 48.351 | 49.407 | 51.954 | 79.174 | 79.174 | 0.99919 |
jasper | 256 | 734.99 | 751.2 | 897.03 | 916.36 | 941.26 | 252.12 |
jasper | 512 | 1423.3 | 1384.4 | 2263.9 | 2387.1 | 2477.4 | 497.69 |
jasper | 768 | 2190.2 | 2133.8 | 3255.7 | 3393 | 3482.7 | 730.15 |
jasper | 1024 | 3576.3 | 2847.7 | 5861.6 | 6062.2 | 6748.6 | 951.97 |
jasper | 1280 | 13698 | 12101 | 28644 | 32940 | 35311 | 1001.1 |
jasper | 1512 | 19705 | 16730 | 40679 | 43397 | 46270 | 1014.6 |
NVIDIA T4¶
100ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 19.2 | 18.4 | 21.6 | 23.0 | 38.4 | 1.0 |
quartznet | 8 | 36.0 | 34.4 | 41.4 | 45.9 | 82.7 | 8.0 |
quartznet | 16 | 56.4 | 54.8 | 66.0 | 70.6 | 113.9 | 16.0 |
quartznet | 32 | 70.9 | 71.0 | 82.4 | 93.7 | 160.0 | 31.9 |
quartznet | 48 | 99.0 | 96.5 | 128.0 | 152.7 | 210.8 | 47.8 |
quartznet | 64 | 242.4 | 224.1 | 354.0 | 407.2 | 479.6 | 63.7 |
quartznet | 96 | 24151.0 | 22486.0 | 42624.0 | 47420.0 | 50429.0 | 58.7 |
quartznet | 128 | 43821.0 | 44736.0 | 77326.0 | 81324.0 | 87343.0 | 53.7 |
jasper | 1 | 46.9 | 46.9 | 49.6 | 52.7 | 65.7 | 1.0 |
jasper | 8 | 51.1 | 51.7 | 58.6 | 66.0 | 95.9 | 8.0 |
jasper | 16 | 84.4 | 81.7 | 97.3 | 104.1 | 187.7 | 16.0 |
jasper | 32 | 2328.1 | 2017.9 | 4183.5 | 5180.6 | 7012.1 | 31.6 |
jasper | 48 | 16858.0 | 14761.0 | 32993.0 | 35911.0 | 38084.0 | 35.1 |
jasper | 64 | 25504.0 | 22164.0 | 47484.0 | 51189.0 | 55003.0 | 37.0 |
jasper | 96 | 38857.0 | 41576.0 | 59410.0 | 63763.0 | 69797.0 | 38.2 |
jasper | 128 | 55384.0 | 57791.0 | 89744.0 | 94712.0 | 98622.0 | 38.7 |
800ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 33.183 | 33.444 | 44.144 | 44.813 | 46.354 | 0.99914 |
quartznet | 64 | 162.63 | 162.72 | 214.48 | 226.93 | 253.69 | 63.725 |
quartznet | 128 | 263.6 | 263.68 | 334.96 | 353.4 | 375.9 | 127.11 |
quartznet | 256 | 449.28 | 447.25 | 559.87 | 591.62 | 644.3 | 252.7 |
quartznet | 384 | 732.75 | 682.62 | 986.42 | 1360.7 | 1539.3 | 375.95 |
quartznet | 512 | 2037.5 | 2001.9 | 3136.3 | 3815.6 | 4684.4 | 487.93 |
quartznet | 768 | 15721 | 15724 | 27569 | 28450 | 29961 | 493.95 |
quartznet | 1024 | 29223 | 29487 | 49967 | 51824 | 53910 | 494.05 |
jasper | 1 | 72.377 | 72.143 | 82.132 | 89.374 | 90.067 | 0.99848 |
jasper | 64 | 259.64 | 262.21 | 298.47 | 311.66 | 331.8 | 63.62 |
jasper | 128 | 450.81 | 452.22 | 529.64 | 547.49 | 584.69 | 126.62 |
jasper | 256 | 1200.8 | 978.29 | 1809.4 | 2446.7 | 3595.1 | 249.24 |
jasper | 384 | 11679 | 11833 | 19190 | 20312 | 22493 | 279.91 |
jasper | 512 | 23750 | 23537 | 39610 | 41101 | 43670 | 280.41 |
jasper | 768 | 46165 | 49046 | 74417 | 79363 | 83407 | 279.8 |
jasper | 1024 | 67973 | 69939 | 114000 | 121000 | 126000 | 280.61 |
3200ms chunk¶
Acoustic model | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|
quartznet | 1 | 157.62 | 160.64 | 168.29 | 168.31 | 168.31 | 0.99726 |
quartznet | 256 | 906.17 | 915.19 | 1098.4 | 1130.8 | 1163.2 | 251.35 |
quartznet | 512 | 1515.2 | 1491.2 | 2244.4 | 2429.9 | 2540.8 | 494.82 |
quartznet | 768 | 2398.4 | 2216.6 | 3447 | 3586.4 | 3909.8 | 722.55 |
quartznet | 1024 | 4636.2 | 4727.7 | 7782.6 | 8737.9 | 8969.3 | 926.66 |
quartznet | 1280 | 17263 | 15966 | 36103 | 40196 | 44408 | 872.88 |
quartznet | 1512 | 25038 | 24528 | 49704 | 56065 | 60136 | 875.68 |
jasper | 1 | 96.201 | 100.64 | 104.75 | 104.82 | 104.82 | 0.99831 |
jasper | 256 | 1758.4 | 1668.3 | 2718.5 | 2764.3 | 2811.6 | 247.1 |
jasper | 512 | 11593 | 9623.5 | 25483 | 28937 | 30681 | 432.78 |
jasper | 768 | 28073 | 27499 | 55288 | 57262 | 63169 | 440.06 |
jasper | 1024 | 44405 | 44756 | 83588 | 86835 | 92653 | 445.39 |
jasper | 1280 | 61336 | 65536 | 114000 | 117000 | 126000 | 446.78 |
jasper | 1512 | 76306 | 83556 | 140000 | 145000 | 153000 | 447.83 |
NLP¶
Performance was measured for the Jarvis named entity recognition (NER) service (using a BERT-base model with a sequence length of 128) and the Jarvis question answering (QA) service (using a BERT-large model with a sequence length of 384). Both batch-size-1 latency and maximum throughput were measured.
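As a rough sanity check on how the latency and throughput columns relate, throughput can be estimated from average latency with Little's law (throughput ≈ requests in flight / average latency). The sketch below assumes that "# of streams" is the number of requests kept in flight, each carrying one sequence; this page does not state that, and the function name is hypothetical.

```python
def estimated_throughput(requests_in_flight, avg_latency_ms):
    """Little's law estimate of throughput in sequences per second."""
    if avg_latency_ms <= 0:
        raise ValueError("average latency must be positive")
    return requests_in_flight / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # Average latencies taken from the NVIDIA A100 NER rows on this page.
    print(round(estimated_throughput(1, 3.19), 1))    # reported: 311.1 seq/s
    print(round(estimated_throughput(256, 95.5), 1))  # reported: 2548.8 seq/s
```

The estimates land within a few percent of the reported figures, which suggests that the two columns describe the same steady-state load. The remaining gap is plausibly client-side overhead that is not counted in the latency numbers.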
NVIDIA A100 GPU¶
Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
---|---|---|---|---|---|---|---|
NER | 1 | 3.19 | 3.15 | 3.3 | 3.44 | 3.88 | 311.1 |
NER | 256 | 95.5 | 96.1 | 108 | 113 | 118 | 2548.8 |
Q&A | 1 | 4.95 | 4.83 | 5.25 | 5.36 | 5.77 | 201.2 |
Q&A | 128 | 279 | 290 | 294 | 308 | 321 | 453.1 |
NVIDIA V100 GPU¶
Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
---|---|---|---|---|---|---|---|
NER | 1 | 4.87 | 4.84 | 5.07 | 5.11 | 5.29 | 204.2 |
NER | 256 | 135 | 135 | 154 | 160 | 164 | 1796.8 |
Q&A | 1 | 7.47 | 7.44 | 7.58 | 7.62 | 7.78 | 133.5 |
Q&A | 128 | 521 | 541 | 543 | 544 | 626 | 243.8 |
NVIDIA T4¶
Task | # of streams | Latency avg (ms) | Latency p50 (ms) | Latency p90 (ms) | Latency p95 (ms) | Latency p99 (ms) | Throughput (seq/s) |
---|---|---|---|---|---|---|---|
NER | 1 | 9.31 | 9.19 | 9.94 | 10.2 | 11.1 | 106.7 |
NER | 256 | 255 | 265 | 282 | 285 | 289 | 960.2 |
Q&A | 1 | 11.5 | 11.3 | 11.4 | 11.4 | 11.5 | 86.9 |
Q&A | 128 | 571 | 582 | 672 | 684 | 768 | 223.1 |
TTS¶
Performance of the Jarvis text-to-speech (TTS) service was measured for different numbers of parallel streams. Each parallel stream performed 10 iterations over 10 input strings from the LJSpeech dataset. Latency to the first audio chunk, latency between successive audio chunks, and throughput were measured.
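To make the two latency metrics concrete, the following minimal sketch derives them from client-side timestamps for a single request. It assumes the client records the time it sent the request and the arrival time of each audio chunk; the function name and the sample timestamps are hypothetical and do not come from the Jarvis client.

```python
def tts_chunk_latencies(request_time, chunk_arrival_times):
    """Return (latency to first audio, list of gaps between successive chunks).

    All times are in seconds on the same clock, e.g. time.monotonic().
    """
    if not chunk_arrival_times:
        raise ValueError("no audio chunks were received")
    # Time from sending the request to receiving the first audio chunk.
    first_audio = chunk_arrival_times[0] - request_time
    # Time between each chunk and the one that followed it.
    gaps = [
        later - earlier
        for earlier, later in zip(chunk_arrival_times, chunk_arrival_times[1:])
    ]
    return first_audio, gaps


if __name__ == "__main__":
    # Hypothetical timestamps: request sent at t=10.0 s, three chunks received.
    first, gaps = tts_chunk_latencies(10.0, [10.5, 10.75, 11.0])
    print(first, gaps)
```

Pooling these per-request values across all streams and iterations yields the samples from which averages and percentiles such as p90, p95, and p99 are computed.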
NVIDIA A100 GPU¶
# of streams | Latency to first audio, avg (s) | Latency to first audio, p90 (s) | Latency to first audio, p95 (s) | Latency to first audio, p99 (s) | Latency between audio chunks, avg (s) | Latency between audio chunks, p90 (s) | Latency between audio chunks, p95 (s) | Latency between audio chunks, p99 (s) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|---|---|
1 | 0.06 | 0.06 | 0.06 | 0.06 | 0.04 | 0.04 | 0.04 | 0.04 | 19.5 |
4 | 0.48 | 0.67 | 0.71 | 0.78 | 0.03 | 0.05 | 0.06 | 0.11 | 37.0 |
6 | 0.69 | 0.89 | 0.94 | 1.06 | 0.03 | 0.05 | 0.07 | 0.10 | 41.8 |
8 | 0.88 | 1.10 | 1.15 | 1.25 | 0.03 | 0.06 | 0.07 | 0.10 | 45.8 |
10 | 1.06 | 1.21 | 1.26 | 1.43 | 0.03 | 0.06 | 0.08 | 0.09 | 48.7 |
NVIDIA V100 GPU¶
# of streams | Latency to first audio, avg (s) | Latency to first audio, p90 (s) | Latency to first audio, p95 (s) | Latency to first audio, p99 (s) | Latency between audio chunks, avg (s) | Latency between audio chunks, p90 (s) | Latency between audio chunks, p95 (s) | Latency between audio chunks, p99 (s) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|---|---|
1 | 0.08 | 0.08 | 0.08 | 0.25 | 0.05 | 0.06 | 0.06 | 0.06 | 14.31 |
4 | 0.77 | 0.98 | 1.07 | 1.19 | 0.05 | 0.07 | 0.08 | 0.13 | 23.3 |
6 | 1.11 | 1.47 | 1.56 | 1.71 | 0.05 | 0.09 | 0.11 | 0.17 | 25.55 |
8 | 1.4 | 1.81 | 1.9 | 2.06 | 0.06 | 0.1 | 0.12 | 0.17 | 28.09 |
10 | 1.74 | 2.37 | 2.52 | 2.78 | 0.07 | 0.12 | 0.14 | 0.17 | 27.75 |
NVIDIA T4¶
# of streams | Latency to first audio, avg (s) | Latency to first audio, p90 (s) | Latency to first audio, p95 (s) | Latency to first audio, p99 (s) | Latency between audio chunks, avg (s) | Latency between audio chunks, p90 (s) | Latency between audio chunks, p95 (s) | Latency between audio chunks, p99 (s) | Throughput (RTFX) |
---|---|---|---|---|---|---|---|---|---|
1 | 0.12 | 0.12 | 0.12 | 0.12 | 0.07 | 0.07 | 0.07 | 0.07 | 11.17 |
4 | 1.02 | 1.37 | 1.43 | 1.52 | 0.07 | 0.11 | 0.13 | 0.19 | 17.14 |
6 | 1.59 | 2.05 | 2.15 | 2.32 | 0.07 | 0.12 | 0.15 | 0.25 | 18.16 |
8 | 2.13 | 2.59 | 2.71 | 2.88 | 0.08 | 0.14 | 0.18 | 0.26 | 18.83 |
10 | 2.55 | 3.42 | 3.65 | 4.03 | 0.1 | 0.2 | 0.24 | 0.34 | 18.37 |
When the server is under high load, requests might time out, because the server does not start inference for a new request until a previous request has been completely generated and its inference slot has been freed. This is done to maximize throughput for the TTS service and allow for real-time interaction. NVIDIA does not recommend making more than 8-10 simultaneous requests with the models provided in Jarvis 1.0.0 beta.
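One way a client application could stay within that recommendation is to cap the number of requests it has in flight. The sketch below uses an asyncio semaphore for this. It is an illustration under stated assumptions, not part of the Jarvis client API: fake_synthesize is a hypothetical stand-in for a real TTS call, and the limit of 8 is simply the lower end of the range quoted above.

```python
import asyncio

MAX_IN_FLIGHT = 8  # assumption: lower end of the recommended 8-10 range


async def fake_synthesize(text):
    """Hypothetical placeholder for a real TTS request."""
    await asyncio.sleep(0.01)
    return f"audio for: {text}"


async def synthesize_all(texts, synthesize=fake_synthesize,
                         max_in_flight=MAX_IN_FLIGHT):
    """Run synthesize() over all texts with a cap on concurrent requests."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def limited(text):
        # Wait here until one of the max_in_flight slots is free.
        async with semaphore:
            return await synthesize(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(limited(text) for text in texts))


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(20)]
    results = asyncio.run(synthesize_all(sentences))
    print(len(results))
```

Requests beyond the cap wait on the client side instead of queueing on the server, so a client-side timeout can be started when a request is actually sent rather than when it is enqueued.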