Performance¶
Evaluation Process¶
This section presents latency and throughput numbers for the Riva text-to-speech (TTS) service on different GPUs. Performance of the TTS service was measured at varying numbers of parallel streams. Each parallel stream performed 20 iterations over 10 input strings from the LJSpeech dataset. Three metrics were measured: latency to the first audio chunk, latency between successive audio chunks, and throughput. Two model pipelines were tested: FastPitch with HiFi-GAN, and Tacotron 2 with WaveGlow.
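To make the two latency metrics concrete, here is a minimal sketch of how they could be derived from chunk arrival timestamps for a single streaming request. The function name `chunk_latencies` and the sample timestamps are illustrative assumptions; how the Riva performance client computes these values internally is not described in this section.

```python
from typing import List, Tuple


def chunk_latencies(request_start: float, chunk_times: List[float]) -> Tuple[float, List[float]]:
    """Return (latency to first chunk, gaps between successive chunks).

    All timestamps are in seconds and must come from the same clock,
    for example time.perf_counter().
    """
    if not chunk_times:
        raise ValueError("at least one audio chunk is required")
    # Time from sending the request to receiving the first audio chunk.
    first = chunk_times[0] - request_start
    # Time between each chunk and the one that follows it.
    gaps = [later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])]
    return first, gaps


# Hypothetical timestamps for one request, not measured data.
first, gaps = chunk_latencies(10.000, [10.020, 10.023, 10.027])
print(f"first chunk after {first:.3f} s, gaps: {[round(g, 3) for g in gaps]}")
```

Collecting `first` and `gaps` across all requests of a run gives the samples from which averages and percentiles can be taken.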
The command used to measure performance was:
riva_tts_perf_client \
--num_parallel_requests=<num_streams> \
--voice_name=English-US-Female-1 \
--num_iterations=<20*num_streams> \
--online=true \
--text_file=$test_file \
--write_output_audio=false
where test_file is a path to the file ljs_audio_text_test_filelist_small.txt.
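The command can be parameterized over the stream counts reported in the results. The following sketch only builds the argument lists, using the flags shown in the command; the helper name `build_perf_command` is an assumption made for illustration, and the sketch does not run the client.

```python
import shlex
from typing import List


def build_perf_command(num_streams: int, test_file: str) -> List[str]:
    """Build the riva_tts_perf_client argument list for one stream count."""
    return [
        "riva_tts_perf_client",
        f"--num_parallel_requests={num_streams}",
        "--voice_name=English-US-Female-1",
        # 20 iterations per parallel stream, as in the command template.
        f"--num_iterations={20 * num_streams}",
        "--online=true",
        f"--text_file={test_file}",
        "--write_output_audio=false",
    ]


# Stream counts that appear in the results tables.
for n in (1, 4, 6, 8, 10):
    print(shlex.join(build_perf_command(n, "ljs_audio_text_test_filelist_small.txt")))
```

Each printed line can be pasted into a shell on a machine where the client is installed, or the list can be passed to `subprocess.run`.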
Results¶
Latencies to first audio chunk, latencies between audio chunks, and throughput are reported in the following tables. Throughput is measured in RTFX (duration of audio generated / computation time).
For specifications of the hardware on which these measurements were collected, refer to the Hardware Specifications section.
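As a rough illustration of the reported quantities, the sketch below computes RTFX from the definition above and summarizes latency samples as avg, p90, p95, and p99. The nearest-rank percentile method is an assumption; this section does not state which percentile method the performance client uses, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Throughput as duration of audio generated divided by computation time."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return the four statistics used in the latency columns."""
    return {
        "avg": mean(samples),
        "p90": percentile(samples, 90),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical run: 100 s of audio generated in 0.5 s of computation.
print(rtfx(100.0, 0.5))
```

An RTFX above 1 means audio is generated faster than real time.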

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 0.022 | 0.022 | 0.023 | 0.003 | 0.004 | 0.004 | 0.006 | 161.731 |
| 4 | 0.036 | 0.048 | 0.054 | 0.064 | 0.005 | 0.008 | 0.010 | 0.012 | 372.505 |
| 6 | 0.042 | 0.059 | 0.065 | 0.076 | 0.006 | 0.010 | 0.011 | 0.014 | 483.976 |
| 8 | 0.055 | 0.075 | 0.080 | 0.092 | 0.007 | 0.011 | 0.013 | 0.016 | 527.248 |
| 10 | 0.062 | 0.084 | 0.087 | 0.097 | 0.007 | 0.012 | 0.014 | 0.017 | 530.111 |
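
One way to read the throughput column is to compare each stream count against the single-stream baseline. The sketch below does this for the throughput values of the table above; the function name `scaling` is an illustrative assumption, and the same calculation applies to any of the other tables.

```python
from typing import Dict, Tuple

# (number of streams -> throughput in RTFX), copied from the table above.
throughput = {1: 161.731, 4: 372.505, 6: 483.976, 8: 527.248, 10: 530.111}


def scaling(values: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map stream count to (speedup over 1 stream, throughput per stream)."""
    baseline = values[1]
    return {n: (v / baseline, v / n) for n, v in values.items()}


for streams, (speedup, per_stream) in scaling(throughput).items():
    print(f"{streams:>2} streams: {speedup:.2f}x baseline, {per_stream:.1f} RTFX per stream")
```

In these numbers the gain from 8 to 10 streams is well under 1%, while latency to first audio keeps rising.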

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.045 | 0.048 | 0.048 | 0.050 | 0.026 | 0.028 | 0.028 | 0.029 | 30.867 |
| 4 | 0.202 | 0.286 | 0.302 | 0.323 | 0.017 | 0.028 | 0.031 | 0.037 | 81.782 |
| 6 | 0.281 | 0.369 | 0.388 | 0.438 | 0.019 | 0.030 | 0.034 | 0.043 | 95.791 |
| 8 | 0.357 | 0.454 | 0.471 | 0.507 | 0.020 | 0.032 | 0.037 | 0.046 | 105.224 |
| 10 | 0.426 | 0.512 | 0.542 | 0.651 | 0.021 | 0.034 | 0.038 | 0.050 | 111.804 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.032 | 0.036 | 0.037 | 0.040 | 0.003 | 0.005 | 0.005 | 0.006 | 121.527 |
| 4 | 0.047 | 0.061 | 0.067 | 0.079 | 0.006 | 0.010 | 0.012 | 0.017 | 302.308 |
| 6 | 0.063 | 0.086 | 0.092 | 0.110 | 0.008 | 0.015 | 0.017 | 0.020 | 348.049 |
| 8 | 0.084 | 0.110 | 0.116 | 0.132 | 0.009 | 0.017 | 0.020 | 0.024 | 367.742 |
| 10 | 0.095 | 0.125 | 0.131 | 0.149 | 0.010 | 0.018 | 0.021 | 0.025 | 371.032 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.059 | 0.066 | 0.067 | 0.069 | 0.028 | 0.034 | 0.034 | 0.035 | 26.654 |
| 4 | 0.255 | 0.350 | 0.368 | 0.413 | 0.025 | 0.037 | 0.043 | 0.055 | 61.693 |
| 6 | 0.382 | 0.501 | 0.527 | 0.575 | 0.029 | 0.045 | 0.050 | 0.071 | 68.254 |
| 8 | 0.508 | 0.666 | 0.696 | 0.756 | 0.032 | 0.051 | 0.059 | 0.077 | 72.058 |
| 10 | 0.631 | 0.761 | 0.798 | 0.967 | 0.033 | 0.052 | 0.061 | 0.088 | 74.592 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.025 | 0.028 | 0.028 | 0.029 | 0.005 | 0.006 | 0.006 | 0.006 | 113.982 |
| 4 | 0.050 | 0.073 | 0.082 | 0.096 | 0.008 | 0.014 | 0.017 | 0.023 | 260.055 |
| 6 | 0.082 | 0.120 | 0.136 | 0.161 | 0.010 | 0.021 | 0.025 | 0.029 | 262.642 |
| 8 | 0.121 | 0.167 | 0.180 | 0.210 | 0.012 | 0.024 | 0.027 | 0.033 | 265.144 |
| 10 | 0.141 | 0.193 | 0.211 | 0.239 | 0.012 | 0.025 | 0.028 | 0.034 | 272.279 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.073 | 0.076 | 0.076 | 0.079 | 0.047 | 0.049 | 0.050 | 0.050 | 17.302 |
| 4 | 0.376 | 0.504 | 0.530 | 0.589 | 0.031 | 0.048 | 0.057 | 0.071 | 44.750 |
| 6 | 0.562 | 0.733 | 0.774 | 0.840 | 0.035 | 0.058 | 0.068 | 0.091 | 48.801 |
| 8 | 0.733 | 0.936 | 0.989 | 1.075 | 0.038 | 0.063 | 0.074 | 0.105 | 52.819 |
| 10 | 0.885 | 1.105 | 1.201 | 1.399 | 0.042 | 0.071 | 0.088 | 0.126 | 54.508 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.055 | 0.061 | 0.062 | 0.068 | 0.005 | 0.007 | 0.008 | 0.010 | 73.647 |
| 4 | 0.087 | 0.108 | 0.115 | 0.138 | 0.012 | 0.022 | 0.026 | 0.032 | 158.731 |
| 6 | 0.121 | 0.159 | 0.171 | 0.201 | 0.017 | 0.033 | 0.037 | 0.045 | 169.916 |
| 8 | 0.161 | 0.211 | 0.226 | 0.263 | 0.020 | 0.039 | 0.043 | 0.053 | 177.233 |
| 10 | 0.190 | 0.253 | 0.273 | 0.312 | 0.022 | 0.041 | 0.047 | 0.058 | 178.790 |

| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.099 | 0.106 | 0.107 | 0.118 | 0.043 | 0.052 | 0.053 | 0.054 | 17.250 |
| 4 | 0.564 | 0.772 | 0.829 | 0.893 | 0.063 | 0.102 | 0.120 | 0.162 | 26.446 |
| 6 | 0.914 | 1.219 | 1.287 | 1.420 | 0.077 | 0.126 | 0.149 | 0.210 | 27.500 |
| 8 | 1.295 | 1.662 | 1.754 | 1.892 | 0.085 | 0.142 | 0.177 | 0.249 | 27.917 |
| 10 | 1.625 | 1.994 | 2.108 | 2.545 | 0.092 | 0.162 | 0.198 | 0.280 | 28.192 |
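
| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.065 | 0.067 | 0.068 | 0.069 | 0.019 | 0.020 | 0.020 | 0.021 | 18.106 |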
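
| # of streams | Latency to first audio (s), avg | Latency to first audio (s), p90 | Latency to first audio (s), p95 | Latency to first audio (s), p99 | Latency between audio chunks (s), avg | Latency between audio chunks (s), p90 | Latency between audio chunks (s), p95 | Latency between audio chunks (s), p99 | Throughput (RTFX) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.084 | 0.085 | 0.086 | 0.104 | 0.019 | 0.020 | 0.020 | 0.021 | 13.954 |
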
Hardware Specifications¶

| GPU | NVIDIA DGX A100 40 GB |
|---|---|
| CPU | |
| Model | AMD EPYC 7742 64-Core Processor |
| Thread(s) per core | 2 |
| Socket(s) | 2 |
| Core(s) per socket | 64 |
| NUMA node(s) | 8 |
| Frequency boost | enabled |
| CPU max MHz | 2250 |
| CPU min MHz | 1500 |
| RAM | |
| Model | Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz |
| Configured Memory Speed | 2933 MT/s |
| RAM Size | 32x64GB (2048GB Total) |

| GPU | NVIDIA A30 |
|---|---|
| CPU | |
| Model | AMD EPYC 7742 64-Core Processor |
| Thread(s) per core | 1 |
| Socket(s) | 2 |
| Core(s) per socket | 64 |
| NUMA node(s) | 2 |
| Frequency boost | disabled |
| CPU max MHz | 2250.0000 |
| CPU min MHz | 1500.0000 |
| RAM | |
| Model | Samsung DDR4 M393A4K40DB3-CWE 3200MHz |
| Configured Memory Speed | 3200 MT/s |
| RAM Size | 32x64GB (2048GB Total) |

| GPU | NVIDIA V100 SXM2 16 GB |
|---|---|
| CPU | |
| Model | Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz |
| Thread(s) per core | 2 |
| Socket(s) | 2 |
| Core(s) per socket | 20 |
| NUMA node(s) | 2 |
| CPU max MHz | 3600 |
| CPU min MHz | 1200 |
| RAM | |
| Model | Micron DDR4 36ASF4G72PZ-2G6D1 2667MHz |
| Configured Memory Speed | 2133 MT/s |
| RAM Size | 16x32GB (512GB Total) |

| GPU | NVIDIA T4 |
|---|---|
| CPU | |
| Model | Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz |
| Thread(s) per core | 2 |
| Socket(s) | 2 |
| Core(s) per socket | 18 |
| NUMA node(s) | 2 |
| CPU max MHz | 3900 |
| CPU min MHz | 1000 |
| RAM | |
| Model | Samsung DDR4 M393A2K43BB1-CTD 2666MHz |
| Configured Memory Speed | 2666 MT/s |
| RAM Size | 24x16GB (384GB Total) |

Performance Considerations¶
When the server is under high load, requests might time out, because the server does not start inference for a new request until a previous request has been completely generated and its inference slot has been freed. This is done to maximize throughput for the TTS service and allow for real-time interaction. NVIDIA does not recommend making more than 8-10 simultaneous requests with the models provided in Riva 2.0.0.
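
On the client side, one way to stay within this recommendation is to cap the number of requests in flight. The sketch below is a minimal illustration using a bounded thread pool. The `synthesize` callable is a hypothetical placeholder for whatever function sends one TTS request; it is not a Riva API, and the cap of 8 is simply the lower end of the range stated above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_IN_FLIGHT = 8  # lower end of the recommended 8-10 simultaneous requests


def synthesize_all(
    texts: List[str],
    synthesize: Callable[[str], bytes],  # hypothetical: sends one TTS request
    max_in_flight: int = MAX_IN_FLIGHT,
) -> List[bytes]:
    """Call synthesize(text) for every text, with a bounded number of active calls.

    The pool never runs more than max_in_flight calls at once; remaining
    texts wait in the pool's queue. Results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(synthesize, texts))


if __name__ == "__main__":
    # Stand-in for a real request function, used only to show the call shape.
    audio = synthesize_all(["hello", "world"], lambda text: text.encode("utf-8"))
    print(len(audio))
```

Queuing excess requests on the client keeps them from waiting on the server for a free inference slot, which is where the timeouts described above would occur.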