Performance

Evaluation Process

This section presents latency and throughput numbers for the Riva text-to-speech (TTS) service on different GPUs. Performance was measured for varying numbers of parallel streams. Each parallel stream performed 20 iterations over 10 input strings from the LJSpeech dataset. Each stream sent a request to the Riva server and waited until all audio chunks had been received before sending the next request. Latency to first audio chunk, latency between successive audio chunks, and throughput were measured. The following diagram shows how the latencies are measured.

Schematic Diagram of Latencies Measured by Riva Streaming TTS Client
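As a rough illustration of the two latency metrics, the sketch below derives them from timestamps recorded on the client side. It assumes that the latency between chunks is simply the gap between successive arrival times. The function name and its inputs are hypothetical, and riva_tts_perf_client may compute these values differently.

```python
def chunk_latencies(request_sent, chunk_arrivals):
    """Return (latency to first chunk, list of gaps between successive chunks).

    request_sent   -- time (s) at which the request was sent
    chunk_arrivals -- arrival times (s) of each audio chunk, in order
    """
    if not chunk_arrivals:
        raise ValueError("no audio chunks were received")
    # Time from sending the request until the first audio chunk arrives.
    first_chunk_latency = chunk_arrivals[0] - request_sent
    # Time between each chunk and the one that follows it.
    gaps = [later - earlier
            for earlier, later in zip(chunk_arrivals, chunk_arrivals[1:])]
    return first_chunk_latency, gaps


# Hypothetical timestamps, for illustration only.
first, gaps = chunk_latencies(0.0, [0.5, 0.75, 1.0])
print(first, gaps)
```

A response that arrives as a single chunk yields an empty list of gaps, so such requests contribute only to the first-chunk statistic.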

The FastPitch and HiFi-GAN models were tested.

The Riva TTS performance client, riva_tts_perf_client, which is provided in the Riva image, was used to measure performance. The source code of the client is available at https://github.com/nvidia-riva/cpp-clients.

The following command was used to generate the tables below:

riva_tts_perf_client \
    --num_parallel_requests=<num_streams> \
    --voice_name=English-US.Female-1 \
    --num_iterations=<20*num_streams> \
    --online=true \
    --text_file=$test_file \
    --write_output_audio=false

Here, test_file is the path to the ljs_audio_text_test_filelist_small.txt file, and <num_streams> is the number of parallel streams.
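To make the relationship between the placeholders concrete, the following sketch builds the command line for a given stream count, setting the iteration count to 20 times the number of streams. The helper name and the list of stream counts are assumptions for illustration. The flags are copied from the command above, and the script only prints the commands rather than running them.

```python
import shlex

ITERATIONS_PER_STREAM = 20
STREAM_COUNTS = [1, 4, 6, 8, 10]  # assumed sweep; adjust as needed


def build_command(num_streams, test_file):
    """Return the perf client invocation as an argument list."""
    return [
        "riva_tts_perf_client",
        f"--num_parallel_requests={num_streams}",
        "--voice_name=English-US.Female-1",
        f"--num_iterations={ITERATIONS_PER_STREAM * num_streams}",
        "--online=true",
        f"--text_file={test_file}",
        "--write_output_audio=false",
    ]


for n in STREAM_COUNTS:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(build_command(n, "ljs_audio_text_test_filelist_small.txt")))
```

The returned list can be passed to subprocess.run inside the Riva image if the sweep should be executed rather than printed.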

Results

Latencies to first audio chunk, latencies between audio chunks, and throughput are reported in the following tables. Throughput is measured in RTFX (duration of audio generated / computation time).
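The RTFX definition and the percentile columns can be sketched as follows. The percentile helper uses the nearest-rank method, which is an assumption: the source does not state how the client computes p90, p95, and p99. All function names and sample values are hypothetical.

```python
import math


def rtfx(audio_seconds, compute_seconds):
    """Throughput as duration of audio generated / computation time."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def percentile(values, pct):
    """Nearest-rank percentile (assumed method) for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Smallest rank such that at least pct percent of samples are <= the result.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical example: 10 s of audio generated in 0.5 s of computation.
print(rtfx(10.0, 0.5))
```

An RTFX above 1 means that audio is generated faster than real time.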

For specifications of the hardware on which these measurements were collected, refer to the Hardware Specifications section.

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.021	0.023	0.024	0.025	0.003	0.004	0.004	0.004	147.083
4	0.041	0.061	0.067	0.078	0.005	0.008	0.010	0.013	327.195
6	0.060	0.082	0.087	0.098	0.006	0.010	0.012	0.015	366.147
8	0.071	0.095	0.102	0.117	0.008	0.012	0.015	0.019	402.996
10	0.079	0.106	0.114	0.132	0.008	0.013	0.015	0.019	423.031

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.022	0.024	0.025	0.027	0.004	0.005	0.005	0.006	127.978
4	0.048	0.067	0.075	0.087	0.007	0.011	0.013	0.018	266.844
6	0.084	0.111	0.118	0.144	0.008	0.014	0.017	0.020	269.692
8	0.102	0.133	0.140	0.155	0.009	0.017	0.019	0.025	302.212
10	0.118	0.153	0.161	0.180	0.009	0.018	0.021	0.027	316.384

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.022	0.024	0.024	0.025	0.004	0.004	0.005	0.005	131.226
4	0.055	0.077	0.083	0.094	0.008	0.013	0.016	0.021	235.152
6	0.090	0.121	0.127	0.139	0.009	0.017	0.019	0.024	247.030
8	0.118	0.154	0.164	0.188	0.010	0.019	0.023	0.031	259.844
10	0.137	0.180	0.190	0.219	0.011	0.021	0.024	0.034	265.490

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.026	0.028	0.028	0.029	0.005	0.006	0.006	0.006	101.499
4	0.061	0.090	0.100	0.114	0.009	0.015	0.017	0.026	207.091
6	0.105	0.144	0.153	0.169	0.010	0.017	0.020	0.027	213.771
8	0.129	0.175	0.188	0.219	0.012	0.019	0.023	0.029	228.952
10	0.148	0.197	0.206	0.227	0.013	0.020	0.023	0.029	239.188

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.028	0.030	0.031	0.039	0.007	0.008	0.009	0.010	85.266
4	0.085	0.123	0.131	0.146	0.015	0.026	0.031	0.046	133.369
6	0.139	0.195	0.208	0.230	0.019	0.031	0.035	0.047	138.631
8	0.194	0.258	0.276	0.308	0.022	0.037	0.041	0.052	142.243
10	0.227	0.305	0.324	0.354	0.025	0.040	0.045	0.057	143.640

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.064	0.067	0.068	0.068	0.001	0.001	0.001	0.018	21.621

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.097	0.102	0.103	0.103	0.002	0.002	0.003	0.028	14.199

# of streams	Latency to first audio (s)				Latency between audio chunks (s)				Throughput (RTFX)
	avg	p90	p95	p99	avg	p90	p95	p99
1	0.032	0.032	0.033	0.134	0.004	0.005	0.005	0.006	38.249

Hardware Specifications

GPU
NVIDIA DGX A100 40 GB
CPU
Model	AMD EPYC 7742 64-Core Processor
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	2250
CPU min MHz	1500
RAM
Model	Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
Configured Memory Speed	2933 MT/s
RAM Size	32x64GB (2048GB Total)

GPU
NVIDIA A30
CPU
Model	AMD EPYC 7742 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	2
Frequency boost	disabled
CPU max MHz	2250
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200MHz
Configured Memory Speed	3200 MT/s
RAM Size	32x64GB (2048GB Total)

GPU
NVIDIA A10
CPU
Model	AMD EPYC 7763 64-Core Processor
Thread(s) per core	1
Socket(s)	2
Core(s) per socket	64
NUMA node(s)	8
Frequency boost	enabled
CPU max MHz	2450
CPU min MHz	1500
RAM
Model	Samsung DDR4 M393A4K40DB3-CWE 3200 MHz
Configured Memory Speed	3200 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA V100 SXM2 16 GB
CPU
Model	Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	20
NUMA node(s)	2
CPU max MHz	3600
CPU min MHz	1200
RAM
Model	Micron DDR4 36ASF4G72PZ-2G6D1 2667MHz
Configured Memory Speed	2133 MT/s
RAM Size	16x32GB (512GB Total)

GPU
NVIDIA T4
CPU
Model	Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Thread(s) per core	2
Socket(s)	2
Core(s) per socket	18
NUMA node(s)	2
CPU max MHz	3900
CPU min MHz	1000
RAM
Model	Samsung DDR4 M393A2K43BB1-CTD 2666MHz
Configured Memory Speed	2666 MT/s
RAM Size	24x16GB (384GB Total)

Performance Considerations

When the server is under high load, requests might time out, because the server does not start inference for a new request until a previous request has been completely generated and its inference slot has been freed. This behavior is intended to maximize throughput for the TTS service and to allow for real-time interaction.
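The effect can be illustrated with a toy model, which is not Riva's actual scheduler. It assumes a fixed number of inference slots, a constant generation time per request, and a client-side timeout that applies only to the time spent waiting for a slot. All names and values are hypothetical.

```python
import heapq


def simulate(arrivals, service_time, num_slots, timeout):
    """Return one boolean per request: True if it timed out waiting for a slot."""
    # Min-heap of the times at which each slot next becomes free.
    free_at = [0.0] * num_slots
    heapq.heapify(free_at)
    timed_out = []
    for arrival in sorted(arrivals):
        slot_free = heapq.heappop(free_at)
        start = max(arrival, slot_free)
        if start - arrival > timeout:
            # The request gives up before a slot is freed; the slot is untouched.
            heapq.heappush(free_at, slot_free)
            timed_out.append(True)
        else:
            # The request occupies the slot until its audio is fully generated.
            heapq.heappush(free_at, start + service_time)
            timed_out.append(False)
    return timed_out


# Four simultaneous requests, two slots, 1 s per request, 0.5 s timeout.
print(simulate([0.0, 0.0, 0.0, 0.0], 1.0, 2, 0.5))
```

In this model, raising the timeout or lowering the number of concurrent requests avoids the timeouts, at the cost of longer waits for the requests that are queued.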
