Performance Results
The NVIDIA Active Speaker Detection NIM supports multiple concurrent inputs per GPU. The following performance data is representative and is intended only as a general guide to expected performance. Actual results might vary depending on hardware configuration, input characteristics, and workload. The table shows the average FPS measured at several concurrency levels and speaker counts on different GPUs, using different sample inputs.
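| GPU | Speaker Count | Concurrency | Average FPS |
|---|---|---|---|
| RTX L40S | 1 | 1 | 72 |
| RTX L40S | 1 | 2 | 58 |
| RTX L40S | 1 | 4 | 29 |
| RTX L40S | 1 | 8 | 18 |
| RTX L40S | 2 | 1 | 75 |
| RTX L40S | 2 | 2 | 53 |
| RTX L40S | 2 | 4 | 29 |
| RTX L40S | 2 | 8 | 19 |
| RTX L40S | 3 | 1 | 73 |
| RTX L40S | 3 | 2 | 56 |
| RTX L40S | 3 | 4 | 33 |
| RTX L40S | 3 | 8 | 17 |
| RTX A10g | 1 | 1 | 58 |
| RTX A10g | 1 | 2 | 44 |
| RTX A10g | 1 | 4 | 22 |
| RTX A10g | 1 | 8 | 13 |
| RTX A10g | 2 | 1 | 60 |
| RTX A10g | 2 | 2 | 41 |
| RTX A10g | 2 | 4 | 26 |
| RTX A10g | 2 | 8 | 13 |
| RTX A10g | 3 | 1 | 54 |
| RTX A10g | 3 | 2 | 33 |
| RTX A10g | 3 | 4 | 18 |
| RTX A10g | 3 | 8 | 11 |
| RTX 4090 | 1 | 1 | 72 |
| RTX 4090 | 1 | 2 | 49 |
| RTX 4090 | 1 | 4 | 28 |
| RTX 4090 | 1 | 8 | 20 |
| RTX 4090 | 2 | 1 | 77 |
| RTX 4090 | 2 | 2 | 53 |
| RTX 4090 | 2 | 4 | 31 |
| RTX 4090 | 2 | 8 | 23 |
| RTX 4090 | 3 | 1 | 73 |
| RTX 4090 | 3 | 2 | 46 |
| RTX 4090 | 3 | 4 | 26 |
| RTX 4090 | 3 | 8 | 21 |
| RTX 5090 | 1 | 1 | 69 |
| RTX 5090 | 1 | 2 | 54 |
| RTX 5090 | 1 | 4 | 26 |
| RTX 5090 | 1 | 8 | 16 |
| RTX 5090 | 2 | 1 | 73 |
| RTX 5090 | 2 | 2 | 50 |
| RTX 5090 | 2 | 4 | 30 |
| RTX 5090 | 2 | 8 | 17 |
| RTX 5090 | 3 | 1 | 70 |
| RTX 5090 | 3 | 2 | 53 |
| RTX 5090 | 3 | 4 | 26 |
| RTX 5090 | 3 | 8 | 15 |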
The inference FPS is calculated by dividing the total number of frames in the input video file by the total inference time in seconds, measured from the time the request is sent until the complete output file is received by the client.
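The FPS calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark harness used to produce the table: the function names `inference_fps` and `timed_fps` and the `run_request` callable are hypothetical, and `run_request` stands in for whatever client call sends the video and blocks until the output file has been received.

```python
import time
from typing import Callable


def inference_fps(total_frames: int, inference_seconds: float) -> float:
    """Return frames in the input video divided by end-to-end inference time."""
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if inference_seconds <= 0:
        raise ValueError("inference_seconds must be positive")
    return total_frames / inference_seconds


def timed_fps(run_request: Callable[[], None], total_frames: int) -> float:
    """Time one blocking request and convert the elapsed time to FPS.

    run_request is a hypothetical callable: it is assumed to send the
    request and return only after the complete output file is received.
    """
    start = time.perf_counter()
    run_request()
    elapsed = time.perf_counter() - start
    return inference_fps(total_frames, elapsed)


if __name__ == "__main__":
    # Illustrative numbers only: 900 frames processed in 15 seconds.
    print(inference_fps(900, 15.0))  # 60.0
```

Because the timer wraps the whole request, the result includes upload, processing, and download time, so it is lower than a measurement of model compute alone.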
Note: Video extension operations significantly increase processing time and memory usage due to frame buffering.
For more information, refer to the Maxine NIM clients GitHub repository: NVIDIA-Maxine/nim-clients.