Support Matrix#
For production, Holoscan for Media is designed to run on NVIDIA-Certified Systems. For full hardware, networking, and cluster platform requirements, refer to the System Requirements for Holoscan for Media.
Models#
| Model name | Model ID | Publisher |
|---|---|---|
| Studio Voice | | NVIDIA |
Optimized GPU Configurations#
| GPU | Precision |
|---|---|
| L40S | FP16 + FP32 |
| NVIDIA RTX PRO 6000 Blackwell Server Edition | FP16 + FP32 |
Latency#
End-to-end per-packet RTP audio latency through the Studio Voice NIM, measured at the NIM output against the source RTP timestamp (preserved through the pipeline) with PTP-synchronized clocks per SMPTE ST 2110-21:
| GPU | Average Latency | Min | Max |
|---|---|---|---|
| NVIDIA RTX PRO 6000 Blackwell Server Edition | 49.81 ms | 49.81 ms | 49.82 ms |
| L40S | 50.03 ms | 50.03 ms | 50.04 ms |
Measured over a 30-minute continuous run of 48 kHz mono audio at 1 ms packet time (1,800,000 packets per SKU), with zero packet loss and full SMPTE ST 2110-21 compliance.
Note
The reported per-packet latency reflects the model’s 10 ms input-accumulation window (Studio Voice processes audio in 10 ms frames even though RTP packets arrive every 1 ms), together with algorithmic look-ahead and GPU inference. The 10 ms window sets the practical floor for end-to-end latency.
Note
Numbers reflect a single-node configuration with NUMA-aligned GPU and SR-IOV virtual function, no concurrent workloads on the GPU or the SR-IOV pool, and PTP-disciplined NICs. Multi-node deployments add network transit (typically tens of microseconds per hop), and GPU or SR-IOV contention can lift the average by a few milliseconds and widen the jitter band.
Per-Stream Capacity and Integrity#
Each Studio Voice NIM media function pod handles one bidirectional ST 2110-30 audio stream at 48 kHz mono with 1 ms packet time and 24-bit linear PCM, yielding an RTP payload bandwidth of approximately 1.15 Mbps per stream per direction (about 1.62 Mbps including RTP, UDP, IP, and Ethernet headers). Capacity is scaled horizontally by deploying additional Studio Voice CRs, each producing one pod with the GPU, CPU, and hugepages budget shown in the next section.
Across the 30-minute continuous run referenced in the Latency table, both SKUs sustained the expected 1000 packets per second with zero dropped packets, zero latency errors, and full SMPTE ST 2110-21 compliance.
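The bandwidth and packet-count figures in this section can be reproduced from the stream parameters. The following is a minimal sketch, not part of the NIM. It assumes IPv4 with no options, no VLAN tag, and counts the 14-byte Ethernet header plus the 4-byte frame check sequence (preamble and inter-frame gap are excluded). All constant and variable names are illustrative.

```python
# Stream parameters taken from the text above.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 1
BYTES_PER_SAMPLE = 3          # 24-bit linear PCM
PACKET_TIME_US = 1_000        # 1 ms packet time

# Per-packet overhead in bytes. The Ethernet figure is an assumption:
# 14-byte header + 4-byte FCS, no VLAN tag, no preamble or inter-frame gap.
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 18

packets_per_second = 1_000_000 // PACKET_TIME_US
samples_per_packet = SAMPLE_RATE_HZ * PACKET_TIME_US // 1_000_000
payload_bytes = samples_per_packet * CHANNELS * BYTES_PER_SAMPLE
frame_bytes = (payload_bytes + RTP_HEADER + UDP_HEADER
               + IPV4_HEADER + ETHERNET_OVERHEAD)

# Bit rates per stream per direction, in Mbps.
payload_mbps = payload_bytes * 8 * packets_per_second / 1e6
wire_mbps = frame_bytes * 8 * packets_per_second / 1e6

# Packets sent during the 30-minute measurement run.
packets_in_run = packets_per_second * 30 * 60

print(f"payload: {payload_bytes} B/packet, {payload_mbps:.3f} Mbps")
print(f"with headers: {frame_bytes} B/frame, {wire_mbps:.3f} Mbps")
print(f"packets in 30 minutes: {packets_in_run:,}")
```

Under these assumptions the sketch gives 1.152 Mbps of payload and 1.616 Mbps with headers, which matches the rounded values quoted above. A different header accounting (for example, a VLAN tag) would shift the second figure slightly.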
Pod Resource Requirements#
The following resources are required per Studio Voice NIM media function pod:
| Resource | Requirement |
|---|---|
| CPU | 2 cores (request) / 4 cores (limit) |
| Memory | 2 GiB (request) / 4 GiB (limit) |
| Hugepages (2 Mi) | 256 Mi |
| NVIDIA GPU | 1 |
Ensure the target node has sufficient resources and that the GPU and hugepages are configured before deploying. Node selector configuration is covered in each chart’s installation page.
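Because each pod carries one stream, a rough upper bound on streams per node can be estimated from the per-pod requests. The sketch below is illustrative only: the `Resources` type, the `max_pods` helper, and the example node figures are hypothetical, and the calculation ignores other workloads, system reservations, and NUMA placement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: int
    memory_gib: int
    hugepages_2mi_mib: int
    gpus: int


# Per-pod requests from the table above (requests, not limits,
# are what a scheduler typically counts against a node).
POD_REQUEST = Resources(cpu_cores=2, memory_gib=2, hugepages_2mi_mib=256, gpus=1)


def max_pods(node: Resources, pod: Resources = POD_REQUEST) -> int:
    """Return how many pods fit, limited by the scarcest resource."""
    return min(
        node.cpu_cores // pod.cpu_cores,
        node.memory_gib // pod.memory_gib,
        node.hugepages_2mi_mib // pod.hugepages_2mi_mib,
        node.gpus // pod.gpus,
    )


# Hypothetical allocatable resources for one node.
example_node = Resources(cpu_cores=30, memory_gib=120, hugepages_2mi_mib=2048, gpus=4)
print(max_pods(example_node))
```

For this hypothetical node the GPU count is the limiting resource. On a node with more GPUs, the hugepages pool would become the next constraint.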
Software#
NVIDIA Driver#
| Prerequisite | Version | Reference |
|---|---|---|
| NVIDIA Graphics Driver for Linux | 571.21+ | |
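A pre-deployment check against the minimum driver version could look like the following sketch. It assumes the installed version is available as a dotted numeric string (for example, as reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`) and that "571.21+" means any version greater than or equal to 571.21 when compared component by component. The function names are illustrative.

```python
# Minimum driver version from the table above.
MIN_DRIVER = (571, 21)


def parse_driver_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '575.57.08' into (575, 57, 8)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(text: str, minimum: tuple[int, ...] = MIN_DRIVER) -> bool:
    """Compare versions component by component, most significant first."""
    return parse_driver_version(text) >= minimum


for candidate in ("571.21", "575.57.08", "570.133.20"):
    status = "ok" if driver_meets_minimum(candidate) else "too old"
    print(f"{candidate}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "571.9" as newer than "571.21".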
Component Versions#
The Studio Voice NIM uses the following software components:
| Component | Version |
|---|---|
| CUDA | 12.8.1 |
| cuDNN | 9.7.1.26 |
| TensorRT | 10.9.0.34 |
| Triton Inference Server | v2.56.0 |
| DeepStream | 8.0 |
| NVIDIA Media Gateway | 0.7.0 |