Support Matrix#

For production, Holoscan for Media is designed to run on NVIDIA-Certified Systems. For full hardware, networking, and cluster platform requirements, refer to the System Requirements for Holoscan for Media.

Models#

Model name	Model ID	Publisher
Studio Voice	`studio-voice`	NVIDIA

Optimized GPU Configurations#

GPU	Precision
L40S	FP16 + FP32
NVIDIA RTX PRO 6000 Blackwell Server Edition	FP16 + FP32

Latency#

The following table reports end-to-end per-packet RTP audio latency through the Studio Voice NIM. Latency is measured at the NIM output against the source RTP timestamp, which is preserved through the pipeline, using PTP-synchronized clocks per SMPTE ST 2110-21:

GPU	Average Latency	Min	Max
NVIDIA RTX PRO 6000 Blackwell Server Edition	49.81 ms	49.81 ms	49.82 ms
L40S	50.03 ms	50.03 ms	50.04 ms

These values were measured over a 30-minute continuous run of 48 kHz mono audio at 1 ms packet time (1,800,000 packets per SKU), with zero packet loss and full SMPTE ST 2110-21 compliance.

Note

The reported per-packet latency reflects the model’s 10 ms input-accumulation window, together with algorithmic look-ahead and GPU inference. Studio Voice processes audio in 10 ms frames even though RTP packets arrive every 1 ms, so the 10 ms window sets the practical floor for end-to-end latency.
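The relationship between 1 ms packets and 10 ms model frames in the note above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the NIM: the variable names are illustrative, and the "remainder" is simply the measured average minus the 10 ms window, which the note attributes to look-ahead and GPU inference.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from this page.
SAMPLE_RATE_HZ = 48_000
PACKET_TIME_MS = 1    # RTP packet time
FRAME_TIME_MS = 10    # model input-accumulation window

samples_per_packet = SAMPLE_RATE_HZ * PACKET_TIME_MS // 1000
packets_per_frame = FRAME_TIME_MS // PACKET_TIME_MS
samples_per_frame = samples_per_packet * packets_per_frame

# Average latencies (ms) from the Latency table.
average_latency_ms = {
    "NVIDIA RTX PRO 6000 Blackwell Server Edition": 49.81,
    "L40S": 50.03,
}

# Portion of the average latency left after the 10 ms accumulation window.
remainder_ms = {
    gpu: round(latency - FRAME_TIME_MS, 2)
    for gpu, latency in average_latency_ms.items()
}

print(f"{samples_per_packet} samples per packet")
print(f"{packets_per_frame} packets ({samples_per_frame} samples) per model frame")
for gpu, value in remainder_ms.items():
    print(f"{gpu}: {value} ms beyond the accumulation window")
```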

Note

Numbers reflect a single-node configuration with NUMA-aligned GPU and SR-IOV virtual function, no concurrent workloads on the GPU or the SR-IOV pool, and PTP-disciplined NICs. Multi-node deployments add network transit (typically tens of microseconds per hop), and GPU or SR-IOV contention can lift the average by a few milliseconds and widen the jitter band.

Per-Stream Capacity and Integrity#

Each Studio Voice NIM media function pod handles one bidirectional ST 2110-30 audio stream: 48 kHz mono, 1 ms packet time, 24-bit linear PCM. This yields an RTP payload bandwidth of approximately 1.15 Mbps per stream per direction (about 1.62 Mbps including RTP, UDP, IP, and Ethernet headers). To scale capacity horizontally, deploy additional Studio Voice CRs; each CR produces one pod with the GPU, CPU, and hugepages budget shown in the next section.
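The bandwidth figures above can be reproduced from the stream parameters. The sketch below is a worked calculation, not a measurement. It assumes IPv4 without options, and it assumes the Ethernet overhead is the 14-byte header plus the 4-byte frame check sequence with no VLAN tag; that combination is what reproduces the 1.62 Mbps figure, but the page does not spell out the breakdown.

```python
# Worked bandwidth calculation for one stream direction.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 3          # 24-bit linear PCM
CHANNELS = 1                  # mono
PACKETS_PER_SECOND = 1000     # 1 ms packet time

# Header sizes in bytes. The Ethernet value is an assumption (header + FCS, no VLAN).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_HEADER_AND_FCS = 18

samples_per_packet = SAMPLE_RATE_HZ // PACKETS_PER_SECOND
payload_bytes = samples_per_packet * BYTES_PER_SAMPLE * CHANNELS
frame_bytes = (payload_bytes + RTP_HEADER + UDP_HEADER
               + IPV4_HEADER + ETHERNET_HEADER_AND_FCS)


def mbps(bytes_per_packet: int) -> float:
    """Convert a per-packet size into megabits per second at the packet rate."""
    return bytes_per_packet * 8 * PACKETS_PER_SECOND / 1_000_000


payload_mbps = mbps(payload_bytes)
wire_mbps = mbps(frame_bytes)

print(f"payload: {payload_bytes} bytes/packet, {payload_mbps:.3f} Mbps")
print(f"with headers: {frame_bytes} bytes/packet, {wire_mbps:.3f} Mbps")
```

Under these assumptions the payload is 144 bytes per packet (1.152 Mbps) and the full frame is 202 bytes (1.616 Mbps), which round to the figures quoted above.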

Across the 30-minute continuous run described in the Latency section, both SKUs sustained the expected 1000 packets per second with zero dropped packets, zero latency errors, and full SMPTE ST 2110-21 compliance.

Pod Resource Requirements#

The following resources are required per Studio Voice NIM media function pod:

Resource	Requirement
CPU	2 cores (request) / 4 cores (limit)
Memory	2 GiB (request) / 4 GiB (limit)
Hugepages (2 Mi)	256 Mi
NVIDIA GPU	1

Ensure the target node has sufficient resources and that the GPU and hugepages are configured before deploying. Node selector configuration is covered in each chart’s installation page.
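As a rough planning aid, the per-pod figures in the table can be turned into an estimate of how many pods fit on one node. This is a sketch under stated assumptions: it sizes against the request values (Kubernetes schedules on requests), the `Resources` type and `max_pods` helper are hypothetical, and the example node is invented for illustration. It ignores resources consumed by system daemons and other workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for node-allocatable or per-pod resources."""
    cpu_cores: int
    memory_gib: int
    hugepages_2mi_mib: int   # total size of 2 Mi hugepages, in Mi
    gpus: int


# Per-pod requests taken from the table above.
POD_REQUEST = Resources(cpu_cores=2, memory_gib=2, hugepages_2mi_mib=256, gpus=1)


def max_pods(node: Resources, pod: Resources = POD_REQUEST) -> int:
    """Return how many pods fit, limited by the scarcest resource."""
    return min(
        node.cpu_cores // pod.cpu_cores,
        node.memory_gib // pod.memory_gib,
        node.hugepages_2mi_mib // pod.hugepages_2mi_mib,
        node.gpus // pod.gpus,
    )


# Invented example node: 64 cores, 256 GiB RAM, 4096 Mi of hugepages, 4 GPUs.
example_node = Resources(cpu_cores=64, memory_gib=256,
                         hugepages_2mi_mib=4096, gpus=4)
print(max_pods(example_node))  # GPU count is the limiting resource here
```

Because each pod needs a whole GPU, the GPU count is usually the binding constraint; hugepages become the limit only when the node has fewer than 256 Mi preallocated per GPU.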

Software#

NVIDIA Driver#

Prerequisite	Version	Reference
NVIDIA Graphics Driver for Linux	571.21+	NVIDIA Unix Drivers
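A pre-deployment check of the driver prerequisite might look like the sketch below. It assumes "571.21+" means "version 571.21 or later" and compares dotted version strings numerically; `parse_version` and `driver_is_supported` are hypothetical helpers, not part of any NVIDIA tool.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted driver version such as '571.21' into integer parts."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum version from the table above, read as "571.21 or later" (assumption).
MIN_DRIVER = parse_version("571.21")


def driver_is_supported(installed: str) -> bool:
    """Return True if the installed driver version meets the minimum."""
    return parse_version(installed) >= MIN_DRIVER


for candidate in ("570.86.15", "571.21", "575.51.03"):
    status = "ok" if driver_is_supported(candidate) else "too old"
    print(f"{candidate}: {status}")
```

The installed version string can be read on the node with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `driver_is_supported`.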

Component Versions#

The Studio Voice NIM uses the following software components:

Component	Version
CUDA	12.8.1
cuDNN	9.7.1.26
TensorRT	10.9.0.34
Triton Inference Server	v2.56.0
DeepStream	8.0
NVIDIA Media Gateway	0.7.0