Support Matrix#

For production, Holoscan for Media is designed to run on NVIDIA-Certified Systems. For full hardware, networking, and cluster platform requirements, refer to the System Requirements for Holoscan for Media.

Models#

| Model name | Model ID | Publisher |
| --- | --- | --- |
| Studio Voice | studio-voice | NVIDIA |

Optimized GPU Configurations#

| GPU | Precision |
| --- | --- |
| L40S | FP16 + FP32 |
| NVIDIA RTX PRO 6000 Blackwell Server Edition | FP16 + FP32 |

Latency#

End-to-end per-packet RTP audio latency through the Studio Voice NIM, measured at the NIM output against the source RTP timestamp (preserved through the pipeline) with PTP-synchronized clocks per SMPTE ST 2110-21:

| GPU | Average Latency | Min | Max |
| --- | --- | --- | --- |
| NVIDIA RTX PRO 6000 Blackwell Server Edition | 49.81 ms | 49.81 ms | 49.82 ms |
| L40S | 50.03 ms | 50.03 ms | 50.04 ms |

These figures were measured over a 30-minute continuous run of 48 kHz mono audio at 1 ms packet time (1,800,000 packets per SKU), with zero packet loss and full SMPTE ST 2110-21 compliance.
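The packet count quoted for the run follows directly from the run length and the packet time. The short sketch below reproduces that arithmetic; the function names are illustrative and are not part of any NVIDIA tooling.

```python
# Sanity-check the packet count for the latency run.
# Inputs come from the text: 30 minutes, 1 ms packet time, 48 kHz audio.

def packets_in_run(run_minutes: int, packet_time_ms: int) -> int:
    """Number of RTP packets sent in a run at a fixed packet time."""
    run_ms = run_minutes * 60 * 1000
    return run_ms // packet_time_ms


def samples_per_packet(sample_rate_hz: int, packet_time_ms: int) -> int:
    """Audio samples per channel carried by one packet."""
    return sample_rate_hz * packet_time_ms // 1000


total_packets = packets_in_run(run_minutes=30, packet_time_ms=1)
samples = samples_per_packet(sample_rate_hz=48_000, packet_time_ms=1)
print(total_packets, samples)
```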

Note

The reported per-packet latency reflects the model’s 10 ms input-accumulation window—Studio Voice processes audio in 10 ms frames even though RTP packets arrive every 1 ms—together with algorithmic look-ahead and GPU inference. The 10 ms window sets the practical floor for end-to-end latency.
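To make the accumulation window concrete, the following is a minimal sketch of grouping 1 ms packet payloads into 10 ms frames. It illustrates the general idea only and is not the NIM's implementation; `accumulate_frames` and its parameters are hypothetical names.

```python
from typing import Iterable, Iterator, List


def accumulate_frames(
    packets: Iterable[bytes], packets_per_frame: int = 10
) -> Iterator[bytes]:
    """Group consecutive packet payloads into fixed-size frames.

    With 1 ms packets and packets_per_frame=10, each yielded frame
    covers 10 ms of audio. A frame is yielded only after its last
    packet has arrived; a trailing partial frame is not yielded.
    """
    buffer: List[bytes] = []
    for payload in packets:
        buffer.append(payload)
        if len(buffer) == packets_per_frame:
            yield b"".join(buffer)
            buffer.clear()


# 25 packets of 144 bytes (48 samples x 3 bytes) -> two complete frames.
example_packets = [bytes(144) for _ in range(25)]
frames = list(accumulate_frames(example_packets))
print(len(frames), len(frames[0]))
```

In this sketch a frame becomes available for processing only once its tenth packet has been received, which is why the window length bounds the latency from below.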

Note

Numbers reflect a single-node configuration with NUMA-aligned GPU and SR-IOV virtual function, no concurrent workloads on the GPU or the SR-IOV pool, and PTP-disciplined NICs. Multi-node deployments add network transit (typically tens of microseconds per hop), and GPU or SR-IOV contention can lift the average by a few milliseconds and widen the jitter band.

Per-Stream Capacity and Integrity#

Each Studio Voice NIM media function pod handles one bidirectional ST 2110-30 audio stream at 48 kHz mono with 1 ms packet time and 24-bit linear PCM, yielding an RTP payload bandwidth of approximately 1.15 Mbps per stream per direction (about 1.62 Mbps including RTP, UDP, IP, and Ethernet headers). Capacity is scaled horizontally by deploying additional Studio Voice CRs, each producing one pod with the GPU, CPU, and hugepages budget shown in the next section.
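The bandwidth figures above can be reproduced with a few lines of arithmetic. The sketch below assumes standard header sizes (12-byte RTP, 8-byte UDP, 20-byte IPv4) and counts 18 bytes of Ethernet overhead (14-byte header plus 4-byte frame check sequence, no VLAN tag, no preamble or inter-frame gap). The source does not state which Ethernet overhead it counts; this particular breakdown is an assumption that happens to match the quoted 1.62 Mbps.

```python
# Per-stream, per-direction bandwidth for 48 kHz mono, 24-bit PCM, 1 ms packets.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 3        # 24-bit linear PCM
CHANNELS = 1                # mono
PACKETS_PER_SECOND = 1000   # 1 ms packet time

# Header sizes in bytes. The Ethernet figure is an assumption (header + FCS).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_HEADER_AND_FCS = 18

payload_bytes = SAMPLE_RATE_HZ // PACKETS_PER_SECOND * BYTES_PER_SAMPLE * CHANNELS
overhead_bytes = RTP_HEADER + UDP_HEADER + IPV4_HEADER + ETHERNET_HEADER_AND_FCS


def mbps(bytes_per_packet: int) -> float:
    """Convert a per-packet byte count to megabits per second."""
    return bytes_per_packet * 8 * PACKETS_PER_SECOND / 1_000_000


payload_mbps = mbps(payload_bytes)
wire_mbps = mbps(payload_bytes + overhead_bytes)
print(f"payload: {payload_mbps:.3f} Mbps, with headers: {wire_mbps:.3f} Mbps")
```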

Across the 30-minute continuous run referenced in the Latency table, both SKUs sustained the expected 1000 packets per second with zero dropped packets, zero latency errors, and full SMPTE ST 2110-21 compliance.

Pod Resource Requirements#

The following resources are required per Studio Voice NIM media function pod:

| Resource | Requirement |
| --- | --- |
| CPU | 2 cores (request) / 4 cores (limit) |
| Memory | 2 GiB (request) / 4 GiB (limit) |
| Hugepages (2 Mi) | 256 Mi |
| NVIDIA GPU | 1 |

Ensure the target node has sufficient resources and that the GPU and hugepages are configured before deploying. Node selector configuration is covered in each chart’s installation page.
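As a rough planning aid, the per-pod requests above can be compared against a node's free resources to estimate how many pods it could host. This is a simplified sketch, not a scheduler model: `Resources`, `POD_REQUEST`, and `max_pods` are hypothetical names, the example node figures are invented for illustration, and NUMA alignment, SR-IOV virtual function availability, and other workloads are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: float
    memory_gib: float
    hugepages_2mi_mib: int
    gpus: int


# Per-pod requests taken from the table above.
POD_REQUEST = Resources(cpu_cores=2, memory_gib=2, hugepages_2mi_mib=256, gpus=1)


def max_pods(node_free: Resources, pod: Resources = POD_REQUEST) -> int:
    """Pods that fit on a node, limited by the scarcest resource."""
    return int(
        min(
            node_free.cpu_cores // pod.cpu_cores,
            node_free.memory_gib // pod.memory_gib,
            node_free.hugepages_2mi_mib // pod.hugepages_2mi_mib,
            node_free.gpus // pod.gpus,
        )
    )


# Hypothetical node: 32 free cores, 128 GiB, 1 GiB of 2 Mi hugepages, 2 GPUs.
example_node = Resources(cpu_cores=32, memory_gib=128, hugepages_2mi_mib=1024, gpus=2)
print(max_pods(example_node))
```

Because each pod needs one whole GPU, the GPU count is usually the binding constraint in this estimate.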

Software#

NVIDIA Driver#

| Prerequisite | Version | Reference |
| --- | --- | --- |
| NVIDIA Graphics Driver for Linux | 571.21+ | NVIDIA Unix Drivers |
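A driver version string can be checked against the 571.21 minimum with a numeric, component-wise comparison. The sketch below is one way to do that; `driver_is_supported` is a hypothetical helper, and the installed version is assumed to be obtained separately, for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '571.21' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver version from the table above.
MIN_DRIVER = parse_version("571.21")


def driver_is_supported(installed: str) -> bool:
    """True if the installed driver is at least the documented minimum.

    Comparison is numeric per component, so '571.9' is older than '571.21'.
    """
    return parse_version(installed) >= MIN_DRIVER


print(driver_is_supported("575.57.08"), driver_is_supported("570.124.06"))
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap in which "571.9" would sort after "571.21".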

Component Versions#

The Studio Voice NIM uses the following software components:

| Component | Version |
| --- | --- |
| CUDA | 12.8.1 |
| cuDNN | 9.7.1.26 |
| TensorRT | 10.9.0.34 |
| Triton Inference Server | v2.56.0 |
| DeepStream | 8.0 |
| NVIDIA Media Gateway | 0.7.0 |