Getting Started#

Holoscan for Media#

Platform setup, cluster prerequisites, reference applications, deployment, verification, monitoring, application development, troubleshooting, and support for Holoscan for Media are covered in the NVIDIA Holoscan for Media user guide.


Prerequisites#

Before proceeding, check the Support Matrix to verify that the required hardware and software are available.

| Requirement | Details | Reference |
|---|---|---|
| Kubernetes cluster | Kubernetes or Red Hat OpenShift. | Support Matrix |
| OpenShift setup | Create the target namespace and grant SCC permissions to each service account your workloads use (often default). | OpenShift Security Context Constraint |
| Namespace scope | NetworkAttachmentDefinition resources are namespace-scoped; copy them into the deployment namespace when they exist in another namespace (for example, default). | Namespace Scope |
| NGC account and API key | Required for CLI authentication, image and chart pulls, and creating image pull and model pull secrets. | NGC Authentication |
| Rivermax license | Required for SMPTE ST 2110 streaming. | Rivermax License |
| High-speed network attachment | A NetworkAttachmentDefinition for SMPTE ST 2110 provisioned by the cluster administrator. | High-Speed Network Configuration |
| OpenShift Security Context Constraint | An SCC that grants the NIM the four Linux capabilities and allowPrivilegeEscalation: true. | Security Context |
| Topology-aware scheduler (when applicable) | Required on clusters with topologyManagerPolicy: single-numa-node so that sender, NIM, and receiver pods can co-locate the GPU, SR-IOV virtual function, and CPU on the same NUMA zone. | Scheduler and NUMA Topology |
| NMOS registry (optional) | Required only when using NMOS for dynamic stream connection management. | |
| Chrome Remote Desktop (when using NMOS) | Required to reach the NMOS Connection Manager UI in Holoscan for Media setups. | Chrome Remote Desktop |

Access NGC and Helm Charts#

An NGC account and an NGC API key are required to authenticate the CLI, pull container images and Helm charts from NVIDIA registries, and create cluster pull secrets as described in License and Secrets Management.

NGC Authentication#

Generate an API Key#

An NGC API key is required to access NGC resources. You can generate a key at https://org.ngc.nvidia.com/setup/api-keys.

When creating an NGC API Personal key, ensure that at least NGC Catalog is selected from the Services Included dropdown. You can include more services if this key is to be reused for other purposes.

Note

Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to NGC API Keys in the NGC User Guide.

Export the NGC API Key#

To pull container images and model artifacts from NGC, use the NGC_API_KEY environment variable when authenticating with the Helm registry and creating cluster secrets as described in the following sections.

The simplest way to create the NGC_API_KEY environment variable is to export it in your terminal:

export NGC_API_KEY=<value>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Note

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
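The file-based option from the note could be wired up roughly as follows. This is a minimal sketch: the default file path and the function name load_ngc_api_key are illustrative assumptions, not part of the product.

```shell
# Hypothetical location for the key file; any path readable only by you works.
NGC_API_KEY_FILE="${NGC_API_KEY_FILE:-$HOME/.config/ngc/api_key}"

# Load the key from the file into NGC_API_KEY for the current shell.
# Returns non-zero and leaves NGC_API_KEY untouched if the file is missing or empty.
load_ngc_api_key() {
  if [ ! -s "$NGC_API_KEY_FILE" ]; then
    echo "NGC API key file not found or empty: $NGC_API_KEY_FILE" >&2
    return 1
  fi
  NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"
  export NGC_API_KEY
}
```

Creating the file under umask 077 keeps it private to your user. Call load_ngc_api_key in each shell session that needs the key, so the value never has to be written to ~/.bashrc or ~/.zshrc.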

Log In to the NGC Registry#

Log in to the NGC container and Helm registry before pulling charts or images:

helm registry login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"
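As an alternative, the key can be passed on standard input so that it does not appear in the process list. The following sketch uses Helm's --password-stdin flag. The wrapper name ngc_helm_login is an illustrative assumption.

```shell
# Log in to nvcr.io, feeding the key on stdin instead of the command line.
# Assumes NGC_API_KEY is already exported in the current shell.
ngc_helm_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # The username is the literal string $oauthtoken, so it stays single-quoted.
  printf '%s' "$NGC_API_KEY" |
    helm registry login nvcr.io --username '$oauthtoken' --password-stdin
}
```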

License and Secrets Management#

Rivermax License#

Rivermax is an optimized networking SDK that uses ConnectX and BlueField DPU hardware-streaming acceleration and supports GPUDirect, in both bare-metal and virtualized environments. Rivermax is required by Holoscan for Media applications to comply with the timing and traffic flow requirements of SMPTE ST 2110.

Request a development license at Rivermax Download. When the license file is available, create a Kubernetes secret from it:

kubectl create secret generic rivermax-license \
  --from-file=rivermax.lic \
  -n <your-namespace>

The secret is mounted at /opt/mellanox/rivermax in the NIM service pod, and (when deploying the end-to-end demo chart) in the sender and receiver pods. Create the secret in the deployment namespace before installing any of the three charts; otherwise, pods stay in ContainerCreating with the error message MountVolume.SetUp failed for volume "rivermax" : secret "rivermax-license" not found.

On Red Hat OpenShift, replace kubectl with oc.
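Before installing a chart, it can help to confirm that the secret exists and carries the expected key. The sketch below assumes the secret was created with the command above, so the data key is rivermax.lic. The KUBECTL variable and the function name are illustrative assumptions.

```shell
# CLI to use: kubectl by default, or set KUBECTL=oc on Red Hat OpenShift.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds only if the rivermax-license secret exists in the given namespace
# and has a non-empty rivermax.lic key (the key name comes from --from-file).
check_rivermax_secret() {
  ns="$1"
  data="$("$KUBECTL" get secret rivermax-license -n "$ns" \
    -o jsonpath='{.data.rivermax\.lic}' 2>/dev/null)" || data=""
  if [ -z "$data" ]; then
    echo "rivermax-license is missing or has no rivermax.lic key in namespace $ns" >&2
    return 1
  fi
  echo "rivermax-license is present in namespace $ns"
}
```

Usage: check_rivermax_secret <your-namespace>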

Image Pull Secret#

Create a secret so the cluster can pull images from nvcr.io:

kubectl create secret docker-registry <secret-name> \
  --docker-server=nvcr.io \
  '--docker-username=$oauthtoken' \
  --docker-password=<NGC-API-KEY> \
  --docker-email=<your-email> \
  -n <your-namespace>

On Red Hat OpenShift, replace kubectl with oc.

  • <secret-name> — Name for the image pull secret.

  • <NGC-API-KEY> — The NGC API key for the org that hosts the images.

  • <your-email> — The email address associated with the NGC account.

  • <your-namespace> — The Kubernetes namespace in which applications are deployed; often default when getting started.
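The same secret can be created from the exported NGC_API_KEY in a repeatable way. This is a sketch rather than the documented procedure: rendering the manifest with --dry-run=client and piping it to kubectl apply lets the command be rerun after a key rotation without first deleting the secret. The function name is an illustrative assumption.

```shell
# Create or update the image pull secret from the exported NGC_API_KEY.
# Arguments: secret name, namespace, email address.
create_image_pull_secret() {
  secret_name="$1"
  namespace="$2"
  email="$3"
  # Render the Secret locally, then apply it so reruns update in place.
  kubectl create secret docker-registry "$secret_name" \
    --docker-server=nvcr.io \
    '--docker-username=$oauthtoken' \
    --docker-password="$NGC_API_KEY" \
    --docker-email="$email" \
    -n "$namespace" \
    --dry-run=client -o yaml |
    kubectl apply -f -
}
```

Usage: create_image_pull_secret <secret-name> <your-namespace> <your-email>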

Model Pull Secret#

Model artifacts are downloaded by the NIM at runtime using an API key carried in a generic secret, separate from the image pull secret discussed earlier:

kubectl create secret generic <secret-name> \
  --from-literal=NGC_API_KEY=<NGC-API-KEY> \
  -n <your-namespace>

Example:

kubectl create secret generic ngc-model-pull-api-key \
  --from-literal=NGC_API_KEY=MY-NGC-API-KEY

On Red Hat OpenShift, replace kubectl with oc.

  • <secret-name> — Name for the model pull secret.

  • <NGC-API-KEY> — The NGC API key for the org that hosts model artifacts.
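To confirm that the model pull secret holds the intended key without printing it, the stored value can be compared against the exported variable. This is a minimal sketch. The function name is an illustrative assumption, and it relies on the NGC_API_KEY data key used in the command above.

```shell
# Compare the key stored in the model pull secret with the exported NGC_API_KEY
# without echoing either value. Arguments: secret name, namespace.
model_secret_matches_env() {
  secret_name="$1"
  namespace="$2"
  # Secret data is base64-encoded, so decode it before comparing.
  stored="$(kubectl get secret "$secret_name" -n "$namespace" \
    -o jsonpath='{.data.NGC_API_KEY}' | base64 --decode)" || return 1
  [ -n "$stored" ] && [ "$stored" = "$NGC_API_KEY" ]
}
```

Usage: model_secret_matches_env ngc-model-pull-api-key <your-namespace> && echo "secret matches"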


High-Speed Network Configuration#

SMPTE ST 2110 media transport uses a dedicated high-speed network attachment provisioned on the cluster. The network attachment name must match the NetworkAttachmentDefinition configured by your cluster administrator.

Set the network attachment name in the Helm values for each component in the pipeline. A single attachment is configured as follows:

highSpeedNetwork:
  name: "media-a-tx-net"

For multiple network attachments, use the list format:

highSpeedNetwork:
  - name: "media-a-tx-net"
  - name: "media-b-tx-net-static"
    ip: "198.51.100.3/24"

For networks with static IP address management, the ip property is required in addition to the network name.
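When the attachment list is assembled by a script, a small helper can emit the list format and catch a malformed static address early. The sketch below is illustrative only: the function name and the NAME=IP argument convention are assumptions, and the address/prefix check simply mirrors the form used in the example above.

```shell
# Print a highSpeedNetwork list from arguments of the form NAME or NAME=IP/PREFIX.
# Entries with an address get an ip property, as needed for static IPAM networks.
render_high_speed_network() {
  echo "highSpeedNetwork:"
  for entry in "$@"; do
    case "$entry" in
      *=*/*)
        printf '  - name: "%s"\n    ip: "%s"\n' "${entry%%=*}" "${entry#*=}"
        ;;
      *=*)
        echo "ip for ${entry%%=*} must use address/prefix form" >&2
        return 1
        ;;
      *)
        printf '  - name: "%s"\n' "$entry"
        ;;
    esac
  done
}
```

Usage: render_high_speed_network media-a-tx-net media-b-tx-net-static=198.51.100.3/24 > hsn-values.yaml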

Note

To ensure correct media routing, the network attachment name must be the same across all components in the pipeline—sender, NIM service, and receiver.

Namespace Scope#

NetworkAttachmentDefinition resources are namespace-scoped, and Multus looks them up in the same namespace as the pod. Every NAD referenced by the chart must therefore exist in the namespace where Studio Voice is installed (the sample chart references media-a-tx-net, media-a-rx-net, media-a-tx-net-static, and media-a-rx-net-static). On Holoscan for Media reference clusters, NADs are typically provisioned in the default namespace.

Check which NADs already exist in your deployment namespace:

kubectl get net-attach-def -n <your-namespace>

If any are missing, copy every NAD from default into the deployment namespace in one pass, stripping server-managed metadata so that the apply command is clean:

NS_DST=<your-namespace>

for nad in $(kubectl get net-attach-def -n default -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
  kubectl get net-attach-def "$nad" -n default -o json | jq --arg ns "$NS_DST" '
    del(.metadata.uid,
        .metadata.resourceVersion,
        .metadata.creationTimestamp,
        .metadata.generation,
        .metadata.managedFields) |
    .metadata.namespace = $ns
  ' | kubectl apply -f -
done

On Red Hat OpenShift, replace kubectl with oc.
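After copying, a quick check can confirm that every NAD the chart references is present. The following sketch lists any that are still missing. The function name is an illustrative assumption, and the NAD names in the usage line are those of the sample chart.

```shell
# Print each required NAD name that is absent from the namespace.
# Returns 1 if any are missing, 2 if the lookup itself fails.
missing_nads() {
  namespace="$1"
  shift
  present="$(kubectl get net-attach-def -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')" || return 2
  rc=0
  for nad in "$@"; do
    # -x matches whole lines and -F treats the name as a literal string.
    if ! printf '%s\n' "$present" | grep -qxF -- "$nad"; then
      echo "$nad"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: missing_nads <your-namespace> media-a-tx-net media-a-rx-net media-a-tx-net-static media-a-rx-net-static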

Note

If a referenced NAD is missing from the deployment namespace, helm install reports STATUS: deployed but no pods are created—admission rejects pod creation with the message could not find network attachment definition '<namespace>/media-a-tx-net'. Inspect with the command oc get events -n <your-namespace> --sort-by=.lastTimestamp.


Scheduler and NUMA Topology#

Each Studio Voice workload pod (sender, NIM service, and receiver) requests a GPU, an SR-IOV virtual function from a high-speed network pool, and a fixed CPU allocation. These resources perform best when co-located on the same NUMA zone.

The chart sets schedulerName: topo-aware-scheduler on each workload so that NUMA alignment is computed at scheduling time. For this setting to take effect, the cluster needs a topology-aware scheduler registered under that name and a kubelet configured with topologyManagerPolicy: single-numa-node.

On clusters without such a scheduler, override schedulerName to default-scheduler. Pods then schedule successfully, but cross-NUMA placement is possible, which produces slight per-packet latency variation from the Support Matrix figures while staying within the SMPTE ST 2110-21 envelope.

To override schedulerName on clusters without a topology-aware scheduler, use the matching knob for your deployment path:

  • NIM service chart: --set schedulerName=default-scheduler (or schedulerName: default-scheduler in your values file).

  • Sample (umbrella) chart: --set global.schedulerName=default-scheduler for a fleet-wide override, or set <component>.schedulerName per sender, receiver, and NIM service. Refer to Global Overrides.

  • Operator custom resource: Set spec.schedulerName: default-scheduler on the NvidiaStudioVoiceMediaFunction CR.
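The choice between the two scheduler names can be scripted. The sketch below is a heuristic under stated assumptions: it only looks for a running pod whose name contains topo-aware-scheduler and does not check the kubelet topology-manager policy. The function name is an illustrative assumption.

```shell
# Echo the schedulerName to use: topo-aware-scheduler when a pod with that
# name is found in any namespace, default-scheduler otherwise.
pick_scheduler_name() {
  if kubectl get pods -A 2>/dev/null | grep -q topo-aware-scheduler; then
    echo "topo-aware-scheduler"
  else
    echo "default-scheduler"
  fi
}
```

For the NIM service chart, the result can be passed as --set schedulerName="$(pick_scheduler_name)".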

Note

The Holoscan for Media production automation provisions both the topology-aware scheduler and the kubelet topology-manager policy. On clusters not provisioned this way, install the NUMA Resources Operator (Red Hat OpenShift), which exposes a secondary scheduler named topo-aware-scheduler and NodeResourceTopology custom resources.

Verify Per-NUMA Capacity#

Before running helm install, confirm that a NUMA zone has both a free GPU and a free virtual function in the SR-IOV pool referenced by the chart’s highSpeedNetwork.name:

oc get noderesourcetopologies <node> -o yaml

In the output, find a zone whose resources list reports available >= 1 for both nvidia.com/gpu and the SR-IOV pool (for example, openshift.io/media_a_tx_pool). The following command determines which pool a given network attachment resolves to:

oc get net-attach-def <name> -n <namespace> \
  -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/resourceName}{"\n"}'
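Reading the YAML by eye works for one node. For several nodes, the per-zone check can be scripted. The following sketch assumes the NodeResourceTopology object exposes .zones[].resources[] entries with name and available fields, and that available is an integer for GPUs and virtual functions. The function name is an illustrative assumption.

```shell
# List NUMA zones on a node that report at least one available GPU and one
# available virtual function in the given SR-IOV pool.
# Arguments: node name, SR-IOV pool resource name.
zones_with_capacity() {
  node="$1"
  pool="$2"
  # Emit one line per zone: "<zone> <resource>=<available> ...".
  oc get noderesourcetopologies "$node" \
    -o jsonpath='{range .zones[*]}{.name}{range .resources[*]}{" "}{.name}{"="}{.available}{end}{"\n"}{end}' |
    awk -v gpu="nvidia.com/gpu" -v pool="$pool" '
      {
        has_gpu = 0; has_vf = 0
        for (i = 2; i <= NF; i++) {
          n = split($i, kv, "=")
          if (n != 2) continue
          if (kv[1] == gpu  && kv[2] + 0 >= 1) has_gpu = 1
          if (kv[1] == pool && kv[2] + 0 >= 1) has_vf = 1
        }
        if (has_gpu && has_vf) print $1
      }'
}
```

Usage: zones_with_capacity <node> openshift.io/media_a_tx_pool. Empty output means that no zone on the node currently satisfies both requirements.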

Troubleshooting#

If a pod stays Pending with the following event, the topology-aware scheduler could not satisfy NUMA alignment on any node:

Warning  FailedScheduling  topo-aware-scheduler
  0/N nodes are available: 1 cannot align container.

Likely causes:

  • The NUMA zone hosting the requested SR-IOV pool has no free GPU. Use the capacity check shown earlier to find a zone with both a free nvidia.com/gpu and a free virtual function in the matching SR-IOV pool.

  • The kubelet on the target node is not set to topologyManagerPolicy: single-numa-node.

  • The cluster has no scheduler registered under the name topo-aware-scheduler. Verify with kubectl get pods -A | grep topo-aware-scheduler or check with the cluster administrator.

If you override schedulerName to default-scheduler (refer to the per-deployment-path knobs discussed earlier), the failure mode changes. The default scheduler ignores Topology Manager constraints, so on a NUMA-booked cluster the kubelet rejects the chosen node post-bind with TopologyAffinityError: Resources cannot be allocated with Topology locality rather than a FailedScheduling event. On NUMA-spacious clusters the pod schedules and runs, but cross-NUMA placement is possible, producing slight per-packet latency variation from the Support Matrix figures while staying within the SMPTE ST 2110-21 envelope.


Chrome Remote Desktop#

When Studio Voice is deployed with NMOS enabled, streams are connected and managed through the NMOS Connection Manager (a web UI). On Holoscan for Media clusters, access to that UI typically requires a browser that can reach the cluster network. Chrome Remote Desktop provides a remote graphical session for accessing that UI.

For information about installation and setup, refer to the Chrome Remote Desktop section of Manual Deployment in the Holoscan for Media user guide.


Security Context#

To operate correctly, the NIM container requires allowPrivilegeEscalation: true and the following Linux capabilities:

| Capability | Purpose |
|---|---|
| IPC_LOCK | Lock memory pages; required for hugepage allocation by Rivermax. |
| NET_RAW | Use raw sockets; required by Rivermax for SMPTE ST 2110. |
| SYS_NICE | Set process and thread CPU affinity; required by Rivermax. |
| DAC_READ_SEARCH | Read files across permission boundaries during startup. |

These values are set in the Helm values under containerSecurityContext or in the NIM operator custom resource under spec.parameters.securityContext. For details, refer to Configuration Reference.
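Once a pod is running, the applied capabilities can be checked against the four listed above. This is a minimal sketch. The function name is an illustrative assumption, and the check looks at the union of capabilities added across all containers in the pod rather than at the NIM container alone.

```shell
# Print each required capability that no container in the pod adds.
# Returns 1 if any are missing, 2 if the lookup itself fails.
missing_capabilities() {
  pod="$1"
  namespace="$2"
  added="$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.containers[*].securityContext.capabilities.add[*]}')" || return 2
  rc=0
  for cap in IPC_LOCK NET_RAW SYS_NICE DAC_READ_SEARCH; do
    # Pad with spaces so that only whole capability names match.
    case " $added " in
      *" $cap "*) ;;
      *) echo "$cap"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Usage: missing_capabilities <pod-name> <your-namespace>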

OpenShift Security Context Constraint#

On Red Hat OpenShift, the default restricted-v2 Security Context Constraint denies allowPrivilegeEscalation: true and the capabilities mentioned in the previous section. By default, sender, NIM, and receiver pods run as the default ServiceAccount in the deployment namespace, so an SCC that permits these settings must be bound to that ServiceAccount. If your Helm values, operator custom resource, or pod templates set a different serviceAccountName, grant the SCC to that account instead. Use the following command to confirm which account a pod is using:

kubectl get pod <pod-name> -n <your-namespace> -o jsonpath='{.spec.serviceAccountName}{"\n"}'

An empty result means the namespace default ServiceAccount is used.

For development and evaluation, the simplest option is to bind the built-in privileged SCC to the default ServiceAccount in the deployment namespace:

oc adm policy add-scc-to-user privileged -z default -n <your-namespace>

If your pods use a different account, replace default with the actual ServiceAccount name.

For production, prefer a least-privilege SCC that grants only allowPrivilegeEscalation: true and the four capabilities mentioned in the previous section. Refer to Managing security context constraints in the OpenShift documentation.
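A least-privilege SCC along these lines might look like the following sketch. It is a starting point, not a validated manifest: the SCC name holoscan-media-nim is hypothetical, and the runAsUser, SELinux, group, and volume settings are assumptions that have not been tested against the charts. Only allowPrivilegeEscalation and the four capabilities come from this guide.

```shell
# Print a candidate least-privilege SCC manifest to stdout.
# Argument: SCC name (hypothetical default shown below).
render_media_scc() {
  scc_name="${1:-holoscan-media-nim}"
  cat <<EOF
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: ${scc_name}
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
readOnlyRootFilesystem: false
allowedCapabilities:
  - IPC_LOCK
  - NET_RAW
  - SYS_NICE
  - DAC_READ_SEARCH
# The strategies and volume types below are assumptions; adjust to your policy.
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
EOF
}
```

Review the output, apply it with render_media_scc | oc apply -f -, and then bind it with oc adm policy add-scc-to-user holoscan-media-nim -z default -n <your-namespace>.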

Note

If the SCC binding is missing, pod creation fails with an admission error from the SCC validator. For example:

unable to validate against any security context constraint: provider "restricted-v2": Forbidden: ... capabilities.add: Invalid value: "IPC_LOCK": capability may not be added

Inspect with oc get events -n <your-namespace> --sort-by=.lastTimestamp.
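The failure messages quoted throughout this guide can be mapped back to the section that covers each one. The sketch below does this with plain pattern matching on the message fragments shown above. The function name is an illustrative assumption, and real event text may vary between versions.

```shell
# Echo the name of the section in this guide that covers a failure message,
# or "unrecognized" if no known fragment matches.
classify_failure() {
  case "$1" in
    *'secret "rivermax-license" not found'*)
      echo "Rivermax License" ;;
    *'could not find network attachment definition'*)
      echo "Namespace Scope" ;;
    *'cannot align container'*|*'TopologyAffinityError'*)
      echo "Scheduler and NUMA Topology" ;;
    *'unable to validate against any security context constraint'*)
      echo "OpenShift Security Context Constraint" ;;
    *)
      echo "unrecognized" ;;
  esac
}
```

To classify recent events, pipe their messages through the function: oc get events -n <your-namespace> --sort-by=.lastTimestamp -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' | while IFS= read -r msg; do classify_failure "$msg"; done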


See Also#

  • Installation — Demo, service, and operator Helm installation instructions.

  • Configuration Reference — Helm, operator, and environment variable tables with defaults and allowed values.