Advanced Usage#

Persistent Storage#

The NIM service chart supports two optional persistent volume claims (PVCs) for storage that survives pod restarts and helm uninstall. Both are enabled by default in the Studio Voice chart and are annotated with helm.sh/resource-policy: keep, so they are retained when the Helm release is removed.

When you use the end-to-end demo chart, configure these keys under nvidia-studio-voice-h4m-service in the values file. For more information, refer to Common Helm Configuration.
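As a rough sketch of that nesting, the two storage blocks described in the following sections would sit under the subchart key in the demo chart's values file. The values shown are the defaults documented on this page; any other keys your values file contains are omitted.

```yaml
# Values file for the end-to-end demo chart (sketch).
# Keys for the NIM service subchart are nested under its name.
nvidia-studio-voice-h4m-service:
  nimModelCache:
    enabled: true
    create: true
    size: "10Gi"
    storageClassName: ""
    mountPath: "/opt/nim/.cache"
  nimLogs:
    enabled: true
    create: true
    size: "5Gi"
    storageClassName: ""
    mountPath: "/workspace/nim-logs"
```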

For the operator, equivalent settings are available under spec.parameters.nimModelCache and spec.parameters.nimLogs in the NvidiaStudioVoiceMediaFunction custom resource. For details, refer to Operator Configuration.
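The fragment below sketches where those settings would sit in the custom resource. It assumes the fields under nimModelCache and nimLogs mirror the Helm keys shown in the following sections, which this page does not confirm. The apiVersion and metadata are left out because they are not given here, so check Operator Configuration for the actual schema.

```yaml
# Fragment of a NvidiaStudioVoiceMediaFunction resource (not a complete manifest).
kind: NvidiaStudioVoiceMediaFunction
spec:
  parameters:
    nimModelCache:
      enabled: true      # assumed to mirror the Helm key
      size: "10Gi"       # assumed to mirror the Helm key
    nimLogs:
      enabled: true      # assumed to mirror the Helm key
      size: "5Gi"        # assumed to mirror the Helm key
```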

Model Cache#

Caches NGC model artifacts locally so models are not re-downloaded on every deployment. When the chart creates the PVC, its name is <appName>-model-cache. The container mounts the model cache at /opt/nim/.cache by default (mountPath).

nimModelCache:
  enabled: true
  create: true
  size: "10Gi"
  storageClassName: ""
  # If this PVC stays Pending, remove the empty storageClassName above, uncomment the next line, and replace <storage-class> with an RWO-capable StorageClass from your cluster.
  # storageClassName: <storage-class>
  mountPath: "/opt/nim/.cache"

When the model cache is enabled, NIM_CACHE_PATH is set to the configured mountPath on the NIM pod.
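For example, a values override like the following would move the cache to a different directory. The path /data/nim-cache is a hypothetical choice for illustration. Based on the behavior described above, NIM_CACHE_PATH on the NIM pod would then be set to the same path.

```yaml
nimModelCache:
  enabled: true
  create: true
  size: "10Gi"
  # Hypothetical mount path; NIM_CACHE_PATH is expected to follow this value.
  mountPath: "/data/nim-cache"
```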

NIM Log Files#

Persists time-stamped NIM log files under the configured directory. When the chart creates the PVC, its name is <appName>-nim-logs. The default mount path is /workspace/nim-logs (mountPath).

nimLogs:
  enabled: true
  create: true
  size: "5Gi"
  storageClassName: ""
  # If this PVC stays Pending, remove the empty storageClassName above, uncomment the next line, and replace <storage-class> with an RWO-capable StorageClass from your cluster.
  # storageClassName: <storage-class>
  mountPath: "/workspace/nim-logs"

StorageClass#

By default, storageClassName is set to "", which uses the cluster’s default StorageClass. To use a specific StorageClass, set storageClassName to the name of an existing StorageClass in your cluster under the nimModelCache or nimLogs block.

If no default StorageClass is configured in your cluster and storageClassName is left empty, the PVC remains in Pending state. In that case, set storageClassName to a valid StorageClass from your cluster.

List StorageClasses in your cluster:

kubectl get storageclass

If a chart-managed PVC stays Pending, uncomment storageClassName in your values file (refer to the Model Cache and NIM Log Files examples) and set it to an RWO-capable class from that list. For example:

nimModelCache:
  enabled: true
  size: "10Gi"
  storageClassName: <storage-class>
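If you would rather fix a missing default at the cluster level, Kubernetes treats a StorageClass that carries the storageclass.kubernetes.io/is-default-class annotation as the default. The manifest below is a sketch only: the name and the <provisioner> placeholder are assumptions, and the provisioner must match a volume plugin that is actually installed in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: studio-voice-default          # hypothetical name
  annotations:
    # Marks this class as the cluster default for PVCs that do not name one.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: <provisioner>            # placeholder; use your cluster's provisioner
```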

Using a Pre-Existing Persistent Volume Claim#

To attach a PVC that already exists in the namespace instead of creating one with the chart, set create: false. Before deploying, ensure that a PVC with the expected name exists in the namespace: <appName>-model-cache for the model cache, or <appName>-nim-logs for the NIM logs. For example:

nimModelCache:
  enabled: true
  create: false

Note

The model cache PVC requires a StorageClass that supports the ReadWriteOnce access mode. When using a shared filesystem, ensure that only one pod writes to the cache at a time.
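A pre-existing claim for the model cache could look like the sketch below. The name assumes an appName of studio-voice, which is a hypothetical value; substitute your own so the claim is named <appName>-model-cache. The size matches the chart default, and <storage-class> is a placeholder for an RWO-capable class from your cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: studio-voice-model-cache     # <appName>-model-cache; appName is hypothetical
spec:
  accessModes:
    - ReadWriteOnce                  # access mode the model cache requires
  resources:
    requests:
      storage: 10Gi                  # matches the chart's default cache size
  storageClassName: <storage-class>  # placeholder
```

Create the claim in the same namespace as the release before deploying with create: false.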

Troubleshooting#

End-to-End Demo Chart and NIM Service Chart#

| Symptom | Likely Cause | Fix |
|---|---|---|
| ImagePullBackOff | Image pull secret missing or incorrect. | kubectl get secret <image.secret>. |
| Pod crash / NGC errors | Model pull secret missing or invalid. | kubectl get secret <ngc.secretName>; confirm key is NGC_API_KEY. |
| Pod Pending | Node selector, GPU, or resource constraints. | kubectl describe pod <pod>; check node labels and capacity. |
| Pod Pending | Insufficient hugepages. | kubectl describe node <node>; check hugepages availability. |
| No enhanced audio output (ST 2110) | Multicast IP addresses or ports misaligned. | Ensure “sender → NIM service → receiver IP address and port” chain is consistent. |
| No enhanced audio output (NMOS) | Receivers connected after sender. | Re-link in order: connect NIM receiver first, then sender. |
| Rivermax errors | Rivermax license secret missing. | kubectl get secret rivermax-license. |
| Startup probe failures | Model download slow or NGC key invalid. | kubectl logs deploy/<appName>; increase startup probe failureThreshold. |
| PVC Pending | No default StorageClass. | Set nimModelCache.storageClassName or create a matching StorageClass. |
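For the model pull secret case, a Secret of the following shape would satisfy the key check. The secret name ngc-api is a hypothetical example and must match whatever ngc.secretName is set to in your values. The key value is a placeholder, and the Opaque type is an assumption about what the chart expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api                      # hypothetical; must equal ngc.secretName
type: Opaque                         # assumed; confirm against the chart
stringData:
  NGC_API_KEY: "<your-ngc-api-key>"  # placeholder; the key name must be NGC_API_KEY
```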

Kubernetes Operator#

| Symptom | Likely Cause | Fix |
|---|---|---|
| ImagePullBackOff on controller or NIM pod | Image pull secret missing or incorrect. | Check imagePullSecrets / mediaFunction.imagePullSecrets; kubectl describe pod <pod>. |
| Custom resource Provisioned false | Invalid spec, missing secrets, or scheduling. | kubectl describe nvidiastudiovoicemediafunction <name> -n <namespace>; check operator logs. |
| NIM pod crash / NGC errors | Model pull secret missing or invalid. | kubectl get secret <spec.parameters.ngcSecret.secretName>; confirm key matches secretKey. |
| Pod Pending | Node selector, GPU, or hugepages. | kubectl describe pod <pod>; verify node labels and capacity. |
| Rivermax errors | License secret missing. | Confirm Rivermax license secret is mounted at /opt/mellanox/rivermax. |
| CRD not found | Operator chart not installed or failed. | helm status studio-voice-operator; reinstall chart. |

On Red Hat OpenShift, replace kubectl with oc.

See Also#