Advanced Usage
Persistent Storage
The NIM service chart supports two optional persistent volume claims (PVCs), whose storage survives both pod restarts and helm uninstall. Both are enabled by default in the Studio Voice chart and carry the annotation helm.sh/resource-policy: keep, so Helm retains them when the release is removed.
When you use the end-to-end demo chart, configure these keys under nvidia-studio-voice-h4m-service in the values file. For more information, refer to Common Helm Configuration.
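As a sketch of that nesting, a demo-chart values file that keeps both PVCs enabled might look like the following. Only the top-level key nvidia-studio-voice-h4m-service comes from this page; the sizes match the examples in the Model Cache and NIM Log Files sections and are placeholders to adjust for your deployment.

```yaml
# Demo-chart values file (sketch): NIM service settings are nested
# under the subchart key nvidia-studio-voice-h4m-service.
nvidia-studio-voice-h4m-service:
  nimModelCache:
    enabled: true
    size: "10Gi"
  nimLogs:
    enabled: true
    size: "5Gi"
```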
For the operator, equivalent settings are available under spec.parameters.nimModelCache and spec.parameters.nimLogs in the NvidiaStudioVoiceMediaFunction custom resource. For details, refer to Operator Configuration.
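A minimal sketch of where these fields sit in the custom resource follows. It assumes the operator accepts the same keys (enabled, size) as the chart blocks in this section; apiVersion and metadata are intentionally omitted because they are not given here, so take them from Operator Configuration.

```yaml
# Fragment of a NvidiaStudioVoiceMediaFunction custom resource (sketch).
# apiVersion and metadata omitted; copy them from Operator Configuration.
kind: NvidiaStudioVoiceMediaFunction
spec:
  parameters:
    nimModelCache:
      enabled: true   # assumption: mirrors the chart's nimModelCache keys
      size: "10Gi"
    nimLogs:
      enabled: true   # assumption: mirrors the chart's nimLogs keys
      size: "5Gi"
```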
Model Cache
Caches NGC model artifacts locally so that models are not re-downloaded on every deployment. When the chart creates the PVC (create: true), its name is <appName>-model-cache. By default, the container mounts the model cache at /opt/nim/.cache; set mountPath to change it.
nimModelCache:
  enabled: true
  create: true
  size: "10Gi"
  storageClassName: ""
  # If this PVC stays Pending, delete the storageClassName: "" line above,
  # uncomment the next line, and replace <storage-class> with an RWO-capable
  # StorageClass from your cluster.
  # storageClassName: <storage-class>
  mountPath: "/opt/nim/.cache"
When the model cache is enabled, NIM_CACHE_PATH is set on the NIM pod to the configured mountPath.
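For example, with a hypothetical custom mount path of /data/nim-cache, the NIM container would be expected to see that same path in NIM_CACHE_PATH. The commented env entry below only illustrates that relationship; it is not output captured from the chart.

```yaml
nimModelCache:
  enabled: true
  mountPath: "/data/nim-cache"   # hypothetical custom path
# Expected effect on the NIM container (illustration only):
#   env:
#     - name: NIM_CACHE_PATH
#       value: "/data/nim-cache"
```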
NIM Log Files
Persists time-stamped NIM log files in the configured directory. When the chart creates the PVC (create: true), its name is <appName>-nim-logs. The default mount path is /workspace/nim-logs; set mountPath to change it.
nimLogs:
  enabled: true
  create: true
  size: "5Gi"
  storageClassName: ""
  # If this PVC stays Pending, delete the storageClassName: "" line above,
  # uncomment the next line, and replace <storage-class> with an RWO-capable
  # StorageClass from your cluster.
  # storageClassName: <storage-class>
  mountPath: "/workspace/nim-logs"
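Because both claims are optional, you can turn one off independently of the other. A sketch, assuming that enabled: false skips the log PVC and its mount while leaving the model cache in place:

```yaml
nimModelCache:
  enabled: true    # keep cached NGC artifacts across deployments
nimLogs:
  enabled: false   # assumption: no nim-logs PVC or mount is created
```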
StorageClass
By default, storageClassName is set to "", which uses the cluster’s default StorageClass. To use a specific StorageClass, set storageClassName to the name of an existing StorageClass in your cluster under the nimModelCache or nimLogs block.
If no default StorageClass is configured in your cluster and storageClassName is left empty, the PVC remains in Pending state. In that case, set storageClassName to a valid StorageClass from your cluster.
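To list the StorageClasses in your cluster, run the following command. The cluster default, if one is configured, is marked (default) next to its name: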
kubectl get storageclass
If a chart-managed PVC stays Pending, set storageClassName in the matching nimModelCache or nimLogs block of your values file to an RWO-capable class from that list, replacing the empty "" value so that the key appears only once. For example:
nimModelCache:
  enabled: true
  size: "10Gi"
  storageClassName: <storage-class>
Using a Pre-Existing Persistent Volume Claim
To attach a PVC that already exists in the namespace instead of having the chart create one, set create: false. Before deploying, ensure that a PVC named <appName>-model-cache (model cache) or <appName>-nim-logs (log files) exists in that namespace:
nimModelCache:
  enabled: true
  create: false
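If the claim does not exist yet, you can create it yourself before installing. The following is a standard Kubernetes PersistentVolumeClaim sketch; my-app (standing in for <appName>), my-namespace, and the 10Gi request are placeholders to replace, and the access mode follows the ReadWriteOnce requirement in the note that follows.

```yaml
# Pre-created model cache PVC (sketch). Replace the placeholders:
#   my-app        -> your <appName>
#   my-namespace  -> the namespace of the release
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-model-cache
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: <storage-class>   # an RWO-capable class from kubectl get storageclass
```

Apply the manifest with kubectl apply -f <file> before running helm install.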
Note
The model cache PVC requires a StorageClass that supports the ReadWriteOnce access mode. When using a shared filesystem, ensure that only one pod writes to the cache at a time.
Troubleshooting
End-to-End Demo Chart and NIM Service Chart
| Symptom | Likely Cause | Fix |
|---|---|---|
| Pod ImagePullBackOff | Image pull secret missing or incorrect. | |
| Pod crash / NGC errors | Model pull secret missing or invalid. | |
| Pod Pending | Node selector, GPU, or resource constraints. | |
| Pod Pending | Insufficient hugepages. | |
| No enhanced audio output (ST 2110) | Multicast IP addresses or ports misaligned. | Ensure that IP addresses and ports are consistent along the sender → NIM service → receiver chain. |
| No enhanced audio output (NMOS) | Receivers connected after sender. | Re-link in order: connect the NIM receiver first, then the sender. |
| Rivermax errors | Rivermax license secret missing. | |
| Startup probe failures | Model download slow or NGC key invalid. | |
| PVC Pending | No default StorageClass. | Set storageClassName to a valid StorageClass from your cluster. |
Kubernetes Operator
| Symptom | Likely Cause | Fix |
|---|---|---|
| Pod ImagePullBackOff | Image pull secret missing or incorrect. | Check … |
| Custom resource | Invalid spec, missing secrets, or scheduling. | |
| NIM pod crash / NGC errors | Model pull secret missing or invalid. | |
| Pod Pending | Node selector, GPU, or hugepages. | |
| Rivermax errors | License secret missing. | Confirm that the Rivermax license secret is mounted at … |
| CRD not found | Operator chart not installed or failed. | |
On Red Hat OpenShift, replace kubectl with oc.
See Also
Configuration Reference — Full set of Helm values and operator custom resource fields.
Observability — Accessing pod logs and persistent log files.