Advanced Usage#
Persistent Storage#
The NIM service chart supports two optional persistent volume claims for storage that survives pod restarts and helm uninstall. Both are disabled by default and are annotated with helm.sh/resource-policy: keep so they are retained when the Helm release is removed.
When the end-to-end demo chart is used, configure these keys under nvidia-active-speaker-detection-h4m-service in the values file. (For more information, refer to Common Helm Configuration.)
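As a minimal sketch of that nesting, the fragment below places the model cache and log settings (described in the sections that follow) under the subchart key in the end-to-end demo chart values file. The values shown are the ones used elsewhere on this page; any other keys your values file needs are omitted here.

```yaml
# Values file for the end-to-end demo chart (sketch).
# Settings for the NIM service chart are nested under its subchart key.
nvidia-active-speaker-detection-h4m-service:
  nimModelCache:
    enabled: true
    create: true
    size: "10Gi"
  nimLogs:
    enabled: true
    create: true
    size: "5Gi"
```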
For the operator, equivalent settings are available under spec.parameters.nimModelCache and spec.parameters.nimLogs in the NvidiaActiveSpeakerDetectionMediaFunction custom resource. For details, refer to Configuration Reference.
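The fragment below sketches where these settings might sit in the custom resource. Only the paths spec.parameters.nimModelCache and spec.parameters.nimLogs come from this page; the sub-keys are an assumption that the operator mirrors the Helm value names, and apiVersion, kind, and metadata are left out. Check the Configuration Reference for the authoritative field names.

```yaml
# Fragment of an NvidiaActiveSpeakerDetectionMediaFunction custom resource (sketch).
# ASSUMPTION: the sub-keys below mirror the Helm values; verify them in the
# Configuration Reference before use.
spec:
  parameters:
    nimModelCache:
      enabled: true
      size: "10Gi"
    nimLogs:
      enabled: true
      size: "5Gi"
```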
Model Cache#
Caches NGC model artifacts locally so models are not re-downloaded on every deployment. When the chart creates the PVC, its name is <appName>-model-cache. The container mounts the model cache at /opt/nim/.cache by default (mountPath).
nimModelCache:
  enabled: true
  create: true
  size: "10Gi"
  storageClassName: ""
  mountPath: "/opt/nim/.cache"
When model cache is enabled, NIM_CACHE_PATH is set to the configured mountPath on the NIM pod.
NIM Log Files#
Persists time-stamped NIM log files under the configured directory. When the chart creates the PVC, its name is <appName>-nim-logs. The default mount path is /workspace/nim-logs (mountPath).
nimLogs:
  enabled: true
  create: true
  size: "5Gi"
  storageClassName: ""
  mountPath: "/workspace/nim-logs"
Using a Pre-Existing Persistent Volume Claim#
To attach a PVC that already exists in the namespace instead of creating one with the chart, set create: false. Before deploying, ensure that a PVC with the expected name exists: <appName>-model-cache for the model cache, or <appName>-nim-logs for the NIM logs. For example:
nimModelCache:
  enabled: true
  create: false
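A pre-created claim for the model cache could look like the following sketch. The app name my-asd-nim and the namespace my-namespace are hypothetical placeholders; substitute the values that your release resolves <appName> to. The access mode and size follow the ReadWriteOnce requirement and the 10Gi default mentioned on this page.

```yaml
# Pre-created PVC for the model cache (sketch).
# "my-asd-nim" and "my-namespace" are hypothetical; the claim name must
# match <appName>-model-cache for your deployment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-asd-nim-model-cache
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName is omitted here, so the cluster's default StorageClass
  # applies. Add it if your cluster has no default.
```

Apply the manifest to the target namespace before installing the chart with create: false.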
StorageClass#
By default, storageClassName is set to "", which uses the cluster’s default StorageClass. To use a specific StorageClass, set storageClassName to the name of an existing StorageClass in your cluster under the nimModelCache or nimLogs block.
If no default StorageClass is configured in your cluster and storageClassName is left empty, the PVC remains in Pending state. In that case, either set storageClassName to a valid class or pre-create the PVC and use create: false.
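For instance, pinning both volumes to a named class might look like the sketch below. The class name fast-local is a hypothetical placeholder; use the name of a StorageClass that actually exists in your cluster.

```yaml
# Use an explicit StorageClass for both PVCs (sketch).
# "fast-local" is a hypothetical class name.
nimModelCache:
  enabled: true
  create: true
  size: "10Gi"
  storageClassName: "fast-local"
nimLogs:
  enabled: true
  create: true
  size: "5Gi"
  storageClassName: "fast-local"
```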
Note
The model cache PVC requires a StorageClass that supports ReadWriteOnce access mode. When using a shared filesystem, ensure only one pod writes to the cache concurrently.
Troubleshooting#
End-to-End Demo Chart and NIM Service Chart#
| Symptom | Likely Cause | Fix |
|---|---|---|
|  | Image pull secret missing or incorrect. |  |
| Pod crash / NGC errors | Model pull secret missing or invalid. |  |
| Pod | Node selector, GPU, or resource constraints. |  |
| Pod | Insufficient hugepages. |  |
| No output | Multicast IP addresses or ports misaligned. | Ensure “sender → NIM service → receiver IP address and port” chain is consistent. |
| Rivermax errors | Rivermax license secret missing. |  |
| Startup probe failures | Model download slow or NGC key invalid. |  |
| PVC | No default StorageClass. | Set |
Kubernetes Operator#
| Symptom | Likely Cause | Fix |
|---|---|---|
|  | Image pull secret missing or incorrect. | Check |
| Custom resource | Invalid spec, missing secrets, or scheduling. |  |
| NIM pod crash / NGC errors | Model pull secret missing or invalid. |  |
| Pod | Node selector, GPU, or hugepages. |  |
| Rivermax errors | License secret missing. | Confirm Rivermax license secret is mounted at |
| CRD not found | Operator chart not installed or failed. |  |
On Red Hat OpenShift, replace kubectl with oc.
See Also#
Configuration Reference — Full set of Helm values and operator custom resource fields.
Observability — Accessing pod logs and persistent log files.