Advanced Usage

Persistent Storage

The NIM service chart supports two optional persistent volume claims for storage that survives pod restarts and helm uninstall. Both are disabled by default and are annotated with helm.sh/resource-policy: keep so they are retained when the Helm release is removed.

For the operator, equivalent settings are available under spec.parameters.nimModelCache and spec.parameters.serverLogs on the NvidiaLipsyncMediaFunction custom resource. For details, refer to Operator Configuration.

Model Cache

Caches NGC model artifacts locally to avoid re-downloading on every deployment. The persistent volume claim name is <appName>-model-cache.

nimModelCache:
  enabled: true
  size: "10Gi"
  storageClassName: ""

The chart mounts the model cache at /opt/nim/.cache. The LipSync operator sets NIM_CACHE_PATH to that path on the NIM pod when the model cache is enabled on the custom resource.

Server Log Files

Persists time-stamped NIM server log files under /var/log/lipsync. The persistent volume claim name is <appName>-server-logs.

serverLogs:
  enabled: true
  size: "5Gi"
  storageClassName: ""

When serverLogs is enabled, the chart sets AI4M_NIM_LOG_PATH to /var/log/lipsync.

StorageClass

By default, storageClassName is set to "", which uses the cluster’s default StorageClass. To use a specific StorageClass, set storageClassName to the name of an existing StorageClass in your cluster under the nimModelCache or serverLogs block.

If no default StorageClass is configured in your cluster and storageClassName is left empty, the PVC remains in Pending state. In that case, set storageClassName to a valid StorageClass from your cluster.

List StorageClasses in your cluster:

kubectl get storageclass

If a chart-managed PVC stays Pending, set storageClassName in your values file (uncomment the line if it is commented out; refer to the Model Cache and Server Log Files examples) to an RWO-capable class from that list. For example:

nimModelCache:
  enabled: true
  size: "10Gi"
  storageClassName: <storage-class>
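
To find out ahead of time whether the cluster has a default StorageClass, the lookup can be scripted. The following Python sketch is illustrative only. It assumes that you pass it the output of kubectl get storageclass -o json, and it relies on the standard Kubernetes annotation storageclass.kubernetes.io/is-default-class. The function name and the sample data (fast-ssd, standard) are hypothetical.

```python
import json

# Standard Kubernetes annotation that marks a StorageClass as the cluster default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(kubectl_json: str) -> list:
    """Return the names of StorageClasses annotated as default.

    kubectl_json is expected to be the output of
    `kubectl get storageclass -o json`.
    """
    names = []
    for item in json.loads(kubectl_json).get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(metadata.get("name", ""))
    return names


# Hypothetical sample data standing in for real kubectl output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "fast-ssd",
                      "annotations": {DEFAULT_ANNOTATION: "true"}}},
        {"metadata": {"name": "standard", "annotations": {}}},
    ]
})

defaults = default_storage_classes(SAMPLE)
if defaults:
    print("Default StorageClass:", ", ".join(defaults))
else:
    print("No default StorageClass; set storageClassName explicitly.")
```

If the function returns an empty list, a chart-managed PVC with storageClassName left empty is likely to stay Pending, so set storageClassName explicitly.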

Note

The model cache persistent volume claim requires a StorageClass that supports the ReadWriteOnce access mode. When using a shared filesystem, ensure that only one pod writes to the cache at a time.

LipSync NIM Integration with Active Speaker Detection for Multi-Speaker Use Cases

Multi-speaker lip sync requires integration with the Active Speaker Detection (ASD) NIM. The ASD NIM identifies active speakers in the video stream and provides per-frame bounding boxes, speaker IDs, and confidence scores. The LipSync NIM uses this ancillary data to apply lip sync to the active speaker.

NMOS integration currently has known issues, and multi-speaker integration is supported only in ST 2110 static mode. Refer to NMOS Multi-Speaker Limitation.

Enabling Ancillary Input for ASD Bounding Boxes

To receive bounding box ancillary data from the ASD NIM, enable ancillary input in the LipSync NIM configuration as shown. (The hostIp and multicast settings must match your deployment configuration.)

input:
  ancillaryData:
    enabled: true
    hostIp: "234.5.8.9"
    hostPort: "8001"

Debug Visualization

The output.boundingBoxEnabled flag is optional and intended for debugging and visualization only:

output:
  boundingBoxEnabled: true

When enabled:

  • A green bounding box is drawn around the currently active speaker.

  • A blue bounding box is drawn for a detected speaker who is inactive or not speaking.

This visualization helps verify that ASD metadata is being received and interpreted correctly by the LipSync NIM.

ASD NIM Configuration

Ensure that the ASD NIM ancillary output IP and port match the LipSync NIM ancillary input configuration. For example, the default LipSync Helm configuration uses the following values:

  • IP address: 234.5.8.9

  • Port: 8001
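
A mismatch between the two endpoints is easy to miss, so a check before deployment can help. The sketch below is a hypothetical Python helper and is not part of either NIM. The AncillaryEndpoint class and the mismatches function are illustrative names, and the values are assumed to be copied by hand from the ASD and LipSync configurations.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AncillaryEndpoint:
    """IP address and port of an ancillary stream (hypothetical helper type)."""
    ip: str
    port: int


def mismatches(asd_output: AncillaryEndpoint, lipsync_input: AncillaryEndpoint) -> list:
    """Return a description of every field that differs between the two endpoints."""
    problems = []
    # Compare parsed addresses so that malformed strings raise ValueError early.
    if ipaddress.ip_address(asd_output.ip) != ipaddress.ip_address(lipsync_input.ip):
        problems.append(
            f"IP address: ASD sends to {asd_output.ip}, "
            f"LipSync expects {lipsync_input.ip}"
        )
    if asd_output.port != lipsync_input.port:
        problems.append(
            f"Port: ASD sends to {asd_output.port}, "
            f"LipSync expects {lipsync_input.port}"
        )
    return problems


# Default LipSync values from this page; hostPort is a string in the Helm values.
lipsync = AncillaryEndpoint(ip="234.5.8.9", port=int("8001"))
asd = AncillaryEndpoint(ip="234.5.8.9", port=8001)
print(mismatches(asd, lipsync) or "ASD output matches LipSync input")
```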

For the binary layout of the ancillary application payload, refer to Ancillary Data Payload (SMPTE ST 2110-40).

Ancillary Data Payload (SMPTE ST 2110-40)

When ancillary input is enabled, the LipSync NIM ingests per-frame active speaker tracking bounding boxes carried over the SMPTE ST 2110-40 ancillary data channel.

The application payload has a fixed binary layout of 312 bytes and supports up to 12 bounding boxes per frame.

Binary Layout

| Offset (Bytes) | Field | Type | Size (Bytes) | Description |
| --- | --- | --- | --- | --- |
| 0 | bounding_boxes[0] | BBox | 16 | Bounding box for speaker 0. |
| 16 | bounding_boxes[1] | BBox | 16 | Bounding box for speaker 1. |
| ... | ... | ... | ... | ... |
| 176 | bounding_boxes[11] | BBox | 16 | Bounding box for speaker 11. |
| | Subtotal: BBox[12] | | 192 | |
| 192 | tracking_id[0..11] | uint16[12] | 24 | Per-speaker tracking IDs. |
| 216 | audio_id[0..11] | int16[12] | 24 | Associated audio stream IDs. |
| 240 | confidence[0..11] | float[12] | 48 | Detection confidence per speaker. |
| 288 | is_speaking[0..11] | uint8[12] | 12 | Speaking activity flag (per speaker). |
| 300 | num_bboxes | uint8 | 1 | Number of valid bounding boxes. |
| 301–303 | (padding) | | 3 | Alignment padding. |
| 304 | seq_no | uint64 | 8 | Monotonic sequence number. |
| | Total (typical) | | 312 | |
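
The layout maps directly onto a fixed-size struct, so a receiver or test tool can decode it in a few lines. The following Python sketch is illustrative and makes three assumptions that this page does not confirm: the fields are little-endian, a nonzero is_speaking byte means that the speaker is active, and the internal structure of BBox is left undecoded (each entry is kept as 16 raw bytes). The names PAYLOAD, Speaker, and parse_payload are hypothetical.

```python
import struct
from dataclasses import dataclass

MAX_BBOXES = 12
BBOX_SIZE = 16

# ASSUMPTION: little-endian with no implicit alignment ("<"); the three
# padding bytes are spelled out explicitly ("3x").
# 192s = 12 BBox entries kept as raw bytes, then uint16[12], int16[12],
# float[12], uint8[12], uint8 num_bboxes, padding, uint64 seq_no.
PAYLOAD = struct.Struct("<192s 12H 12h 12f 12B B 3x Q")
assert PAYLOAD.size == 312


@dataclass
class Speaker:
    bbox_raw: bytes      # 16 undecoded bytes; BBox fields are not defined here
    tracking_id: int
    audio_id: int
    confidence: float
    is_speaking: bool


def parse_payload(data: bytes):
    """Decode one 312-byte payload into (seq_no, list of valid speakers)."""
    if len(data) != PAYLOAD.size:
        raise ValueError(f"expected {PAYLOAD.size} bytes, got {len(data)}")
    fields = PAYLOAD.unpack(data)
    boxes = fields[0]
    tracking_ids = fields[1:13]
    audio_ids = fields[13:25]
    confidences = fields[25:37]
    speaking = fields[37:49]
    num_bboxes, seq_no = fields[49], fields[50]
    if num_bboxes > MAX_BBOXES:
        raise ValueError(f"num_bboxes {num_bboxes} exceeds {MAX_BBOXES}")
    speakers = [
        Speaker(
            bbox_raw=boxes[i * BBOX_SIZE:(i + 1) * BBOX_SIZE],
            tracking_id=tracking_ids[i],
            audio_id=audio_ids[i],
            confidence=confidences[i],
            is_speaking=speaking[i] != 0,  # ASSUMPTION: nonzero means speaking
        )
        for i in range(num_bboxes)
    ]
    return seq_no, speakers


# Round-trip a synthetic payload with two valid entries.
sample = PAYLOAD.pack(
    bytes(192),                   # bounding_boxes (all zero)
    *([7, 9] + [0] * 10),         # tracking_id
    *([1, -1] + [0] * 10),        # audio_id
    *([0.5, 0.25] + [0.0] * 10),  # confidence
    *([1, 0] + [0] * 10),         # is_speaking
    2,                            # num_bboxes
    42,                           # seq_no
)
seq_no, speakers = parse_payload(sample)
print(seq_no, [(s.tracking_id, s.is_speaking) for s in speakers])
```

Only the first num_bboxes entries are returned, because the remaining slots of each array are not guaranteed to hold meaningful data. If the sender turns out to use big-endian byte order, change the leading "<" in the format string to ">".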

Using a Custom Input Asset

To provide your own .ts transport stream without baking it into the sender image, create a PersistentVolumeClaim and copy your assets into it. Then enable the optional sender.assetsPVC settings in the nvidia-lipsync-h4m-sample Helm chart and set sender.video.inputFile to a path under that mount (for example, /mnt/sender-assets/...).

On Red Hat OpenShift, use oc instead of kubectl in the following steps.

Step 1: Create a PVC

This example uses your cluster’s default StorageClass. If the PVC stays Pending, set storageClassName to a storage class recommended for your cluster (for example, <storage-class>). You can list available classes with the following command:

kubectl get storageclass

Create the PVC:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sender-assets
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # If this PVC stays Pending, uncomment the next line and replace <storage-class> with an RWO-capable StorageClass from your cluster.
  # storageClassName: <storage-class>
EOF

Step 2: Upload the Asset to the PVC

Launch a temporary pod that mounts the PVC, copy files, and then delete the pod:

# Start a temporary pod. (Add nodeSelector under spec if you must bind the volume on a specific node.)
kubectl run asset-loader --image=busybox:1.36 --restart=Never \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "loader",
        "image": "busybox:1.36",
        "command": ["sleep", "3600"],
        "volumeMounts": [{"name": "assets", "mountPath": "/mnt/assets"}]
      }],
      "volumes": [{
        "name": "assets",
        "persistentVolumeClaim": {"claimName": "sender-assets"}
      }]
    }
  }'

kubectl wait --for=condition=Ready pod/asset-loader --timeout=120s

# Copy one or more .ts files (example: single file)
kubectl cp my-custom-video.ts asset-loader:/mnt/assets/

# Or copy a whole directory of .ts files
# kubectl cp ./my-ts-dir/. asset-loader:/mnt/assets/

kubectl exec asset-loader -- ls -lh /mnt/assets/

kubectl delete pod asset-loader

Step 3: Point Helm at the Mounted Path

Enable the claim on the sender and set inputFile to a path under mountPath. (Defaults are in values*.yaml for the sample chart.)

sender:
  assetsPVC:
    enabled: true
    claimName: sender-assets
    mountPath: "/mnt/sender-assets"
    readOnly: true
  video:
    inputFile: /mnt/sender-assets/my-custom-video.ts
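
Because inputFile must point to a location inside mountPath, a quick path check before you upgrade the release can catch typos. This Python sketch is illustrative; the function name is hypothetical, and the paths are the example values from this page.

```python
from pathlib import PurePosixPath


def is_under_mount(input_file: str, mount_path: str) -> bool:
    """Return True if input_file is an absolute path located below mount_path."""
    file_path = PurePosixPath(input_file)
    mount = PurePosixPath(mount_path)
    # Reject relative paths and ".." segments, which PurePosixPath does not resolve.
    if not file_path.is_absolute() or ".." in file_path.parts:
        return False
    return mount in file_path.parents


print(is_under_mount("/mnt/sender-assets/my-custom-video.ts", "/mnt/sender-assets"))
```

Keep in mind that the temporary loader pod in Step 2 mounts the volume at /mnt/assets, whereas the sender mounts the same volume at /mnt/sender-assets, so a path copied from the loader pod does not pass this check.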

Upgrade or install the release; then confirm that the sender process sees the file:

kubectl logs deploy/<sender-deployment-name> | head -100

Use the sender deployment name from sender.appName in your values file (for example, nvidia-lipsync-sender-nmos or nvidia-lipsync-sender-st2110).

You should see a line such as INPUT_FILE=/mnt/sender-assets/... matching your sender.video.inputFile. (The sender script logs this at startup.)

Troubleshooting

End-to-End Demo Chart and NIM Service Chart

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| ImagePullBackOff | Image pull secret missing or incorrect. | Check image.secret in Helm values; kubectl get secret <image.secret>. |
| Pod crash / NGC errors | Model pull secret missing or invalid. | Confirm ngcApiKeySecret.name and key ngcApiKeySecret.key (default: NGC_API_KEY); kubectl get secret <ngcApiKeySecret.name>. |
| Pod Pending | Node selector, GPU, or resource constraints. | Check node labels and capacity; kubectl describe pod <pod>. If events mention NUMA or topology, refer to Scheduler Requirements for NUMA-aware Clusters. |
| Pod Pending | Insufficient hugepages. | Check hugepages availability; kubectl describe node <node>. |
| No output | Multicast IP addresses or ports misaligned. | Ensure that the sender, NIM service, and receiver IP addresses and ports are consistent. |
| Rivermax errors | Rivermax license secret missing. | Confirm that the Rivermax license secret exists; kubectl get secret rivermax-license. |
| Startup probe failures | Model download slow or NGC key invalid. | Review NIM logs; kubectl logs deploy/<appName>. If needed, increase startup probe failureThreshold. |
| PVC Pending | No default StorageClass, or the default class does not provision RWO volumes for this claim (slow or unsuitable provisioner). | For chart-managed claims, set nimModelCache.storageClassName and serverLogs.storageClassName, or create a matching StorageClass. For the custom sender-assets PVC in Using a Custom Input Asset, set spec.storageClassName to an RWO-capable class from kubectl get storageclass. |

Kubernetes Operator

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| ImagePullBackOff on controller or NIM pod | Image pull secret missing or incorrect. | Check imagePullSecrets / mediaFunction.imagePullSecrets; kubectl describe pod <pod>. |
| Custom resource Provisioned false | Invalid spec, missing secrets, or scheduling. | Check CR status and events; kubectl describe nvidialipsyncmediafunction <name> -n <namespace>. Review operator controller logs. |
| NIM pod crash / NGC errors | Model pull secret missing or invalid. | Confirm spec.parameters.ngcApiKeySecret name and key; kubectl get secret. |
| Pod Pending | Node selector, GPU, or hugepages. | Verify node labels and capacity; kubectl describe pod <pod>. If events mention NUMA or topology, refer to Scheduler Requirements for NUMA-aware Clusters. |
| Rivermax errors | License secret missing. | Confirm that the Rivermax license secret is mounted at /opt/mellanox/rivermax. |
| CRD not found | Operator chart not installed or failed. | Run helm status <operator-release-name>. Reinstall or upgrade the operator chart. |

On Red Hat OpenShift, replace kubectl with oc.

See Also