End-to-End Demo Chart#
Overview#
The nvidia-active-speaker-detection-h4m-sample Helm chart deploys a complete Active Speaker Detection demo pipeline on Holoscan for Media. This includes:
Sender pipeline.
Active Speaker Detection NIM service (nvidia-active-speaker-detection-h4m-service).
Receiver pipeline (SMPTE ST 2110 or NMOS).
SRT output for preview.
The NIM analyzes live video and audio streams to identify the active speaker in a multi-person scene. Detection results are transmitted as bounding box metadata via SMPTE ST 2110-40 ancillary data.
Installation#
Prerequisites#
Complete all prerequisite steps (Rivermax license, image pull and model pull secrets, high-speed network attachment) before running helm install. For details, refer to Getting Started.
Pull the Chart#
If not already done, add the Helm repository and pull the chart:
helm pull nim-repo/nvidia-active-speaker-detection-h4m-sample --version 1.0.0
For the full repository setup, refer to Pull Helm Charts.
Helm Installation#
The chart includes a default values.yaml. For SMPTE ST 2110 (static) or NMOS pipelines, copy the matching configuration from Starter Configuration Files into values-st2110.yaml or values-nmos.yaml, adjust the fields for your environment, and run the matching install command.
Important
The default values.yaml ships with example values for cluster-specific fields such as node selector (example-gpu-node), image pull secret, network name, and scheduler. You must update these values to match your cluster before deploying. The easiest approach is to use the global overrides so that each value is specified once for all components (sender, receiver, and NIM service). For details, refer to Global Overrides.
Minimal deployment using global overrides (NMOS mode, 1080p):
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
--set global.nodeSelector.hostname=<gpu-node-name> \
--set global.image.secret=<image-pull-secret> \
--set global.network.name=<multus-net-attach-def> \
--set global.schedulerName=<scheduler-name> \
--set nvidia-active-speaker-detection-h4m-service.ngcModelDownload.secretName=<model-pull-secret>
Alternatively, set values per component when the sender, receiver, and NIM service run on different nodes or use different pull secrets:
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
--set sender.nodeSelector.hostname=<sender-node> \
--set receiver.nodeSelector.hostname=<receiver-node> \
--set nvidia-active-speaker-detection-h4m-service.nodeSelector.hostname=<gpu-node> \
--set sender.image.secret=<image-pull-secret> \
--set receiver.image.secret=<image-pull-secret> \
--set nvidia-active-speaker-detection-h4m-service.image.secret=<image-pull-secret> \
--set nvidia-active-speaker-detection-h4m-service.ngcModelDownload.secretName=<model-pull-secret>
For all available Helm values, refer to the Configuration Reference.
Recommended: Two-Phase Deploy#
First, deploy the receiver and NIM service. When the pipeline is ready, enable the sender. This procedure prevents frames from accumulating in receiver queues before the NIM is ready.
# Phase 1: receiver + NIM service only
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
--set sender.enabled=false
kubectl rollout status deployment/<receiver-appName> --timeout=180s
kubectl rollout status deployment/<nim-appName> --timeout=180s
# Phase 2: enable sender
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz
kubectl rollout status deployment/<sender-appName> --timeout=180s
On Red Hat OpenShift, replace kubectl with oc.
Note
A rollout status timeout is not a failure; it means the pod did not become ready within the allotted time. The NIM pod might still be pulling its image or initializing. If a timeout occurs, check the pod state before taking action:
kubectl get pods -o wide
kubectl describe pod <pod-name>
Look for Pulling image in the events (normal: wait longer) or CrashLoopBackOff / ErrImagePull (actionable: check secrets and node resources).
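The same triage can be sketched in Python against the JSON that kubectl get pods -o json returns. This is a minimal sketch, not part of the chart: classify_pods and the sample pod names are hypothetical, and treating ImagePullBackOff as actionable is an added assumption.

```python
# Waiting reasons treated as actionable. The first two come from the note
# above; ImagePullBackOff is an added assumption (it commonly follows
# ErrImagePull).
ACTIONABLE = {"CrashLoopBackOff", "ErrImagePull", "ImagePullBackOff"}


def classify_pods(pod_list):
    """Map each pod name to 'ready', 'waiting', or 'actionable'.

    pod_list is the parsed output of `kubectl get pods -o json`.
    """
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Collect the waiting reason of every container (None if not waiting).
        waiting_reasons = {
            s.get("state", {}).get("waiting", {}).get("reason") for s in statuses
        }
        if waiting_reasons & ACTIONABLE:
            result[name] = "actionable"
        elif statuses and all(s.get("ready") for s in statuses):
            result[name] = "ready"
        else:
            result[name] = "waiting"
    return result


# Hypothetical sample that mimics the kubectl JSON shape.
sample = {
    "items": [
        {"metadata": {"name": "receiver-0"},
         "status": {"containerStatuses": [
             {"ready": True, "state": {"running": {}}}]}},
        {"metadata": {"name": "nim-0"},
         "status": {"containerStatuses": [
             {"ready": False,
              "state": {"waiting": {"reason": "ContainerCreating"}}}]}},
        {"metadata": {"name": "sender-0"},
         "status": {"containerStatuses": [
             {"ready": False,
              "state": {"waiting": {"reason": "ErrImagePull"}}}]}},
    ]
}

for name, state in classify_pods(sample).items():
    print(f"{name}: {state}")
```

Against a live cluster, the input could instead be read with json.load(sys.stdin) from the piped output of kubectl get pods -o json.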
After installation, verify all pods are running:
kubectl get pods
Confirm that the sender, NIM service, and receiver pods all show READY 1/1 before continuing.
Sender Sample Media#
Input Media Files#
The sender expects an MPEG Transport Stream (.ts) file containing video and audio streams for active speaker detection.
Stream Composition#
Each input .ts file must contain the following streams:
Video stream: Source video with one or more visible speakers.
Audio streams: One mono audio track per speaker (for example, two speakers require two audio tracks). Each audio stream should contain only a single speaker’s audio.
Codecs#
Video codec: H.264
Audio codec: Opus, 48 kHz, mono
Bundled Sample Files#
The sender container includes a sample .ts file:
/workspace/assets/sample_asset.ts
| File | Resolution | Frame Rate | Audio Tracks |
|---|---|---|---|
| sample_asset.ts | 1080p (1920×1080) | 30 fps | 2 |
Use the same resolution and frame rate for the sender, NIM service (nvidia-active-speaker-detection-h4m-service), and receiver.
Note
Each input audio stream must contain only a single speaker’s audio. During silence, audio samples must be zero. If background noise is present, enable audio thresholding via useAudioThresholdToDetectActiveAudioStream and audioThresholdDb in the pipeline configuration. For details, refer to Configuration Reference.
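To make the unit of audioThresholdDb concrete, the sketch below computes the RMS level of a block of samples in dB relative to full scale and compares it with -40.0, the value used in the starter configuration. It assumes normalized float samples and an RMS-based measure. How the NIM service measures level is not described here, so read this as an illustration of the threshold unit rather than of the service's algorithm; rms_dbfs and is_active are hypothetical names.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level in dB relative to full scale (1.0).

    Returns -inf for an empty block or for digital silence.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10 of the mean square equals 20 * log10 of the RMS value.
    return 10.0 * math.log10(mean_square)


def is_active(samples, threshold_db=-40.0):
    """Treat a block as active audio only if its level exceeds the threshold."""
    return rms_dbfs(samples) > threshold_db


silence = [0.0] * 480          # all-zero samples
background = [0.001] * 480     # constant low-level signal, about -60 dB
speech_like = [0.1] * 480      # louder signal, about -20 dB

for label, block in [("silence", silence),
                     ("background", background),
                     ("speech_like", speech_like)]:
    print(label, is_active(block))
```

With a threshold of -40.0, the low-level background block is treated the same as silence, which is the effect thresholding is meant to achieve when the silent stream is not exactly zero.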
Receiver Type#
The receiver supports two operating modes, controlled by receiver.receiverType:
| Mode | Description |
|---|---|
| ancillary | Receives bounding box metadata via SMPTE ST 2110-40 ancillary data. Use this mode to consume detection results programmatically. |
| video | Receives the output video with bounding box overlays re-streamed via SRT. Requires the NIM service setting shown in the example after this table. |
Example of switching the receiver to video mode:
receiver:
receiverType: "video"
nvidia-active-speaker-detection-h4m-service:
pipeline:
testFrameOverlayMode: true
Note
In ancillary mode, the ancillaryInputParams section is used. In video mode, the videoInputParams section is used instead.
Starter Configuration Files#
Copy the appropriate configuration into values-st2110.yaml or values-nmos.yaml, adjust fields for your environment, and pass it to helm upgrade --install with -f.
For a full reference of all Helm keys and pipeline tuning parameters, refer to the Configuration Reference.
SMPTE ST 2110 Configuration — Static (values-st2110.yaml)#
global:
  nodeSelector:
    hostname: ""
  image:
    secret: ""
  network:
    name: ""
  schedulerName: ""
  service:
    type: ""
sender:
  enabled: true
  appName: nvidia-active-speaker-detection-sender-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-sender
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  inputAssets:
    videoFile: /workspace/assets/sample_asset.ts
    numOfAudioTrack: 2
    videoFramerateNum: 30
    videoFramerateDen: 1
    videoWidth: 1920
    videoHeight: 1080
  st2110Ports:
    senderRemoteIp: 234.5.8.26
    video: 7005
    audio: 7006
  nmos:
    enabled: false
    hostname: asd-sender.local
    description: NVIDIA Active Speaker Detection Sender
    label: Nvidia-ASD-Sender
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32524
receiver:
  enabled: true
  appName: nvidia-active-speaker-detection-receiver-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-receiver
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  receiverType: "ancillary"
  nmos:
    enabled: false
    hostname: asd-receiver.local
    description: NVIDIA Active Speaker Detection Receiver
    label: Nvidia-ASD-Receiver
  videoInputParams:
    videoReceiverRemoteIp: "234.5.8.27"
    videoReceiverPort: 7005
    videoWidth: 1920
    videoHeight: 1080
    videoFramerateNum: 30
    videoFramerateDen: 1
  ancillaryInputParams:
    hostIp: "234.5.8.27"
    hostPort: 7006
    framerateNum: 30
    framerateDen: 1
  srtPort:
    internal: 8888
    external: 30889
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32525
nvidia-active-speaker-detection-h4m-service:
  enabled: true
  appName: nvidia-active-speaker-detection-nim-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-nim
    tag: "1.0.0"
    secret: ngc-api-key
  nimLogs:
    enabled: true
    mountPath: /workspace/nim-logs
  network:
    name: "media-a-tx-net"
  nmos:
    enabled: false
    hostname: active-speaker-detection-nim.local
    description: Nvidia Active Speaker Detection NIM
    label: Nvidia-Active-Speaker-Detection-NIM
  video:
    width: 1920
    height: 1080
    framerateNum: 30
    framerateDen: 1
  audio:
    pcmFormat: S24BE
    samplingRate: 48000
    numChannels: 1
  input:
    video:
      sessionName: video_in
      localInterfaceName: net1
      hostIp: "234.5.8.26"
      hostPort: "7005"
      hostNumSubnetBits: 24
      flushFrameCountThreshold: 3
    audio:
      sessionName: audio_in
      localInterfaceName: net1
      hostIp: 234.5.8.26
      hostPort: "7006"
      hostNumSubnetBits: 24
      numStreams: 2
  output:
    ancillaryData:
      sessionName: ancillary_out
      localInterfaceName: net1
      hostIp: 234.5.8.27
      hostPort: "7006"
      hostNumSubnetBits: 24
  logging:
    level: 3
  pipeline:
    testFrameOverlayMode: false
    outputFrameBufferSize: 30
    useAudioThresholdToDetectActiveAudioStream: false
    audioThresholdDb: -40.0
    syncTolerance: 0.5986
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32526
NMOS Configuration (values-nmos.yaml)#
global:
  nodeSelector:
    hostname: ""
  image:
    secret: ""
  network:
    name: ""
  schedulerName: ""
  service:
    type: ""
sender:
  enabled: true
  appName: nvidia-active-speaker-detection-sender-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-sender
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  inputAssets:
    videoFile: /workspace/assets/sample_asset.ts
    numOfAudioTrack: 2
    videoFramerateNum: 30
    videoFramerateDen: 1
    videoWidth: 1920
    videoHeight: 1080
  nmos:
    enabled: true
    hostname: asd-sender.local
    description: Nvidia Active Speaker Detection Sender
    label: Nvidia-ASD-Sender
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32524
receiver:
  enabled: true
  appName: nvidia-active-speaker-detection-receiver-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-receiver
    tag: "1.0.0"
    secret: ngc-api-key
  receiverType: "ancillary" # or "video"
  nmos:
    enabled: true
    hostname: asd-receiver.local
    description: Nvidia Active Speaker Detection Receiver
    label: Nvidia-ASD-Receiver
  ancillaryInputParams:
    hostIp: "234.5.8.27"
    hostPort: 7006
    framerateNum: 30
    framerateDen: 1
  videoInputParams:
    videoReceiverRemoteIp: "234.5.8.26"
    videoReceiverPort: 7005
    videoWidth: 1920
    videoHeight: 1080
    videoFramerateNum: 30
    videoFramerateDen: 1
  srtPort:
    internal: 8888
    external: 30889
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32525
nvidia-active-speaker-detection-h4m-service:
  enabled: true
  appName: nvidia-active-speaker-detection-feature-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-nim
    tag: "1.0.0"
    secret: ngc-api-key
  nimLogs:
    enabled: true
    mountPath: /workspace/nim-logs
  network:
    name: "media-a-tx-net"
  nmos:
    enabled: true
    hostname: active-speaker-detection-nim.local
    description: Nvidia Active Speaker Detection NIM
    label: Nvidia-Active-Speaker-Detection-NIM
  video:
    width: 1920
    height: 1080
    framerateNum: 30
    framerateDen: 1
  audio:
    pcmFormat: S24BE
    samplingRate: 48000
    numChannels: 1
  input:
    video:
      sessionName: video_in
      flushFrameCountThreshold: 3
    audio:
      sessionName: audio_in
      numStreams: 2
  output:
    video:
      sessionName: video_out
    ancillaryData:
      sessionName: ancillary_out
      localInterfaceName: net1
      hostIp: 234.5.8.27
      hostPort: "7006"
      hostNumSubnetBits: 24
  logging:
    level: 3
  pipeline:
    testFrameOverlayMode: false
    outputFrameBufferSize: 30
    useAudioThresholdToDetectActiveAudioStream: false
    audioThresholdDb: -40.0
    syncTolerance: 0.5986
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32526
On Holoscan for Media clusters, the NMOS Connection Manager web UI is typically reached from a browser on the cluster network. For remote graphical access, refer to Chrome Remote Desktop in Getting Started.
Note
In NMOS mode, use the NMOS Connection Manager UI to connect the receiver to the NIM service before you connect the sender. In SMPTE ST 2110 mode (static), ensure that IP addresses and ports are consistent along the sender → NIM service → receiver chain: each stage must receive on the same address and port that the upstream stage sends to.
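The static-mode consistency requirement can be checked mechanically before installing. The sketch below uses a plain dict that mirrors the addresses in the values-st2110.yaml starter configuration. The key nesting is assumed from that file, and check_chain and endpoint are hypothetical helpers, not part of the chart.

```python
NIM = "nvidia-active-speaker-detection-h4m-service"


def endpoint(ip, port):
    # Ports are normalized to strings because the file mixes 7005 and "7005".
    return (str(ip), str(port))


def check_chain(values):
    """Return a list of mismatches along sender -> NIM service -> receiver."""
    tx = values["sender"]["st2110Ports"]
    nim_in = values[NIM]["input"]
    nim_out = values[NIM]["output"]["ancillaryData"]
    rx = values["receiver"]["ancillaryInputParams"]
    # Each link is (description, upstream endpoint, downstream endpoint).
    links = [
        ("sender video -> NIM video input",
         endpoint(tx["senderRemoteIp"], tx["video"]),
         endpoint(nim_in["video"]["hostIp"], nim_in["video"]["hostPort"])),
        ("sender audio -> NIM audio input",
         endpoint(tx["senderRemoteIp"], tx["audio"]),
         endpoint(nim_in["audio"]["hostIp"], nim_in["audio"]["hostPort"])),
        ("NIM ancillary output -> receiver",
         endpoint(nim_out["hostIp"], nim_out["hostPort"]),
         endpoint(rx["hostIp"], rx["hostPort"])),
    ]
    return [f"{label}: {up} != {down}" for label, up, down in links if up != down]


# Addresses copied from the static starter configuration.
values = {
    "sender": {
        "st2110Ports": {"senderRemoteIp": "234.5.8.26", "video": 7005, "audio": 7006},
    },
    NIM: {
        "input": {
            "video": {"hostIp": "234.5.8.26", "hostPort": "7005"},
            "audio": {"hostIp": "234.5.8.26", "hostPort": "7006"},
        },
        "output": {"ancillaryData": {"hostIp": "234.5.8.27", "hostPort": "7006"}},
    },
    "receiver": {"ancillaryInputParams": {"hostIp": "234.5.8.27", "hostPort": 7006}},
}

problems = check_chain(values)
print("chain consistent" if not problems else "\n".join(problems))
```

The check covers only the ancillary receiver path. In video mode, the receiver's videoInputParams would need a corresponding comparison.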
Install#
SMPTE ST 2110 (static):
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
-f values-st2110.yaml
Note
NMOS is enabled by default. To deploy in fixed SMPTE ST 2110 mode (static), append the following:
--set sender.nmos.enabled=false \
--set receiver.nmos.enabled=false \
--set nvidia-active-speaker-detection-h4m-service.nmos.enabled=false
NMOS:
helm upgrade --install nvidia-asd-h4m-sample \
nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
-f values-nmos.yaml
View Receiver Output#
The receiver logs the parsed ancillary payload as JSON. To view it, run the following command:
kubectl logs deploy/<receiver-deployment>
On Red Hat OpenShift, replace kubectl with oc. A typical log entry looks like the following:
{
"detections": [
{
"audio_id": 1,
"bbox": {"h": 259.10, "w": 193.48, "x": 1364.90, "y": 381.73},
"det_idx": 0,
"is_speaking": true,
"score": 0.9589,
"track_id": 0
},
{
"audio_id": 0,
"bbox": {"h": 170.55, "w": 135.42, "x": 391.97, "y": 350.03},
"det_idx": 1,
"is_speaking": false,
"score": 0.9071,
"track_id": 1
}
],
"frame_idx": 560,
"rtp_timestamp": 75101117997,
"timestamp": "07:25:35.585747"
}
| Field | Description |
|---|---|
| detections | Array of detected faces in the frame. |
| audio_id | Index of the diarized audio stream associated with this face. |
| bbox | Bounding box in pixels, with keys x, y, w, and h. |
| det_idx | Detection index within the frame. |
| is_speaking | Whether the face is currently speaking (true or false). |
| score | Detection confidence score (0–1). |
| track_id | Persistent face track identifier across frames. |
| frame_idx | Zero-based frame counter. |
| rtp_timestamp | RTP timestamp of the corresponding input video frame. |
| timestamp | Wall-clock time of the frame. |
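To consume the ancillary output programmatically, each logged JSON object can be parsed and filtered on is_speaking. The following is a minimal sketch that uses a copy of the sample entry above. active_speakers is a hypothetical helper, and how individual JSON objects are delimited in the real log stream is not specified here.

```python
import json


def active_speakers(entry):
    """Return (track_id, bbox) for every detection flagged as speaking."""
    return [
        (det["track_id"], det["bbox"])
        for det in entry.get("detections", [])
        if det.get("is_speaking")
    ]


# Sample payload copied from the receiver log example.
log_entry = json.loads("""
{
  "detections": [
    {
      "audio_id": 1,
      "bbox": {"h": 259.10, "w": 193.48, "x": 1364.90, "y": 381.73},
      "det_idx": 0,
      "is_speaking": true,
      "score": 0.9589,
      "track_id": 0
    },
    {
      "audio_id": 0,
      "bbox": {"h": 170.55, "w": 135.42, "x": 391.97, "y": 350.03},
      "det_idx": 1,
      "is_speaking": false,
      "score": 0.9071,
      "track_id": 1
    }
  ],
  "frame_idx": 560,
  "rtp_timestamp": 75101117997,
  "timestamp": "07:25:35.585747"
}
""")

for track_id, bbox in active_speakers(log_entry):
    print(f"frame {log_entry['frame_idx']}: track {track_id} is speaking at {bbox}")
```

Keying on track_id rather than det_idx keeps the result stable across frames, because det_idx is only an index within a single frame.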
View Output via SRT#
Get the internal IP of the node:
kubectl get nodes -o wide
The SRT output is on port 30889 by default. To view the stream in VLC, refer to Open an SRT Stream in VLC.
Disabling Components#
Individual pipeline components can be disabled during helm install or helm upgrade without removing the release:
--set sender.enabled=false
--set receiver.enabled=false
--set nvidia-active-speaker-detection-h4m-service.enabled=false
Only the selected component stops; the rest of the pipeline continues to run.
Verify#
helm status nvidia-asd-h4m-sample
kubectl get pods -o wide
Verify configured Helm values:
helm get values nvidia-asd-h4m-sample
On Red Hat OpenShift, replace kubectl with oc. For log access, refer to Observability.
Uninstall#
helm uninstall nvidia-asd-h4m-sample
End-to-End Verification#
Create values-st2110.yaml or values-nmos.yaml from Starter Configuration Files and run the matching Install command.
Confirm all pods show READY 1/1 with kubectl get pods.
Open the SRT preview; refer to View Output via SRT. For VLC steps, NMOS wiring, and screenshots, refer to Verification.
Run helm uninstall nvidia-asd-h4m-sample.
For troubleshooting, refer to Advanced Usage.