End-to-End Demo Chart#

Overview#

The nvidia-active-speaker-detection-h4m-sample Helm chart deploys a complete Active Speaker Detection demo pipeline on Holoscan for Media. This includes:

  • Sender pipeline.

  • Active Speaker Detection NIM service (nvidia-active-speaker-detection-h4m-service).

  • Receiver pipeline (SMPTE ST 2110 or NMOS).

  • SRT output for preview.

The NIM analyzes live video and audio streams to identify the active speaker in a multi-person scene. Detection results are transmitted as bounding box metadata via SMPTE ST 2110-40 ancillary data.

Installation#

Prerequisites#

Complete all prerequisite steps (Rivermax license, image pull and model pull secrets, high-speed network attachment) before running helm install. For details, refer to Getting Started.

Pull the Chart#

If you have not already done so, add the Helm repository, then pull the chart:

helm pull nim-repo/nvidia-active-speaker-detection-h4m-sample --version 1.0.0

For the full repository setup, refer to Pull Helm Charts.

Helm Installation#

The chart includes a default values.yaml. For SMPTE ST 2110 static or NMOS pipelines, copy the configuration from Starter Configuration Files into values-st2110.yaml or values-nmos.yaml, adjust fields, and run the matching install command.

Important

The default values.yaml ships with example values for cluster-specific fields such as node selector (example-gpu-node), image pull secret, network name, and scheduler. You must update these values to match your cluster before deploying. The easiest approach is to use the global overrides so that each value is specified once for all components (sender, receiver, and NIM service). For details, refer to Global Overrides.

Minimal deployment using global overrides (NMOS mode, 1080p):

helm upgrade --install nvidia-asd-h4m-sample \
  nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
  --set global.nodeSelector.hostname=<gpu-node-name> \
  --set global.image.secret=<image-pull-secret> \
  --set global.network.name=<multus-net-attach-def> \
  --set global.schedulerName=<scheduler-name> \
  --set nvidia-active-speaker-detection-h4m-service.ngcModelDownload.secretName=<model-pull-secret>

Alternatively, set values per component when the sender, receiver, and NIM service run on different nodes or use different pull secrets:

helm upgrade --install nvidia-asd-h4m-sample \
  nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
  --set sender.nodeSelector.hostname=<sender-node> \
  --set receiver.nodeSelector.hostname=<receiver-node> \
  --set nvidia-active-speaker-detection-h4m-service.nodeSelector.hostname=<gpu-node> \
  --set sender.image.secret=<image-pull-secret> \
  --set receiver.image.secret=<image-pull-secret> \
  --set nvidia-active-speaker-detection-h4m-service.image.secret=<image-pull-secret> \
  --set nvidia-active-speaker-detection-h4m-service.ngcModelDownload.secretName=<model-pull-secret>

For all available Helm values, refer to the Configuration Reference.


Sender Sample Media#

Input Media Files#

The sender expects an MPEG Transport Stream (.ts) file containing video and audio streams for active speaker detection.

Stream Composition#

Each input .ts file must contain the following streams:

  • Video stream: Source video with one or more visible speakers.

  • Audio streams: One mono audio track per speaker (for example, two speakers require two audio tracks). Each audio stream should contain only a single speaker’s audio.

Codecs#

  • Video codec: H.264

  • Audio codec: Opus, 48 kHz, mono
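The stream and codec requirements above can be checked before deployment. The following is a minimal sketch, assuming the file has already been inspected with `ffprobe -v error -print_format json -show_streams <file>` and the JSON output parsed into a dictionary. The function name `check_ts_streams` is hypothetical and not part of the chart, and the "exactly one video stream" rule is an interpretation of the list above.

```python
def check_ts_streams(probe: dict, num_speakers: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    streams = probe.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    problems = []

    # Assumption: exactly one H.264 video stream is expected.
    if len(video) != 1:
        problems.append(f"expected 1 video stream, found {len(video)}")
    for s in video:
        if s.get("codec_name") != "h264":
            problems.append(f"video codec is {s.get('codec_name')}, expected h264")

    # One mono Opus track at 48 kHz per speaker.
    if len(audio) != num_speakers:
        problems.append(f"expected {num_speakers} audio streams, found {len(audio)}")
    for i, s in enumerate(audio):
        if s.get("codec_name") != "opus":
            problems.append(f"audio {i}: codec is {s.get('codec_name')}, expected opus")
        # ffprobe reports sample_rate as a string, so convert before comparing.
        if int(s.get("sample_rate", 0)) != 48000:
            problems.append(f"audio {i}: sample rate is {s.get('sample_rate')}, expected 48000")
        if s.get("channels") != 1:
            problems.append(f"audio {i}: {s.get('channels')} channels, expected mono")
    return problems
```

Call it with the parsed ffprobe output and the number of speakers in the scene, for example `check_ts_streams(probe, 2)` for a two-speaker file.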

Bundled Sample Files#

The sender container includes a sample .ts file:

/workspace/assets/sample_asset.ts

| File | Resolution | Frame Rate | Audio Tracks |
|---|---|---|---|
| sample_asset.ts | 1080p (1920×1080) | 30 fps | 2 |

Use the same resolution and frame rate for the sender, NIM service (nvidia-active-speaker-detection-h4m-service), and receiver.

Note

Each input audio stream must contain only a single speaker’s audio. During silence, audio samples must be zero. If background noise is present, enable audio thresholding via useAudioThresholdToDetectActiveAudioStream and audioThresholdDb in the pipeline configuration. For details, refer to Configuration Reference.
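To illustrate what a decibel threshold such as audioThresholdDb: -40.0 means, the sketch below computes the RMS level of a block of samples in dB relative to full scale and compares it to the threshold. This is an assumption made for illustration only: the documentation does not describe how the NIM service measures audio level, and the names `level_dbfs` and `is_active` are hypothetical.

```python
import math


def level_dbfs(samples: list[float]) -> float:
    """RMS level in dB relative to full scale, for samples normalized to [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    # Digital silence has no finite level in dB.
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def is_active(samples: list[float], threshold_db: float = -40.0) -> bool:
    """Treat a block as active audio only if its level exceeds the threshold."""
    return level_dbfs(samples) > threshold_db
```

Under this model, a block of zeros is never active, and low-level background noise (for example, a constant amplitude of 0.001, roughly -60 dB) falls below a -40 dB threshold.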


Receiver Type#

The receiver supports two operating modes, controlled by receiver.receiverType:

| Mode | Description |
|---|---|
| ancillary | Receives bounding box metadata via SMPTE ST 2110-40 ancillary data. Use this mode to consume detection results programmatically. |
| video | Receives the output video with bounding box overlays, which is re-streamed via SRT. Requires testFrameOverlayMode: true on the NIM service. |

Example of switching the receiver to video mode:

receiver:
  receiverType: "video"

nvidia-active-speaker-detection-h4m-service:
  pipeline:
    testFrameOverlayMode: true

Note

In ancillary mode, the ancillaryInputParams section is used. In video mode, the videoInputParams section is used instead.
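The mode rules above can be checked on a values file before installing. The following is a minimal sketch, assuming the values have already been loaded into a nested dictionary (for example, with a YAML parser). The function name `check_receiver_mode` is hypothetical, and treating a missing testFrameOverlayMode as false is an assumption based on the starter configurations.

```python
SERVICE_KEY = "nvidia-active-speaker-detection-h4m-service"

# Which receiver section each mode reads, per the note above.
PARAMS_SECTION = {
    "ancillary": "ancillaryInputParams",
    "video": "videoInputParams",
}


def check_receiver_mode(values: dict) -> list[str]:
    """Return a list of problems with the receiverType settings."""
    receiver = values.get("receiver", {})
    pipeline = values.get(SERVICE_KEY, {}).get("pipeline", {})
    mode = receiver.get("receiverType")
    problems = []

    if mode not in PARAMS_SECTION:
        problems.append(f"receiverType must be 'ancillary' or 'video', got {mode!r}")
        return problems

    # The section that the selected mode reads must be present.
    section = PARAMS_SECTION[mode]
    if section not in receiver:
        problems.append(f"receiver.{section} is missing for {mode} mode")

    # Video mode needs the overlay enabled on the NIM service.
    if mode == "video" and pipeline.get("testFrameOverlayMode") is not True:
        problems.append("video mode requires pipeline.testFrameOverlayMode: true")
    return problems
```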


Starter Configuration Files#

Copy the appropriate configuration into values-st2110.yaml or values-nmos.yaml, adjust fields for your environment, and pass it to helm upgrade --install with -f.

For a full reference of all Helm keys and pipeline tuning parameters, refer to the Configuration Reference.

SMPTE ST 2110 Configuration — Static (values-st2110.yaml)#

global:
  nodeSelector:
    hostname: ""
  image:
    secret: ""
  network:
    name: ""
  schedulerName: ""
  service:
    type: ""

sender:
  enabled: true
  appName: nvidia-active-speaker-detection-sender-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-sender
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  inputAssets:
    videoFile: /workspace/assets/sample_asset.ts
    numOfAudioTrack: 2
    videoFramerateNum: 30
    videoFramerateDen: 1
    videoWidth: 1920
    videoHeight: 1080
  st2110Ports:
    senderRemoteIp: 234.5.8.26
    video: 7005
    audio: 7006
  nmos:
    enabled: false
    hostname: asd-sender.local
    description: NVIDIA Active Speaker Detection Sender
    label: Nvidia-ASD-Sender
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32524

receiver:
  enabled: true
  appName: nvidia-active-speaker-detection-receiver-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-receiver
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  receiverType: "ancillary"
  nmos:
    enabled: false
    hostname: asd-receiver.local
    description: NVIDIA Active Speaker Detection Receiver
    label: Nvidia-ASD-Receiver
  videoInputParams:
    videoReceiverRemoteIp: "234.5.8.27"
    videoReceiverPort: 7005
    videoWidth: 1920
    videoHeight: 1080
    videoFramerateNum: 30
    videoFramerateDen: 1
  ancillaryInputParams:
    hostIp: "234.5.8.27"
    hostPort: 7006
    framerateNum: 30
    framerateDen: 1
  srtPort:
    internal: 8888
    external: 30889
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32525

nvidia-active-speaker-detection-h4m-service:
  enabled: true
  appName: nvidia-active-speaker-detection-nim-st2110
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-nim
    tag: "1.0.0"
    secret: ngc-api-key
  nimLogs:
    enabled: true
    mountPath: /workspace/nim-logs
  network:
    name: "media-a-tx-net"
  nmos:
    enabled: false
    hostname: active-speaker-detection-nim.local
    description: Nvidia Active Speaker Detection NIM
    label: Nvidia-Active-Speaker-Detection-NIM
  video:
    width: 1920
    height: 1080
    framerateNum: 30
    framerateDen: 1
  audio:
    pcmFormat: S24BE
    samplingRate: 48000
    numChannels: 1
  input:
    video:
      sessionName: video_in
      localInterfaceName: net1
      hostIp: "234.5.8.26"
      hostPort: "7005"
      hostNumSubnetBits: 24
      flushFrameCountThreshold: 3
    audio:
      sessionName: audio_in
      localInterfaceName: net1
      hostIp: 234.5.8.26
      hostPort: "7006"
      hostNumSubnetBits: 24
      numStreams: 2
  output:
    ancillaryData:
      sessionName: ancillary_out
      localInterfaceName: net1
      hostIp: 234.5.8.27
      hostPort: "7006"
      hostNumSubnetBits: 24
  logging:
    level: 3
  pipeline:
    testFrameOverlayMode: false
    outputFrameBufferSize: 30
    useAudioThresholdToDetectActiveAudioStream: false
    audioThresholdDb: -40.0
    syncTolerance: 0.5986
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32526

NMOS Configuration (values-nmos.yaml)#

global:
  nodeSelector:
    hostname: ""
  image:
    secret: ""
  network:
    name: ""
  schedulerName: ""
  service:
    type: ""

sender:
  enabled: true
  appName: nvidia-active-speaker-detection-sender-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-sender
    tag: "1.0.0"
    secret: ngc-api-key
  network:
    name: "media-a-tx-net"
  inputAssets:
    videoFile: /workspace/assets/sample_asset.ts
    numOfAudioTrack: 2
    videoFramerateNum: 30
    videoFramerateDen: 1
    videoWidth: 1920
    videoHeight: 1080
  nmos:
    enabled: true
    hostname: asd-sender.local
    description: Nvidia Active Speaker Detection Sender
    label: Nvidia-ASD-Sender
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32524

receiver:
  enabled: true
  appName: nvidia-active-speaker-detection-receiver-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-sample-receiver
    tag: "1.0.0"
    secret: ngc-api-key
  receiverType: "ancillary" # or "video"
  nmos:
    enabled: true
    hostname: asd-receiver.local
    description: Nvidia Active Speaker Detection Receiver
    label: Nvidia-ASD-Receiver
  ancillaryInputParams:
    hostIp: "234.5.8.27"
    hostPort: 7006
    framerateNum: 30
    framerateDen: 1
  videoInputParams:
    videoReceiverRemoteIp: "234.5.8.26"
    videoReceiverPort: 7005
    videoWidth: 1920
    videoHeight: 1080
    videoFramerateNum: 30
    videoFramerateDen: 1
  srtPort:
    internal: 8888
    external: 30889
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32525

nvidia-active-speaker-detection-h4m-service:
  enabled: true
  appName: nvidia-active-speaker-detection-feature-nmos
  replicas: 1
  nodeSelector:
    hostname: example-gpu-node
  image:
    repository: nvcr.io/nim/nvidia/active-speaker-detection-h4m-nim
    tag: "1.0.0"
    secret: ngc-api-key
  nimLogs:
    enabled: true
    mountPath: /workspace/nim-logs
  network:
    name: "media-a-tx-net"
  nmos:
    enabled: true
    hostname: active-speaker-detection-nim.local
    description: Nvidia Active Speaker Detection NIM
    label: Nvidia-Active-Speaker-Detection-NIM
  video:
    width: 1920
    height: 1080
    framerateNum: 30
    framerateDen: 1
  audio:
    pcmFormat: S24BE
    samplingRate: 48000
    numChannels: 1
  input:
    video:
      sessionName: video_in
      flushFrameCountThreshold: 3
    audio:
      sessionName: audio_in
      numStreams: 2
  output:
    video:
      sessionName: video_out
    ancillaryData:
      sessionName: ancillary_out
      localInterfaceName: net1
      hostIp: 234.5.8.27
      hostPort: "7006"
      hostNumSubnetBits: 24
  logging:
    level: 3
  pipeline:
    testFrameOverlayMode: false
    outputFrameBufferSize: 30
    useAudioThresholdToDetectActiveAudioStream: false
    audioThresholdDb: -40.0
    syncTolerance: 0.5986
  service:
    enabled: true
    type: NodePort
    port: 9010
    nodePort: 32526

On Holoscan for Media clusters, the NMOS Connection Manager web UI is typically reached from a browser on the cluster network. For remote graphical access, refer to Chrome Remote Desktop in Getting Started.

Note

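In NMOS mode, use the NMOS Connection Manager UI to connect the receiver to the NIM service before you connect the sender. In SMPTE ST 2110 mode (static), ensure that the IP addresses and ports are consistent along the sender → NIM service → receiver chain: the sender’s st2110Ports values must match the NIM service’s input hostIp and hostPort values, and the NIM service’s output.ancillaryData values must match the receiver’s ancillaryInputParams values.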
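For static SMPTE ST 2110 mode, the address chain can be checked programmatically. The following is a minimal sketch, assuming values-st2110.yaml has already been loaded into a nested dictionary and that the key pairing matches the starter configuration above (sender st2110Ports to NIM input, NIM ancillary output to receiver ancillaryInputParams). The function name `check_static_chain` is hypothetical. Values are compared as strings because the starter file mixes quoted and unquoted ports.

```python
SERVICE_KEY = "nvidia-active-speaker-detection-h4m-service"


def check_static_chain(values: dict) -> list[str]:
    """Return a list of mismatches along the sender -> NIM -> receiver chain."""
    tx = values["sender"]["st2110Ports"]
    nim_in = values[SERVICE_KEY]["input"]
    nim_out = values[SERVICE_KEY]["output"]["ancillaryData"]
    rx = values["receiver"]["ancillaryInputParams"]

    # (description, upstream value, downstream value)
    pairs = [
        ("sender IP vs NIM video input IP", tx["senderRemoteIp"], nim_in["video"]["hostIp"]),
        ("sender video port vs NIM video input port", tx["video"], nim_in["video"]["hostPort"]),
        ("sender IP vs NIM audio input IP", tx["senderRemoteIp"], nim_in["audio"]["hostIp"]),
        ("sender audio port vs NIM audio input port", tx["audio"], nim_in["audio"]["hostPort"]),
        ("NIM ancillary output IP vs receiver IP", nim_out["hostIp"], rx["hostIp"]),
        ("NIM ancillary output port vs receiver port", nim_out["hostPort"], rx["hostPort"]),
    ]
    # Compare as strings so that 7005 and "7005" count as equal.
    return [f"{name}: {a} != {b}" for name, a, b in pairs if str(a) != str(b)]
```

An empty list means the chain is consistent; each returned string names one mismatched pair.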

Install#

SMPTE ST 2110 (static):

helm upgrade --install nvidia-asd-h4m-sample \
  nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
  -f values-st2110.yaml

Note

NMOS is enabled by default. To deploy in fixed SMPTE ST 2110 mode (static), append the following:

--set sender.nmos.enabled=false \
--set receiver.nmos.enabled=false \
--set nvidia-active-speaker-detection-h4m-service.nmos.enabled=false

NMOS:

helm upgrade --install nvidia-asd-h4m-sample \
  nvidia-active-speaker-detection-h4m-sample-1.0.0.tgz \
  -f values-nmos.yaml

View Receiver Output#

The receiver logs the parsed ancillary payload as JSON. To view it, run the following command:

kubectl logs deploy/<receiver-deployment>

A typical log line looks like the following:

{
  "detections": [
    {
      "audio_id": 1,
      "bbox": {"h": 259.10, "w": 193.48, "x": 1364.90, "y": 381.73},
      "det_idx": 0,
      "is_speaking": true,
      "score": 0.9589,
      "track_id": 0
    },
    {
      "audio_id": 0,
      "bbox": {"h": 170.55, "w": 135.42, "x": 391.97, "y": 350.03},
      "det_idx": 1,
      "is_speaking": false,
      "score": 0.9071,
      "track_id": 1
    }
  ],
  "frame_idx": 560,
  "rtp_timestamp": 75101117997,
  "timestamp": "07:25:35.585747"
}

| Field | Description |
|---|---|
| detections | Array of detected faces in the frame. |
| audio_id | Index of the diarized audio stream associated with this face (-1 if unassigned). |
| bbox | Bounding box in pixels: x, y (top-left corner), w (width), h (height). |
| det_idx | Detection index within the frame. |
| is_speaking | true if this face is the active speaker. |
| score | Detection confidence score (0–1). |
| track_id | Persistent face track identifier across frames. |
| frame_idx | Zero-based frame counter. |
| rtp_timestamp | RTP timestamp of the corresponding input video frame. |
| timestamp | Wall-clock time of the frame (HH:MM:SS.ffffff). |
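The payload can be consumed with any JSON parser. The following is a minimal sketch that extracts the active speakers from one frame, assuming the JSON object has already been isolated from the log output as a single string. The function name `active_speakers` and the shape of its return value are hypothetical.

```python
import json


def active_speakers(payload: str) -> list[dict]:
    """Return track ID, audio ID, and corner coordinates for each speaking face."""
    frame = json.loads(payload)
    speakers = []
    for det in frame.get("detections", []):
        if not det.get("is_speaking"):
            continue
        box = det["bbox"]
        speakers.append({
            "track_id": det["track_id"],
            "audio_id": det["audio_id"],
            # Convert (x, y, w, h) to (left, top, right, bottom) in pixels.
            "corners": (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"]),
        })
    return speakers
```

For the example payload above, this returns one entry, for the face with track_id 0 and audio_id 1.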


View Output via SRT#

Get the internal IP of the node:

kubectl get nodes -o wide

The SRT output is exposed on port 30889 of the node by default (srtPort.external under receiver in the starter configuration). To view the stream in VLC, refer to Open an SRT Stream in VLC.


Disabling Components#

Individual pipeline components can be disabled during helm install or helm upgrade without removing the release:

--set sender.enabled=false
--set receiver.enabled=false
--set nvidia-active-speaker-detection-h4m-service.enabled=false

Only the selected component stops; the rest of the pipeline continues to run.


Verify#

helm status nvidia-asd-h4m-sample
kubectl get pods -o wide

Verify configured Helm values:

helm get values nvidia-asd-h4m-sample

On Red Hat OpenShift, replace kubectl with oc. For log access, refer to Observability.
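The pod readiness check can also be scripted. The following is a minimal sketch, assuming the output of `kubectl get pods -o json` has already been parsed into a dictionary. The function name `pods_not_ready` is hypothetical. It reports any pod that has no container statuses yet or has at least one container that is not ready.

```python
def pods_not_ready(pod_list: dict) -> list[str]:
    """Return the names of pods whose containers are not all ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # A pod with no reported container statuses is still starting.
        if not statuses or not all(c.get("ready") for c in statuses):
            not_ready.append(pod["metadata"]["name"])
    return not_ready
```

An empty list corresponds to every pod showing all of its containers as ready.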


Uninstall#

helm uninstall nvidia-asd-h4m-sample

End-to-End Verification#

  1. Create values-st2110.yaml or values-nmos.yaml from Starter Configuration Files and run the matching Install command.

  2. Confirm all pods show READY 1/1 with kubectl get pods.

  3. Open the SRT preview; refer to View Output via SRT. For VLC steps, NMOS wiring, and screenshots, refer to Verification.

  4. Run helm uninstall nvidia-asd-h4m-sample.

For troubleshooting, refer to Advanced Usage.