nemo_curator.stages.audio.inference.vad.whisperx_vad
WhisperX VAD for NeMo Curator.
Provides WhisperXVADModel (shared VAD logic for pyannote and standalone VAD) and WhisperXVADStage (ProcessingStage for VAD-only pipeline use).
Module Contents
Classes
API
WhisperXVADModel
Shared VAD model and get_vad_segments logic for PyAnnote and standalone VAD.
Used by PyAnnoteDiarizationStage for sub-segment VAD and by WhisperXVADStage for VAD-only processing.
Get voice activity detection segments for the given audio.
Parameters:
NumPy array of shape (C, N).
Maximum length for merging chunks in seconds.
Sample rate of the audio.
Returns: list[dict]
List of VAD segment dicts with “start” and “end” keys.
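To illustrate the return format, here is a minimal sketch that sums the speech time in such a list. The helper name total_speech_seconds is hypothetical and not part of NeMo Curator, and the sketch assumes that "start" and "end" are expressed in seconds.

```python
def total_speech_seconds(segments: list[dict]) -> float:
    """Sum the durations of VAD segments given as {"start": ..., "end": ...} dicts.

    Assumption: "start" and "end" are timestamps in seconds.
    """
    # Clamp at zero so a malformed segment (end < start) cannot reduce the total.
    return sum(max(0.0, seg["end"] - seg["start"]) for seg in segments)


# Hand-written example data in the documented shape (not real model output).
example_segments = [
    {"start": 0.5, "end": 2.0},
    {"start": 3.0, "end": 4.5},
]
speech = total_speech_seconds(example_segments)
```

Because the segments are plain dicts, downstream code can filter or aggregate them without importing anything from the VAD module.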
Move the model to the given device.
WhisperXVADStage
Bases: ProcessingStage[AudioTask, AudioTask]
Stage that performs Voice Activity Detection (VAD) using WhisperX’s VAD model.
Adds VAD segments to each entry under segments_key (e.g. “vad_segments”). Entries shorter than min_length are skipped (not emitted).
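The skip-and-annotate behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the stage's actual implementation: the function attach_vad_segments, the "duration" key on each entry, and the get_segments callback are all hypothetical names chosen for the example.

```python
from typing import Callable


def attach_vad_segments(
    entries: list[dict],
    get_segments: Callable[[dict], list[dict]],
    segments_key: str = "vad_segments",
    min_length: float = 0.5,
) -> list[dict]:
    """Attach VAD segments to each entry and drop entries that are too short.

    Assumption: each entry carries its length in seconds under "duration".
    """
    kept = []
    for entry in entries:
        if entry["duration"] < min_length:
            continue  # too short: skipped, so it is not emitted at all
        annotated = dict(entry)  # shallow copy; the input entry is left untouched
        annotated[segments_key] = get_segments(entry)
        kept.append(annotated)
    return kept


# Stand-in for a real VAD call: treats the whole entry as one speech segment.
def fake_vad(entry: dict) -> list[dict]:
    return [{"start": 0.0, "end": entry["duration"]}]


entries = [{"id": "a", "duration": 0.2}, {"id": "b", "duration": 3.0}]
result = attach_vad_segments(entries, fake_vad)
```

Here the 0.2-second entry is dropped and only entry "b" is returned, with its segments stored under "vad_segments".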
Derive device from resources configuration.
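One plausible way to derive a device from a resources configuration is to pick the GPU only when GPUs were requested. The sketch below is an assumption about the general idea, not the stage's code: ExampleResources is a hypothetical stand-in for the real resources object, and its gpus field and the "cuda"/"cpu" strings are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class ExampleResources:
    """Hypothetical stand-in for a stage's resources configuration."""

    gpus: float = 0.0


def derive_device(resources: ExampleResources) -> str:
    """Return "cuda" when any GPU share is requested, otherwise "cpu"."""
    return "cuda" if resources.gpus > 0 else "cpu"


cpu_device = derive_device(ExampleResources())
gpu_device = derive_device(ExampleResources(gpus=1.0))
```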
Set up the stage on the node.