nemo_curator.stages.audio.inference.speaker_diarization.pyannote
PyAnnote Diarization and Overlap Detection Stage.
Bases: ProcessingStage[AudioTask, AudioTask]
Stage that performs speaker diarization and overlap detection using PyAnnote.
Identifies different speakers and detects overlapping speech segments.
Parameters:
HuggingFace authentication token
Batch size for segmentation
Batch size for speaker embeddings
Minimum segment length in seconds
Maximum segment length in seconds
If set, this value is passed to Xenna as num_workers and acts as a cluster-wide cap. If unset, Xenna autoscaling is used.
Derive device from resources configuration.
Add VAD segments for a given audio region to the segments list.
Process a single entry for diarization and overlap detection.
Load models to device (called per replica before processing).
Download model weights (called once per node).
Check if a given turn overlaps with any segment in the overlaps list.
Parameters:
A segment representing a speech turn
List of overlap segments, sorted by start time
Returns: bool
True if the turn overlaps with any segment, False otherwise
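The overlap check described above can be sketched as follows. This is a minimal illustration, not the stage's actual implementation: the `Segment` type and the `has_overlap` function name are hypothetical stand-ins, and the sketch assumes that segments touching only at an endpoint do not count as overlapping. It relies on the documented precondition that the overlap list is sorted by start time.

```python
from bisect import bisect_left
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for a time interval, in seconds."""
    start: float
    end: float


def has_overlap(turn: Segment, overlaps: list[Segment]) -> bool:
    """Return True if `turn` intersects any segment in `overlaps`.

    `overlaps` must be sorted by start time.
    """
    # Only segments that start before the turn ends can intersect it,
    # so a binary search on the start times bounds the candidates.
    starts = [seg.start for seg in overlaps]
    hi = bisect_left(starts, turn.end)
    # Among those candidates, an intersection exists if one of them
    # ends after the turn begins.
    return any(seg.end > turn.start for seg in overlaps[:hi])


overlaps = [Segment(1.0, 2.0), Segment(5.0, 6.0)]
print(has_overlap(Segment(1.5, 3.0), overlaps))
print(has_overlap(Segment(2.0, 5.0), overlaps))
```

Under these assumptions, the first call reports an overlap because the turn begins inside the first segment, while the second does not because the turn only touches the boundaries of both segments.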