nemo_curator.stages.audio.segmentation.speaker_separation
Speaker separation stage using NeMo SortFormer diarization model.
Performs speaker diarization and separates audio by speaker, creating separate AudioTask outputs for each speaker.
Module Contents
Classes
Functions
API
Bases: ProcessingStage[AudioTask, AudioTask]
Speaker separation stage using NeMo SortFormer diarization model.
Separates audio by speaker and creates one AudioTask output per speaker from that speaker’s segments. Downloads the NeMo model from the Hugging Face Hub (nvidia/diar_sortformer_4spk-v1).
Parameters:
Hugging Face model ID or path to a NeMo diarization model
Whether to exclude overlapping speaker regions
Minimum segment duration in seconds
Gap threshold for merging speaker segments
Buffer time around speaker segments
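The three numeric parameters above can be read as a post-processing pass over raw diarization segments. The following is a minimal sketch of one plausible interpretation, not the stage's actual implementation: the Segment class, the postprocess function, its keyword names, and the order of the steps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical diarization segment: one speaker active from start to end (seconds)."""
    speaker: str
    start: float
    end: float


def postprocess(segments, *, min_duration, gap_threshold, buffer_time, total_duration=None):
    """Merge nearby same-speaker segments, drop short ones, then pad the survivors.

    All parameter names are illustrative; they are not taken from the NeMo Curator API.
    """
    # 1. Merge consecutive segments of the same speaker whose gap is small enough.
    merged = []
    for seg in sorted(segments, key=lambda s: (s.speaker, s.start)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.speaker == seg.speaker
            and seg.start - last.end <= gap_threshold
        ):
            merged[-1] = Segment(last.speaker, last.start, max(last.end, seg.end))
        else:
            merged.append(seg)

    # 2. Drop segments shorter than the minimum duration.
    kept = [s for s in merged if s.end - s.start >= min_duration]

    # 3. Add a buffer on both sides, clamped to the bounds of the recording.
    padded = []
    for s in kept:
        start = max(0.0, s.start - buffer_time)
        end = s.end + buffer_time
        if total_duration is not None:
            end = min(end, total_duration)
        padded.append(Segment(s.speaker, start, end))
    return padded


raw = [
    Segment("spk0", 0.0, 1.0),
    Segment("spk0", 1.25, 2.0),   # 0.25 s after the previous spk0 segment
    Segment("spk1", 2.5, 2.75),   # only 0.25 s long
    Segment("spk0", 5.0, 6.0),
]
result = postprocess(
    raw, min_duration=0.5, gap_threshold=0.5, buffer_time=0.25, total_duration=6.0
)
```

In this sketch merging runs before the duration filter, so two short fragments separated by a small gap can survive as one longer segment. Whether the real stage orders the steps this way is not stated in the reference above.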
Build AudioTask list from speaker audio data.
Separate audio by speaker.
Returns: list[AudioTask]
List of AudioTask objects, one per speaker.
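To illustrate the "one output per speaker" idea, here is a small sketch that groups diarization intervals by speaker and optionally removes regions where another speaker is talking. It works on plain (speaker, start, end) tuples rather than AudioTask objects; the function names and the exclude_overlaps keyword are hypothetical, and the real stage may handle overlaps differently.

```python
from collections import defaultdict


def subtract(interval, blockers):
    """Return the parts of `interval` that no blocker interval covers."""
    start, end = interval
    pieces = []
    cursor = start
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor or b_start >= end:
            continue  # blocker lies entirely outside the remaining range
        if b_start > cursor:
            pieces.append((cursor, b_start))
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def separate_by_speaker(segments, exclude_overlaps=True):
    """Group (speaker, start, end) tuples into a dict of per-speaker interval lists."""
    per_speaker = defaultdict(list)
    for speaker, start, end in segments:
        if exclude_overlaps:
            # Intervals of every other speaker act as blockers for this one.
            others = [(s, e) for spk, s, e in segments if spk != speaker]
            per_speaker[speaker].extend(subtract((start, end), others))
        else:
            per_speaker[speaker].append((start, end))
    return {spk: sorted(ivs) for spk, ivs in per_speaker.items()}


diarization = [("spk0", 0.0, 4.0), ("spk1", 3.0, 6.0), ("spk0", 7.0, 8.0)]
clean = separate_by_speaker(diarization, exclude_overlaps=True)
full = separate_by_speaker(diarization, exclude_overlaps=False)
```

Each dictionary entry would correspond to one per-speaker output. With overlap exclusion enabled, the shared region from 3.0 s to 4.0 s is assigned to neither speaker.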
Convert PyDub AudioSegment to (waveform, sample_rate). Output is canonical format only.
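A conversion of this kind might look like the sketch below. The reference does not define "canonical format", so the sketch assumes a mono waveform of floats in [-1.0, 1.0]; that assumption, the function name segment_to_waveform, and the FakeSegment stand-in are all illustrative. The function uses only attributes that a PyDub AudioSegment provides (get_array_of_samples(), channels, sample_width, frame_rate), and the stand-in lets the example run without PyDub installed.

```python
from array import array


def segment_to_waveform(segment):
    """Convert an AudioSegment-like object to (waveform, sample_rate).

    Assumed output format: mono list of floats scaled to [-1.0, 1.0].
    """
    samples = segment.get_array_of_samples()  # interleaved signed-integer PCM
    channels = segment.channels
    # Largest magnitude representable at this bit depth, e.g. 32768 for 16-bit audio.
    full_scale = float(1 << (8 * segment.sample_width - 1))
    # Average the channels of each frame to get mono, then normalize.
    waveform = [
        sum(samples[i:i + channels]) / (channels * full_scale)
        for i in range(0, len(samples), channels)
    ]
    return waveform, segment.frame_rate


class FakeSegment:
    """Stand-in exposing the same attributes as pydub.AudioSegment (illustration only)."""
    channels = 2
    sample_width = 2      # bytes per sample, i.e. 16-bit audio
    frame_rate = 16000

    def get_array_of_samples(self):
        # Two stereo frames: (16384, 16384) and (-32768, 0).
        return array("h", [16384, 16384, -32768, 0])


waveform, sample_rate = segment_to_waveform(FakeSegment())
```

With a real PyDub AudioSegment the same function should apply unchanged, since it relies only on the four attributes listed above.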