nemo_curator.stages.audio.segmentation.speaker_separation_module.speaker_sep
Module Contents
Classes
Functions
API
Bases: NamedTuple
Result for a single speaker from get_speaker_audio_data.
Class for separating speakers in an audio file using diarization.
Helper to get a parameter from config, handling different config structures.
Load the diarization model from Hugging Face Hub.
Handle overlaps by cutting segments at overlap points.
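One possible reading of "cutting segments at overlap points" is sketched below: each segment is trimmed so that only the parts where no other speaker is active remain. The function name `cut_at_overlaps` and the `(speaker, start, end)` tuple layout are assumptions made for illustration, not the module's actual interface.

```python
def cut_at_overlaps(segments):
    """Trim each (speaker, start, end) segment to the parts no other speaker overlaps.

    Hypothetical helper: the name and tuple layout are illustrative assumptions.
    """
    result = []
    for i, (spk, start, end) in enumerate(segments):
        pieces = [(start, end)]
        for j, (o_spk, o_start, o_end) in enumerate(segments):
            if j == i or o_spk == spk:
                continue
            next_pieces = []
            for p_start, p_end in pieces:
                # No intersection with the other speaker: keep the piece as-is.
                if o_end <= p_start or o_start >= p_end:
                    next_pieces.append((p_start, p_end))
                    continue
                # Keep whatever lies before and after the overlapped region.
                if p_start < o_start:
                    next_pieces.append((p_start, o_start))
                if o_end < p_end:
                    next_pieces.append((o_end, p_end))
            pieces = next_pieces
        result.extend((spk, s, e) for s, e in pieces)
    return result
```

With this approach a segment that is only partly overlapped still contributes its clean portions, at the cost of producing shorter pieces.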
Run speaker diarization on an audio file or waveform.
Completely exclude any segments where multiple speakers are talking simultaneously.
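A minimal sketch of this stricter strategy, assuming it means dropping a segment entirely whenever it intersects a segment from a different speaker. The name `exclude_overlapping` and the `(speaker, start, end)` tuple layout are hypothetical.

```python
def exclude_overlapping(segments):
    """Drop every (speaker, start, end) segment that intersects another speaker's segment.

    Hypothetical helper: the name and tuple layout are illustrative assumptions.
    """
    kept = []
    for i, (spk, start, end) in enumerate(segments):
        overlaps = any(
            o_spk != spk and start < o_end and o_start < end
            for j, (o_spk, o_start, o_end) in enumerate(segments)
            if j != i
        )
        if not overlaps:
            kept.append((spk, start, end))
    return kept
```

This trades recall for purity: any speech that coincides with another speaker is discarded along with the rest of its segment.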
Filter out segments that are shorter than the minimum duration.
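The filter can be illustrated with a short sketch. The function name `filter_short_segments`, the `(start, end)` tuple layout, and the `min_duration` parameter name are assumptions for illustration; durations are taken to be in seconds.

```python
def filter_short_segments(segments, min_duration):
    """Keep only (start, end) segments lasting at least min_duration seconds.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    return [(start, end) for start, end in segments if end - start >= min_duration]
```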
Process an audio file or waveform and return AudioSegment objects for each speaker.
Parse predicted segments and organize by speaker.
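As a rough illustration of organizing predictions by speaker, the sketch below assumes each predicted segment arrives as a whitespace-separated string of the form `"start end speaker"`. That input format and the name `group_by_speaker` are assumptions; the format the diarization model actually returns may differ.

```python
from collections import defaultdict


def group_by_speaker(lines):
    """Group "start end speaker" strings into {speaker: [(start, end), ...]}.

    Hypothetical helper: the input format is an assumption, not the model's confirmed output.
    """
    by_speaker = defaultdict(list)
    for line in lines:
        start, end, speaker = line.split()
        by_speaker[speaker].append((float(start), float(end)))
    # Sort each speaker's segments chronologically.
    for spans in by_speaker.values():
        spans.sort()
    return dict(by_speaker)
```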
Merge adjacent segments for the same speaker if they are close enough.
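A minimal sketch of the merge step for one speaker's segments, assuming "close enough" means the silence between two segments does not exceed a threshold in seconds. The name `merge_adjacent` and the `gap_threshold` parameter are hypothetical.

```python
def merge_adjacent(segments, gap_threshold):
    """Merge one speaker's (start, end) segments separated by at most gap_threshold seconds.

    Hypothetical helper: the name and parameter are illustrative assumptions.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= gap_threshold:
            # Close enough to the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first means the input does not need to be in chronological order, and taking the `max` of the end times keeps the result correct when one segment is contained in another.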
Process an audio file or waveform to get speaker segments.
Load audio file using soundfile.
Uses soundfile directly to avoid torchaudio/torchcodec/FFmpeg dependency issues.
Parameters:
Path to the audio file
Returns: tuple[torch.Tensor, int]
Tuple of (waveform tensor, sample_rate)