NeMo Speaker Diarization API#
Model Classes#
- class nemo.collections.asr.models.ClusteringDiarizer(*args: Any, **kwargs: Any)#
Bases: Module, Model, DiarizationMixin
Inference model class for offline speaker diarization. This class handles the functionality required for diarization: speech activity detection, segmentation, embedding extraction, clustering, resegmentation, and scoring. All parameters are passed through the config file.
- diarize(
- paths2audio_files: List[str] | None = None,
- batch_size: int = 0,
Diarize files provided through paths2audio_files or through the manifest file in the config.
- Parameters:
paths2audio_files (List[str]) – list of paths to audio files to diarize
batch_size (int) – batch size used for speaker embedding extraction and VAD computation
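A minimal usage sketch (the config filename, manifest path, and output directory below are assumptions; see the offline diarization inference configs shipped with NeMo for the real field names):

```python
from omegaconf import OmegaConf
from nemo.collections.asr.models import ClusteringDiarizer

# Load an inference config that specifies the input manifest, VAD model,
# speaker embedding model, and clustering parameters.
cfg = OmegaConf.load("diar_infer_telephonic.yaml")      # hypothetical local path
cfg.diarizer.manifest_filepath = "input_manifest.json"  # hypothetical manifest
cfg.diarizer.out_dir = "./diar_outputs"

diarizer = ClusteringDiarizer(cfg=cfg)
diarizer.diarize()  # writes RTTM files with speaker labels to the output directory
```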
- classmethod list_available_models()#
Should list all pre-trained models available via NVIDIA NGC cloud. Note: there is no check that requires model names and aliases to be unique. In the case of a collision, whichever model (or alias) is listed first in the returned list will be instantiated.
- Returns:
A list of PretrainedModelInfo entries
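A short usage sketch (the attribute names follow NeMo's PretrainedModelInfo; the `or []` guard against an empty listing is an assumption):

```python
from nemo.collections.asr.models import ClusteringDiarizer

# Print the name and NGC location of each published pretrained checkpoint.
for info in ClusteringDiarizer.list_available_models() or []:
    print(info.pretrained_model_name, info.location)
```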
- classmethod restore_from(
- restore_path: str,
- override_config_path: str | None = None,
- map_location: torch.device | None = None,
- strict: bool = False,
Restores model instance (weights and configuration) from a .nemo file
- Parameters:
restore_path – path to .nemo file from which model should be instantiated
override_config_path – path to a yaml config that will override the internal config file or an OmegaConf / DictConfig object representing the model config.
map_location – Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.
strict – Passed to torch.nn.Module.load_state_dict. Defaults to False, per the signature above.
return_config – If set to true, will return just the underlying config of the restored model as an OmegaConf DictConfig object without instantiating the model.
trainer – An optional Trainer object, passed to the model constructor.
save_restore_connector – An optional SaveRestoreConnector object that defines the implementation of the restore_from() method.
- save_to(save_path: str)#
- Saves model instance (weights and configuration) into an EFF archive or a .nemo file.
You can use the restore_from method to fully restore the instance from the .nemo file.
- A .nemo file is an archive (tar.gz) with the following contents:
model_config.yaml – model configuration in .yaml format; you can deserialize this into the cfg argument for the model’s constructor
model_weights.ckpt – model checkpoint
- Parameters:
save_path – Path to .nemo file where model instance should be saved
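A minimal save/restore round trip (file paths are assumptions; cfg is a diarizer config as in the earlier sketch):

```python
from nemo.collections.asr.models import ClusteringDiarizer

diarizer = ClusteringDiarizer(cfg=cfg)  # cfg: diarizer config, as sketched above
diarizer.save_to("my_diarizer.nemo")    # writes a tar.gz archive with config and weights

# Later, rebuild the same model instance from the archive.
restored = ClusteringDiarizer.restore_from(restore_path="my_diarizer.nemo")
```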
- class nemo.collections.asr.models.EncDecDiarLabelModel(*args: Any, **kwargs: Any)#
Bases: ModelPT, ExportableEncDecModel
Encoder-decoder class for the multiscale diarization decoder (MSDD). This model class creates training and validation methods for setting up data and performing the model forward pass.
- This model class expects a config dict with the following sections (see the sketch after this list):
preprocessor
msdd_model
speaker_model
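A schematic sketch of the three expected config sections; the section contents below are placeholders, since the real schema is defined by the MSDD training configs shipped with NeMo:

```python
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "preprocessor": {},   # acoustic front end (e.g., mel-spectrogram settings)
    "msdd_model": {},     # multiscale diarization decoder hyperparameters
    "speaker_model": {},  # speaker embedding model used to extract embeddings
})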
- _init_segmentation_info()#
Initialize segmentation settings: window, shift and multiscale weights.
- _init_speaker_model()#
Initialize the speaker embedding model with the model name or path passed through the config. Note that the speaker embedding model is loaded into self.msdd to enable multi-GPU and multi-node training. In addition, the speaker embedding model is also saved with the MSDD model when .ckpt files are saved.
- add_speaker_model_config(cfg)#
Add the config dictionary of the speaker model to the model’s config dictionary. This is required to save and load the speaker model together with the MSDD model.
- Parameters:
cfg (DictConfig) – DictConfig variable that contains the hyperparameters of the MSDD model.
- compute_accuracies()#
Calculate F1 score and accuracy of the predicted sigmoid values.
- Returns:
f1_score (float) – F1 score of the estimated diarized speaker label sequences.
simple_acc (float) – Accuracy of the predicted speaker labels: (total number of correct labels) / (total number of sigmoid values)
- forward_infer(
- input_signal,
- input_signal_length,
- emb_vectors,
- targets,
Wrapper function for the inference case.
- get_cluster_avg_embs_model(
- embs: torch.Tensor,
- clus_label_index: torch.Tensor,
- ms_seg_counts: torch.Tensor,
- scale_mapping,
Calculate the cluster-average speaker embedding based on the ground-truth speaker labels (i.e., cluster labels).
- Parameters:
embs (Tensor) – Merged embeddings without zero-padding in the batch. See ms_seg_counts for details. Shape: (Total number of segments in the batch, emb_dim)
clus_label_index (Tensor) – Merged ground-truth cluster labels from all scales with zero-padding. Each scale’s index can be retrieved by using segment index in ms_seg_counts. Shape: (batch_size, maximum total segment count among the samples in the batch)
ms_seg_counts (Tensor) –
Cumulative sum of the number of segments in each scale. This information is needed to reconstruct multi-scale input tensors during forward propagation.
- Example: batch_size=3, scale_n=6, emb_dim=192
ms_seg_counts =
[[ 8,  9, 12, 16, 25, 51],
 [11, 13, 14, 17, 25, 51],
 [ 9,  9, 11, 16, 23, 50]]
Counts of merged segments: (121, 131, 118), so embs has shape (370, 192) and clus_label_index has shape (3, 131).
Shape: (batch_size, scale_n)
- Returns:
Multi-scale cluster-average speaker embedding vectors. These embedding vectors are used as reference for each speaker to predict the speaker label for the given multi-scale embedding sequences. Shape: (batch_size, scale_n, emb_dim, self.num_spks_per_model)
- Return type:
ms_avg_embs (Tensor)
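An illustrative sketch (not the library’s implementation) of the cluster-average computation for a single scale of a single sample, assuming embs holds that scale’s segment embeddings and labels holds the per-segment cluster (speaker) labels:

```python
import torch

emb_dim, num_spks = 192, 2
embs = torch.randn(121, emb_dim)             # one sample's segments at one scale
labels = torch.randint(0, num_spks, (121,))  # ground-truth cluster labels

# Average all segment embeddings belonging to each speaker.
avg_embs = torch.stack(
    [embs[labels == spk].mean(dim=0) for spk in range(num_spks)],
    dim=1,
)  # shape: (emb_dim, num_spks)
```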
- get_ms_emb_seq(
- embs: torch.Tensor,
- scale_mapping: torch.Tensor,
- ms_seg_counts: torch.Tensor,
Reshape the given tensor and organize the embedding sequence based on the original sequence counts. Repeat the embeddings according to the scale_mapping information so that the final embedding sequence has the same length for all scales.
- Parameters:
embs (Tensor) – Merged embeddings without zero-padding in the batch. See ms_seg_counts for details. Shape: (Total number of segments in the batch, emb_dim)
scale_mapping (Tensor) –
The element at the m-th row and the n-th column of the scale mapping matrix indicates the (m+1)-th scale segment index which has the closest center distance to the (n+1)-th segment in the base scale.
Example: scale_mapping_argmat[2][101] = 85
In the above example, the 86th segment in the 3rd scale (Python index 2) is mapped to the 102nd segment in the base scale. Longer segments are thus bound to repeat more often, since multiple base-scale segments (the base scale has the shortest segment length) fall within the range of a longer segment. At the same time, each row contains N indices, where N is the number of segments in the base scale (i.e., the finest scale). Shape: (batch_size, scale_n, self.diar_window_length)
ms_seg_counts (Tensor) –
Cumulative sum of the number of segments in each scale. This information is needed to reconstruct the multi-scale input matrix during forward propagation.
- Example: batch_size=3, scale_n=6, emb_dim=192
ms_seg_counts =
[[ 8,  9, 12, 16, 25, 51],
 [11, 13, 14, 17, 25, 51],
 [ 9,  9, 11, 16, 23, 50]]
In this function, ms_seg_counts is used to get the actual length of each embedding sequence without zero-padding.
- Returns:
Multi-scale embedding sequence that is mapped, matched and repeated. The longer scales are less repeated, while shorter scales are more frequently repeated following the scale mapping tensor.
- Return type:
ms_emb_seq (Tensor)
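An illustrative per-sample sketch (the names and single-sample layout are assumptions, not the library code) of how scale_mapping gathers and repeats coarser-scale embeddings so that every scale matches the base-scale length:

```python
import torch

emb_dim, scale_n, base_len = 192, 6, 51
# One sample's embeddings per scale, using the segment counts from the example above.
scale_embs = [torch.randn(n, emb_dim) for n in (8, 9, 12, 16, 25, 51)]
# Row s maps each base-scale segment to its closest scale-s segment index.
scale_mapping = torch.stack(
    [torch.randint(0, scale_embs[s].shape[0], (base_len,)) for s in range(scale_n)]
)  # shape: (scale_n, base_len)

ms_emb_seq = torch.stack(
    [scale_embs[s][scale_mapping[s]] for s in range(scale_n)]  # gather = repeat
)  # shape: (scale_n, base_len, emb_dim)
```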
- get_ms_mel_feat(
- processed_signal: torch.Tensor,
- processed_signal_len: torch.Tensor,
- ms_seg_timestamps: torch.Tensor,
- ms_seg_counts: torch.Tensor,
Load acoustic features from the audio segments of each scale and save them into a torch.Tensor matrix. In addition, create variables containing the multiscale subsegmentation information.
Note: self.emb_batch_size determines the number of embedding tensors attached to the computational graph. If self.emb_batch_size is greater than 0, the speaker embedding model is trained simultaneously. Due to the constraint of GPU memory size, only a subset of embedding tensors can be attached to the computational graph. By default, the graph-attached embeddings are selected randomly by torch.randperm. The default value of self.emb_batch_size is 0.
- Parameters:
processed_signal (Tensor) – Zero-padded feature input. Shape: (batch_size, feat_dim, the longest feature sequence length)
processed_signal_len (Tensor) – The actual length of the feature input without zero-padding. Shape: (batch_size,)
ms_seg_timestamps (Tensor) – Timestamps of the base-scale segments. Shape: (batch_size, scale_n, number of base-scale segments, self.num_spks_per_model)
ms_seg_counts (Tensor) – Cumulative sum of the number of segments in each scale. This information is needed to reconstruct the multi-scale input matrix during forward propagation. Shape: (batch_size, scale_n)
- Returns:
- ms_mel_feat (Tensor):
Feature input stream split into segments of equal length. Shape: (total number of segments, feat_dim, self.frame_per_sec * the-longest-scale-length)
- ms_mel_feat_len (Tensor):
The actual length of each feature segment without zero-padding. Shape: (total number of segments,)
- seq_len (Tensor):
The length of the input embedding sequences. Shape: (total number of segments,)
- detach_ids (tuple):
Tuple containing both detached embedding indices and attached embedding indices.
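A sketch of the random attach/detach selection described in the note above (variable names are assumptions; this is not the library’s code):

```python
import torch

total_segments, emb_batch_size = 370, 64
perm = torch.randperm(total_segments)
attach_ids = perm[:emb_batch_size]  # embeddings kept on the computational graph
detach_ids = perm[emb_batch_size:]  # embeddings computed without gradient tracking
```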
- setup_test_data(
- test_data_config: omegaconf.DictConfig | Dict | None,
(Optionally) Sets up the data loader to be used in testing.
- Parameters:
test_data_config – test data layer parameters.
- setup_training_data(
- train_data_config: omegaconf.DictConfig | Dict | None,
Sets up the data loader to be used in training.
- Parameters:
train_data_config – training data layer parameters.
- setup_validation_data(
- val_data_layer_config: omegaconf.DictConfig | Dict | None,
Sets up the data loader to be used in validation.
- Parameters:
val_data_layer_config – validation data layer parameters.
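A hedged wiring sketch for these setup methods (the pretrained model name is published on NGC, but the config fields shown are assumptions; the real schema is defined by the model’s dataset class):

```python
from omegaconf import OmegaConf
from nemo.collections.asr.models import EncDecDiarLabelModel

model = EncDecDiarLabelModel.from_pretrained("diar_msdd_telephonic")
model.setup_test_data(OmegaConf.create({
    "manifest_filepath": "test_manifest.json",  # hypothetical manifest
    "batch_size": 4,
}))
```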
Mixins#
- class nemo.collections.asr.parts.mixins.mixins.DiarizationMixin#
Bases: VerificationMixin
- abstract diarize(
- paths2audio_files: List[str],
- batch_size: int = 1,
Takes paths to audio files and returns speaker labels.
- Parameters:
paths2audio_files – paths to the audio fragments to be diarized
- Returns:
Speaker labels
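A hypothetical subclass sketch showing how the abstract method might be satisfied (the class and its placeholder logic are illustrations only):

```python
from typing import List
from nemo.collections.asr.parts.mixins.mixins import DiarizationMixin

class MyDiarizer(DiarizationMixin):
    def diarize(self, paths2audio_files: List[str], batch_size: int = 1):
        # Placeholder: return one speaker-label list per input file.
        return [["speaker_0"] for _ in paths2audio_files]
```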