NeMo Speaker Diarization API

Model Classes

Mixins

class nemo.collections.asr.parts.mixins.DiarizationMixin

Bases: VerificationMixin

abstract diarize(
paths2audio_files: List[str],
batch_size: int = 1,
) -> List[str]

Takes paths to audio files and returns speaker labels.

Parameters:

paths2audio_files – Paths to the audio fragments to be diarized.

Returns:

Speaker labels

class nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin

Bases: ABC

An abstract class for diarize-able models.

Creates a template function diarize() that provides an interface to perform diarization of audio tensors or filepaths.

The following abstract methods must be implemented by the subclass:

  • _setup_diarize_dataloader():

    Sets up the dataloader for diarization. Receives the output from _diarize_input_manifest_processing().

  • _diarize_forward():

    Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing().

  • _diarize_output_processing():

    Implements the post processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects.
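The division of labour described above is a template-method pattern: diarize() fixes the call order and the subclass supplies the three steps. The following is a minimal sketch of that pattern in plain Python. It does not import NeMo and is not NeMo's implementation; ToyDiarizationMixin, SingleSpeakerModel, the config keys, and the simplified method signatures are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class ToyDiarizationMixin(ABC):
    """Hypothetical stand-in for the mixin; NOT NeMo's implementation."""

    def diarize(self, audio: List[str], batch_size: int = 1) -> List[Any]:
        # Template method: the call order is fixed here,
        # while the individual steps come from the subclass.
        config = {"audio_files": audio, "batch_size": batch_size}  # assumed keys
        results: List[Any] = []
        for batch in self._setup_diarize_dataloader(config):
            outputs = self._diarize_forward(batch)
            results.extend(self._diarize_output_processing(outputs))
        return results

    @abstractmethod
    def _setup_diarize_dataloader(self, config: Dict[str, Any]) -> Iterable[Any]:
        ...

    @abstractmethod
    def _diarize_forward(self, batch: Any) -> Any:
        ...

    @abstractmethod
    def _diarize_output_processing(self, outputs: Any) -> List[Any]:
        ...


class SingleSpeakerModel(ToyDiarizationMixin):
    """Toy subclass that assigns every file to a single speaker."""

    def _setup_diarize_dataloader(self, config):
        files, size = config["audio_files"], config["batch_size"]
        # Plain lists of paths stand in for a torch DataLoader.
        return [files[i:i + size] for i in range(0, len(files), size)]

    def _diarize_forward(self, batch):
        return [(path, "speaker_0") for path in batch]

    def _diarize_output_processing(self, outputs):
        return [f"{path}: {label}" for path, label in outputs]


labels = SingleSpeakerModel().diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)
```

Because the three hooks are declared abstract, instantiating the base class directly raises TypeError, which is how a missing implementation would surface in this sketch.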

abstract _diarize_forward(batch: Any)

Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass.

Parameters:

batch – A batch of input data from the data loader that is used to perform the model’s forward pass.

Returns:

The model’s outputs that are processed by _diarize_output_processing().

_diarize_input_manifest_processing(
audio_files: List[str],
temp_dir: str,
diarcfg: DiarizeConfig,
) -> Dict[str, Any]

Internal function to process the input audio filepaths and return a config dict for the dataloader.

Parameters:
  • audio_files – A list of string filepaths for audio files.

  • temp_dir – A temporary directory to store intermediate files.

  • diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

A config dict that is used to set up the dataloader for diarization.
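As an illustration of this step, the sketch below writes one manifest line per audio file into the temporary directory and returns a config dict that points at it. This is a guess at the general shape only: the helper name build_dataloader_config, the manifest file name, and every dict key are assumptions, not NeMo's actual values.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def build_dataloader_config(audio_files: List[str], temp_dir: str, batch_size: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: write a JSON-lines manifest and describe it in a config dict."""
    manifest_path = os.path.join(temp_dir, "manifest.json")  # assumed file name
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for path in audio_files:
            # One JSON object per line; the key name is an assumption.
            manifest.write(json.dumps({"audio_filepath": path}) + "\n")
    # All keys below are illustrative, not NeMo's actual config schema.
    return {"manifest_filepath": manifest_path, "batch_size": batch_size, "temp_dir": temp_dir}


with tempfile.TemporaryDirectory() as tmp:
    config = build_dataloader_config(["a.wav", "b.wav"], tmp, batch_size=2)
    with open(config["manifest_filepath"], encoding="utf-8") as manifest:
        manifest_lines = manifest.read().splitlines()
```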

_diarize_input_processing(
audio,
diarcfg: DiarizeConfig,
)

Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to set up the input data for diarization.

Parameters:
  • audio – Of type GenericDiarizationType

  • diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

A DataLoader object that is used to iterate over the input audio data.

_diarize_on_begin(
audio: str | List[str],
diarcfg: DiarizeConfig,
)

Internal function to set up the model for diarization. Perform all setup and pre-checks here.

Parameters:
  • audio (Union[str, List[str]]) – Of type GenericDiarizationType

  • diarcfg (DiarizeConfig) – An instance of DiarizeConfig.

_diarize_on_end(
diarcfg: DiarizeConfig,
)

Internal function to tear down the model after diarization. Perform all teardown and post-checks here.

Parameters:

diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

abstract _diarize_output_processing(
outputs,
uniq_ids,
diarcfg: DiarizeConfig,
) -> List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]

Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs.

Parameters:
  • outputs – The model’s outputs, as returned by _diarize_forward().

  • uniq_ids – List of unique recording identifiers in the batch.

  • diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

The output can be a list of objects, a list of lists of objects, a tuple of objects, or a tuple of lists of objects. Its type is defined in GenericDiarizationType.

_input_audio_to_rttm_processing(
audio_files: List[str],
) -> List[Dict[str, str | float]]

Generates a list of manifest-style dicts when audio is given as a list of paths to audio files.

Parameters:

audio_files – A list of paths to audio files.

Returns:

audio_rttm_map_dict – A list of manifest-style dicts.
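To make "manifest-style dict" concrete, here is a small sketch that builds one dict per audio path with the documented return type. The helper name and the keys (uniq_id, audio_filepath, offset) are assumptions chosen for illustration; the entries NeMo actually produces may use different keys.

```python
import os
from typing import Dict, List, Union


def audio_files_to_manifest_entries(audio_files: List[str]) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical helper: one manifest-style dict per audio path."""
    entries: List[Dict[str, Union[str, float]]] = []
    for path in audio_files:
        # Use the file name without its extension as the recording identifier
        # (an assumed convention).
        uniq_id = os.path.splitext(os.path.basename(path))[0]
        entries.append({"uniq_id": uniq_id, "audio_filepath": path, "offset": 0.0})
    return entries


entries = audio_files_to_manifest_entries(["/data/meeting_01.wav", "/data/call_02.wav"])
```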

abstract _setup_diarize_dataloader(
config: Dict,
) -> torch.utils.data.DataLoader

Internal function to set up the dataloader for diarization. This function is called by diarize() and diarize_generator() to set up the input data for diarization.

Parameters:

config – A config dict that is used to set up the dataloader for diarization. It can be generated by _diarize_input_manifest_processing().

Returns:

A DataLoader object that is used to iterate over the input audio data.

diarize(
audio: str | List[str] | numpy.ndarray | torch.utils.data.DataLoader,
batch_size: int = 1,
include_tensor_outputs: bool = False,
postprocessing_yaml: str | None = None,
num_workers: int = 1,
verbose: bool = False,
override_config: DiarizeConfig | None = None,
**config_kwargs,
) -> List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]

Takes audio input (for example, paths to audio files) and returns speaker labels.

diarize_generator(
audio,
override_config: DiarizeConfig | None,
)

A generator version of the diarize() function.
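The difference between the two entry points can be sketched as follows: a generator yields results batch by batch, and an eager function collects them into a single list. This is one plausible relationship, shown with toy functions only; toy_diarize_generator and toy_diarize are hypothetical names and do not reflect how NeMo wires diarize() and diarize_generator() together.

```python
from typing import Iterator, List


def toy_diarize_generator(batches: List[List[str]]) -> Iterator[List[str]]:
    """Hypothetical generator: yield the labels for one batch at a time."""
    for batch in batches:
        # Placeholder "model": every file is assigned to a single speaker.
        yield [f"{path}: speaker_0" for path in batch]


def toy_diarize(batches: List[List[str]]) -> List[str]:
    """Hypothetical eager wrapper: drain the generator into one flat list."""
    results: List[str] = []
    for processed in toy_diarize_generator(batches):
        results.extend(processed)
    return results


all_labels = toy_diarize([["a.wav", "b.wav"], ["c.wav"]])
first_batch = next(toy_diarize_generator([["a.wav", "b.wav"], ["c.wav"]]))
```

With the generator form, a caller can start consuming results after the first batch instead of waiting for the whole input to finish.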