NeMo Speaker Diarization API
Model Classes
Mixins
- class nemo.collections.asr.parts.mixins.DiarizationMixin
Bases: VerificationMixin
- abstract diarize(paths2audio_files: List[str], batch_size: int = 1)
Takes paths to audio files and returns speaker labels.
- Parameters:
paths2audio_files – Paths to the audio fragments to be diarized.
- Returns:
Speaker labels
- class nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin
Bases: ABC
An abstract class for diarize-able models.
Creates a template function diarize() that provides an interface to perform diarization of audio tensors or filepaths.
The following abstract methods must be implemented by the subclass:
- _setup_diarize_dataloader():
Sets up the dataloader for diarization. Receives the output from _diarize_input_manifest_processing().
- _diarize_forward():
Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing().
- _diarize_output_processing():
Implements the post processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects.
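The template pattern described above can be sketched with a small, self-contained stand-in. This is a minimal illustration, not NeMo's implementation: `ToyDiarizationMixin` and `SingleSpeakerModel` are hypothetical names, the hook signatures are simplified (the documented `_diarize_output_processing()` also receives `uniq_ids` and `diarcfg`), and the "dataloader" is a plain Python generator rather than a real DataLoader.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class ToyDiarizationMixin(ABC):
    """Hypothetical stand-in for the template pattern; not NeMo's class."""

    def diarize(self, audio: List[str], batch_size: int = 1) -> List[Any]:
        # Template method: the control flow is fixed, the hooks are subclass-specific.
        config = {"paths": audio, "batch_size": batch_size}
        results: List[Any] = []
        for batch in self._setup_diarize_dataloader(config):
            outputs = self._diarize_forward(batch)
            results.extend(self._diarize_output_processing(outputs))
        return results

    @abstractmethod
    def _setup_diarize_dataloader(self, config: Dict) -> Iterable[List[str]]: ...

    @abstractmethod
    def _diarize_forward(self, batch: Any) -> Any: ...

    @abstractmethod
    def _diarize_output_processing(self, outputs: Any) -> List[Any]: ...


class SingleSpeakerModel(ToyDiarizationMixin):
    """Toy subclass that labels every file as a single speaker."""

    def _setup_diarize_dataloader(self, config):
        paths, size = config["paths"], config["batch_size"]
        # Yield consecutive chunks of at most `size` paths.
        for start in range(0, len(paths), size):
            yield paths[start:start + size]

    def _diarize_forward(self, batch):
        # A real model would run inference here; this returns dummy scores.
        return [(path, 1.0) for path in batch]

    def _diarize_output_processing(self, outputs):
        return [f"{path}: speaker_0" for path, _score in outputs]


labels = SingleSpeakerModel().diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)
```

Because the three hooks are declared abstract, instantiating the base class directly raises a `TypeError`; only a subclass that implements all of them can call `diarize()`.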
- abstract _diarize_forward(batch: Any)
Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass.
- Parameters:
batch – A batch of input data from the data loader that is used to perform the model’s forward pass.
- Returns:
The model’s outputs that are processed by _diarize_output_processing().
- _diarize_input_manifest_processing(audio_files: List[str], temp_dir: str, diarcfg: DiarizeConfig)
Internal function to process the input audio filepaths and return a config dict for the dataloader.
- Parameters:
audio_files – A list of string filepaths for audio files.
temp_dir – A temporary directory to store intermediate files.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
A config dict that is used to setup the dataloader for diarization.
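As a rough illustration of this step, the sketch below writes a JSON-lines manifest into the temporary directory and returns a config dict for the dataloader. Everything specific here is an assumption made for the example: `ToyDiarizeConfig` is a hypothetical stand-in for DiarizeConfig, and the manifest file name, the `audio_filepath` field, and the returned keys are illustrative rather than taken from NeMo.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDiarizeConfig:
    """Hypothetical stand-in for DiarizeConfig; only the fields used below."""
    batch_size: int = 1
    num_workers: int = 1


def build_dataloader_config(
    audio_files: List[str], temp_dir: str, diarcfg: ToyDiarizeConfig
) -> Dict:
    # Write one JSON object per line (JSON-lines manifest); field names are assumed.
    manifest_path = os.path.join(temp_dir, "manifest.json")
    with open(manifest_path, "w", encoding="utf-8") as fp:
        for path in audio_files:
            fp.write(json.dumps({"audio_filepath": path}) + "\n")
    # Return everything the dataloader setup needs in a single dict.
    return {
        "manifest_filepath": manifest_path,
        "batch_size": diarcfg.batch_size,
        "num_workers": diarcfg.num_workers,
        "temp_dir": temp_dir,
    }


with tempfile.TemporaryDirectory() as tmp:
    cfg = build_dataloader_config(["a.wav", "b.wav"], tmp, ToyDiarizeConfig(batch_size=2))
    with open(cfg["manifest_filepath"], encoding="utf-8") as fp:
        entries = [json.loads(line) for line in fp]
```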
- _diarize_input_processing(audio, diarcfg: DiarizeConfig)
Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
- Parameters:
audio – Of type GenericDiarizationType
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
A DataLoader object that is used to iterate over the input audio data.
- _diarize_on_begin(audio: str | List[str], diarcfg: DiarizeConfig)
Internal function to set up the model for diarization. Perform all setup and pre-checks here.
- Parameters:
audio (Union[str, List[str]]) – Of type GenericDiarizationType
diarcfg (DiarizeConfig) – An instance of DiarizeConfig.
- _diarize_on_end(diarcfg: DiarizeConfig)
Internal function to tear down the model after diarization. Perform all teardown and post-checks here.
- Parameters:
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- abstract _diarize_output_processing(outputs, uniq_ids, diarcfg: DiarizeConfig)
Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs.
- Parameters:
outputs – The model’s outputs, as returned by _diarize_forward().
uniq_ids – List of unique recording identifiers in the batch.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
The output can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects. Its type is defined in GenericDiarizationType.
- _input_audio_to_rttm_processing(audio_files: List[str])
Generate manifest style dict if audio is a list of paths to audio files.
- Parameters:
audio_files – A list of paths to audio files.
- Returns:
audio_rttm_map_dict – A list of manifest-style dicts.
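A minimal sketch of what converting a list of audio paths into manifest-style dicts might look like. The function name `audio_paths_to_manifest_dicts`, the field names, and the convention of deriving the recording ID from the file name are assumptions for illustration; the source does not specify which keys the real method emits.

```python
import os
from typing import Dict, List, Union


def audio_paths_to_manifest_dicts(
    audio_files: List[str],
) -> List[Dict[str, Union[str, float, None]]]:
    entries = []
    for path in audio_files:
        # Use the file name without its extension as the recording ID (assumed convention).
        uniq_id = os.path.splitext(os.path.basename(path))[0]
        entries.append(
            {
                "uniq_id": uniq_id,
                "audio_filepath": path,
                "offset": 0.0,     # start at the beginning of the file
                "duration": None,  # unknown until the audio is read
            }
        )
    return entries


manifest = audio_paths_to_manifest_dicts(["/data/meeting_01.wav", "call.flac"])
```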
- abstract _setup_diarize_dataloader(config: Dict)
Internal function to set up the dataloader for diarization. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
- Parameters:
config – A config dict that is used to setup the dataloader for diarization. It can be generated by _diarize_input_manifest_processing().
- Returns:
A DataLoader object that is used to iterate over the input audio data.
- diarize(audio: str | List[str] | numpy.ndarray | torch.utils.data.DataLoader, batch_size: int = 1, include_tensor_outputs: bool = False, postprocessing_yaml: str | None = None, num_workers: int = 1, verbose: bool = False, override_config: DiarizeConfig | None = None, **config_kwargs)
Takes audio input (file paths, a NumPy array, or a DataLoader, per the signature) and returns speaker labels.
- diarize_generator(audio, override_config: DiarizeConfig | None)
A generator version of the diarize() function.
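The relationship between an eager function and its generator version can be sketched as follows. This is a toy illustration under stated assumptions, not NeMo's code: `toy_diarize` and `toy_diarize_generator` are hypothetical names, the labels are placeholders, and yielding one result list per batch is an assumed granularity.

```python
from typing import Iterator, List


def toy_diarize_generator(audio: List[str], batch_size: int) -> Iterator[List[str]]:
    # Yield one batch of results at a time instead of building the full list.
    for start in range(0, len(audio), batch_size):
        batch = audio[start:start + batch_size]
        yield [f"{path}: speaker_0" for path in batch]  # placeholder labels


def toy_diarize(audio: List[str], batch_size: int = 1) -> List[str]:
    # The eager version can simply drain the generator.
    results: List[str] = []
    for batch_result in toy_diarize_generator(audio, batch_size):
        results.extend(batch_result)
    return results


gen = toy_diarize_generator(["a.wav", "b.wav", "c.wav"], batch_size=2)
first_batch = next(gen)  # only the first batch has been processed so far
all_labels = toy_diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)
```

A generator lets the caller consume results as each batch finishes, which keeps memory bounded on long file lists; the eager form is more convenient when the full result is needed at once.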