Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
NeMo Speaker Diarization API
Model Classes
Mixins
- class nemo.collections.asr.parts.mixins.DiarizationMixin
Bases: VerificationMixin
- abstract diarize(paths2audio_files: List[str], batch_size: int = 1)
Takes paths to audio files and returns speaker labels.
- Parameters:
paths2audio_files – paths to the audio files to be diarized.
- Returns:
Speaker labels
- class nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin
Bases: ABC
An abstract class for diarize-able models.
Provides a template method, diarize(), that offers a common interface for diarizing audio tensors or filepaths.
The following abstract methods must be implemented by the subclass:
- _setup_diarize_dataloader():
Set up the dataloader for diarization. Receives the output from _diarize_input_manifest_processing().
- _diarize_forward():
Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing().
- _diarize_output_processing():
Implements the post-processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects.
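The division of labor above follows the template-method pattern. The sketch below is a hypothetical, self-contained stand-in that does not import NeMo: `DiarizeTemplateSketch` mimics how a `diarize()` template might call the hooks in order, and `ToyDiarizer` implements the three abstract hooks with plain Python lists in place of tensors and a DataLoader. The hook order and the simplified signatures are assumptions for illustration, not NeMo's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class DiarizeTemplateSketch(ABC):
    """Hypothetical stand-in for SpkDiarizationMixin (not the NeMo class)."""

    def diarize(self, audio: List[str], batch_size: int = 1) -> List[Any]:
        self._diarize_on_begin(audio)
        try:
            # Assumed flow: build a dataloader config, then the dataloader itself.
            config = {"audio_files": list(audio), "batch_size": batch_size}
            dataloader = self._setup_diarize_dataloader(config)
            results: List[Any] = []
            for batch in dataloader:
                outputs = self._diarize_forward(batch)
                # In this toy, the batch's file paths double as unique recording ids.
                results.extend(self._diarize_output_processing(outputs, batch))
            return results
        finally:
            # Teardown runs even if the forward pass raises.
            self._diarize_on_end()

    def _diarize_on_begin(self, audio: List[str]) -> None:
        if not audio:
            raise ValueError("audio must contain at least one file path")

    def _diarize_on_end(self) -> None:
        pass

    @abstractmethod
    def _setup_diarize_dataloader(self, config: Dict) -> Iterable[List[str]]: ...

    @abstractmethod
    def _diarize_forward(self, batch: List[str]) -> Any: ...

    @abstractmethod
    def _diarize_output_processing(self, outputs: Any, uniq_ids: List[str]) -> List[Any]: ...


class ToyDiarizer(DiarizeTemplateSketch):
    """Toy subclass: every file is assigned to a single speaker."""

    def _setup_diarize_dataloader(self, config: Dict) -> Iterable[List[str]]:
        files, size = config["audio_files"], config["batch_size"]
        # A plain list of batches stands in for a torch DataLoader.
        return [files[i : i + size] for i in range(0, len(files), size)]

    def _diarize_forward(self, batch: List[str]) -> List[int]:
        # Pretend "model output": one speaker index per file.
        return [0 for _ in batch]

    def _diarize_output_processing(self, outputs: List[int], uniq_ids: List[str]) -> List[str]:
        return [f"{uid}: speaker_{spk}" for uid, spk in zip(uniq_ids, outputs)]
```

For example, `ToyDiarizer().diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)` runs two batches and returns one label string per file. Putting teardown in a `finally` block is a design choice of this sketch, so the model is restored even when a hook fails.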
- abstract _diarize_forward(batch: Any)
Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass.
- Parameters:
batch – A batch of input data from the data loader that is used to perform the model’s forward pass.
- Returns:
The model’s outputs that are processed by _diarize_output_processing().
- _diarize_input_manifest_processing(audio_files: List[str], temp_dir: str, diarcfg: DiarizeConfig)
Internal function to process the input audio filepaths and return a config dict for the dataloader.
- Parameters:
audio_files – A list of string filepaths for audio files.
temp_dir – A temporary directory to store intermediate files.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
A config dict that is used to set up the dataloader for diarization.
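A minimal sketch of what a `_diarize_input_manifest_processing()` override could do, assuming a NeMo-style JSON-lines manifest: it writes one JSON object per audio file into `temp_dir` and returns a config dict that points at the manifest. The function name `write_diarize_manifest`, the chosen manifest fields, and the config keys (`manifest_filepath`, `batch_size`) are illustrative assumptions; a plain `batch_size` argument stands in for `diarcfg`.

```python
import json
import os
from typing import Dict, List


def write_diarize_manifest(audio_files: List[str], temp_dir: str, batch_size: int = 1) -> Dict:
    """Hypothetical helper: write a JSON-lines manifest and return a dataloader config."""
    manifest_path = os.path.join(temp_dir, "manifest.json")
    with open(manifest_path, "w", encoding="utf-8") as fp:
        for path in audio_files:
            # One JSON object per line; duration None is assumed to mean "use the whole file".
            entry = {"audio_filepath": path, "offset": 0.0, "duration": None}
            fp.write(json.dumps(entry) + "\n")
    return {"manifest_filepath": manifest_path, "batch_size": batch_size}
```

Typical use is inside a `tempfile.TemporaryDirectory()` block, so the manifest is removed once the dataloader has been consumed.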
- _diarize_input_processing(audio, diarcfg: DiarizeConfig)
Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
- Parameters:
audio – Of type GenericDiarizationType
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
A DataLoader object that is used to iterate over the input audio data.
- _diarize_on_begin(audio: str | List[str], diarcfg: DiarizeConfig)
Internal function to set up the model for diarization. Perform all setup and pre-checks here.
- Parameters:
audio (Union[str, List[str]]) – Of type GenericDiarizationType
diarcfg (DiarizeConfig) – An instance of DiarizeConfig.
- _diarize_on_end(diarcfg: DiarizeConfig)
Internal function to tear down the model after diarization. Perform all teardown and post-checks here.
- Parameters:
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
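`_diarize_on_begin()` and `_diarize_on_end()` are meant to bracket a diarization call. A common pattern for such hooks is to validate the input, remember whether the model was in training mode, switch to inference mode for the call, and restore the original mode afterward. The sketch below assumes a hypothetical `ToyModel` with a `training` flag and `train()`/`eval()` methods in the style of `torch.nn.Module`; it does not reproduce what NeMo's own hooks do, and `diarcfg` is omitted for brevity.

```python
from typing import List, Union


class ToyModel:
    """Hypothetical stand-in with a torch.nn.Module-style training flag."""

    def __init__(self) -> None:
        self.training = True
        self._was_training = self.training

    def train(self, mode: bool = True) -> "ToyModel":
        self.training = mode
        return self

    def eval(self) -> "ToyModel":
        return self.train(False)

    def _diarize_on_begin(self, audio: Union[str, List[str]]) -> None:
        # Pre-check: accept a single path or a non-empty list of paths.
        paths = [audio] if isinstance(audio, str) else list(audio)
        if not paths:
            raise ValueError("no audio provided")
        # Setup: remember the current mode, then switch to inference mode.
        self._was_training = self.training
        self.eval()

    def _diarize_on_end(self) -> None:
        # Teardown: restore whatever mode the model was in before diarization.
        self.train(mode=self._was_training)
```

Restoring the saved mode, rather than unconditionally calling `train()`, keeps a model that was already in inference mode from being flipped back into training mode.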
- abstract _diarize_output_processing(outputs, uniq_ids, diarcfg: DiarizeConfig)
Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs.
- Parameters:
outputs – The model’s outputs, as produced by _diarize_forward().
uniq_ids – List of unique recording identifiers in the batch.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
- Returns:
The output can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects. Its type is defined in GenericDiarizationType.
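As an illustration of the kind of work `_diarize_output_processing()` might do, the sketch below turns frame-level speaker-activity probabilities into `"start end speaker_k"` segment strings and returns a dict keyed by the entries of `uniq_ids`. The function name `frames_to_segments`, the `[frame][speaker]` layout, the 0.5 threshold, the default 0.08 s frame length, and the output string format are all assumptions, not the format any particular NeMo model returns.

```python
from typing import Dict, List

Frames = List[List[float]]  # [num_frames][num_speakers] activity probabilities (assumed layout)


def frames_to_segments(
    outputs: List[Frames],
    uniq_ids: List[str],
    frame_sec: float = 0.08,
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Hypothetical post-processing: threshold frame activity and merge runs into segments."""
    results: Dict[str, List[str]] = {}
    for uid, frames in zip(uniq_ids, outputs):
        num_speakers = len(frames[0]) if frames else 0
        # An all-silent sentinel frame closes any run still open at the end.
        padded = frames + [[0.0] * num_speakers]
        segments = []
        for spk in range(num_speakers):
            start = None  # frame index where the current active run began
            for t, probs in enumerate(padded):
                active = probs[spk] > threshold
                if active and start is None:
                    start = t
                elif not active and start is not None:
                    segments.append((start * frame_sec, t * frame_sec, spk))
                    start = None
        segments.sort()  # order by start time, then end time, then speaker
        results[uid] = [f"{s:.2f} {e:.2f} speaker_{k}" for s, e, k in segments]
    return results
```

Overlapping speech is preserved because each speaker is thresholded independently; a real post-processor would typically also apply smoothing or minimum-duration rules, which this sketch leaves out.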
- _input_audio_to_rttm_processing(audio_files: List[str])
Generates manifest-style dicts when the audio input is a list of paths to audio files.
- Parameters:
audio_files – A list of paths to audio files.
- Returns:
audio_rttm_map_dict – A list of manifest-style dicts.
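A minimal sketch of the path-to-manifest step, assuming each recording's unique ID is its file name without directory or extension and that each entry carries NeMo-style manifest fields. The helper name `audio_files_to_manifest_entries`, the `uniq_id` field, and the duplicate-ID check are hypothetical; NeMo's own method may derive IDs and fields differently.

```python
import os
from typing import Dict, List, Optional, Union

Entry = Dict[str, Optional[Union[str, float]]]


def audio_files_to_manifest_entries(audio_files: List[str]) -> List[Entry]:
    """Hypothetical helper: build one manifest-style dict per audio path."""
    entries: List[Entry] = []
    seen = set()
    for path in audio_files:
        # Assumed ID rule: file name without directory or extension.
        uniq_id = os.path.splitext(os.path.basename(path))[0]
        if uniq_id in seen:
            raise ValueError(f"duplicate recording id: {uniq_id}")
        seen.add(uniq_id)
        entries.append({"audio_filepath": path, "offset": 0.0, "duration": None, "uniq_id": uniq_id})
    return entries
```

Rejecting duplicate IDs matters in this sketch because downstream results are keyed by recording ID, so two files named `call.wav` in different directories would otherwise collide.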
- abstract _setup_diarize_dataloader(config: Dict)
Internal function to set up the dataloader for diarization. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
- Parameters:
config – A config dict that is used to set up the dataloader for diarization. It can be generated by _diarize_input_manifest_processing().
- Returns:
A DataLoader object that is used to iterate over the input audio data.
- diarize(
    audio: str | List[str] | numpy.ndarray | torch.utils.data.DataLoader,
    batch_size: int = 1,
    include_tensor_outputs: bool = False,
    postprocessing_yaml: str | None = None,
    num_workers: int = 1,
    verbose: bool = False,
    override_config: DiarizeConfig | None = None,
    **config_kwargs,
  )
Takes audio (one or more file paths, a NumPy array, or a DataLoader) and returns speaker labels.
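The `override_config` and `**config_kwargs` parameters suggest two ways of configuring a call. The sketch below shows one plausible resolution rule, assuming a hypothetical `DiarizeConfigSketch` dataclass whose fields mirror the keyword arguments of `diarize()`: an explicit `override_config` wins, otherwise a config is built from the keyword arguments, and unknown keys are rejected. This is a stand-in, not NeMo's `DiarizeConfig` or its actual merging logic.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DiarizeConfigSketch:
    """Hypothetical stand-in whose fields mirror diarize()'s keyword arguments."""

    batch_size: int = 1
    include_tensor_outputs: bool = False
    postprocessing_yaml: Optional[str] = None
    num_workers: int = 1
    verbose: bool = False


def resolve_diarize_config(
    override_config: Optional[DiarizeConfigSketch] = None, **kwargs: Any
) -> DiarizeConfigSketch:
    # An explicit config object takes precedence over individual keyword arguments.
    if override_config is not None:
        return override_config
    known = {f.name for f in fields(DiarizeConfigSketch)}
    unknown = set(kwargs) - known
    if unknown:
        raise TypeError(f"unknown config keys: {sorted(unknown)}")
    return DiarizeConfigSketch(**kwargs)
```

Letting the override object win outright keeps the rule simple to reason about; an alternative design would merge keyword arguments into a copy of the override, at the cost of less obvious precedence.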
- diarize_generator(audio, override_config: DiarizeConfig | None)
A generator version of diarize().
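The relationship between an eager entry point and a generator one can be sketched in plain Python: the generator yields processed results one batch at a time, so callers can consume long inputs incrementally, and a list-returning function simply drains it. The names `diarize_batches` and `diarize_all` and the per-batch `process` callback are hypothetical; the sketch does not claim to reflect how NeMo splits work between `diarize()` and `diarize_generator()`.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def diarize_batches(
    items: Sequence[T], batch_size: int, process: Callable[[List[T]], List[R]]
) -> Iterator[List[R]]:
    """Hypothetical generator: yield the processed results for one batch at a time."""
    # Note: as with any generator, this check runs on the first next(), not at call time.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield process(list(items[start : start + batch_size]))


def diarize_all(items: Sequence[T], batch_size: int, process: Callable[[List[T]], List[R]]) -> List[R]:
    """Hypothetical eager counterpart: drain the generator into one flat list."""
    results: List[R] = []
    for batch_results in diarize_batches(items, batch_size, process):
        results.extend(batch_results)
    return results
```

The generator form bounds memory to one batch of results at a time, which is the usual reason to prefer it for long recordings or large file lists.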