NeMo Speaker Recognition API

Model Classes

class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)[source]

Bases: nemo.core.classes.modelPT.ModelPT, nemo.collections.asr.models.asr_model.ExportableEncDecModel

Encoder-decoder class for speaker label models. The model class creates training and validation methods for setting up data and performing the model forward pass. Expects a config dict for:

  • preprocessor

  • Jasper/Quartznet Encoder

  • Speaker Decoder

static extract_labels(data_layer_config)[source]
forward(input_signal, input_signal_length)[source]
forward_for_export(processed_signal, processed_signal_len)[source]
static get_batch_embeddings(speaker_model, manifest_filepath, batch_size=32, sample_rate=16000, device='cuda')
get_embedding(path2audio_file)

Returns the speaker embeddings for a provided audio file.

Parameters

path2audio_file – path to audio wav file

Returns

speaker embeddings

Return type

embs
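A minimal usage sketch for this method (assuming `nemo_toolkit[asr]` is installed; the checkpoint name "titanet_large" and the file path "speaker1.wav" are illustrative placeholders, not values from this document):

```python
# Hedged sketch: load a pretrained speaker model and extract an embedding
# for a single audio file. Falls back gracefully when NeMo is unavailable.
try:
    from nemo.collections.asr.models import EncDecSpeakerLabelModel

    model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")
    emb = model.get_embedding("speaker1.wav")  # embedding tensor for the file
except Exception:
    emb = None  # NeMo not installed, or checkpoint/file unavailable here
```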

property input_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable input neural type checks

classmethod list_available_models() List[nemo.core.classes.common.PretrainedModelInfo][source]

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud.

Returns

List of available pre-trained models.
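For example, the registered checkpoints can be enumerated before choosing one to download (a sketch; assumes `nemo_toolkit[asr]` is installed):

```python
# Sketch: list the names of pretrained speaker models available on NGC.
try:
    from nemo.collections.asr.models import EncDecSpeakerLabelModel

    names = [m.pretrained_model_name
             for m in EncDecSpeakerLabelModel.list_available_models()]
except Exception:
    names = []  # NeMo not installed in this environment
print(names)
```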

multi_test_epoch_end(outputs, dataloader_idx: int = 0)[source]

Adds support for multiple test datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.

Parameters
  • outputs – Same as that provided by LightningModule.test_epoch_end() for a single dataloader.

  • dataloader_idx – int representing the index of the dataloader.

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

multi_validation_epoch_end(outputs, dataloader_idx: int = 0)[source]

Adds support for multiple validation datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.

Parameters
  • outputs – Same as that provided by LightningModule.validation_epoch_end() for a single dataloader.

  • dataloader_idx – int representing the index of the dataloader.

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks

setup_finetune_model(model_config: omegaconf.DictConfig)[source]

setup_finetune_model sets up training, validation, and test data with the newly provided config. It checks for labels set up during a previous training from scratch; if none exist, it sets up labels for the provided finetuning data from the manifest files.

Parameters
  • model_config – cfg with train_ds, optional validation_ds, optional test_ds, and mandatory encoder and decoder model params. Make sure you set num_classes correctly for the finetuning data.

Returns

None
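The expected config shape can be sketched as follows (a hedged illustration: the field names follow common NeMo speaker-recognition configs, and the paths and num_classes value are placeholders, not values from this document):

```yaml
# Illustrative finetuning config fragment; adjust paths and num_classes
# to match your finetuning data.
train_ds:
  manifest_filepath: train_manifest.json
  batch_size: 32
validation_ds:
  manifest_filepath: dev_manifest.json
  batch_size: 32
encoder:
  feat_in: 80        # placeholder encoder param
decoder:
  num_classes: 10    # must match the number of speakers in the finetune data
```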

setup_test_data(test_data_layer_params: Optional[Union[omegaconf.DictConfig, Dict]])[source]

(Optionally) Sets up the data loader to be used in testing.

Parameters

test_data_layer_params – test data layer parameters.

setup_training_data(train_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]

Sets up the data loader to be used in training.

Parameters

train_data_layer_config – training data layer parameters.

setup_validation_data(val_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]

Sets up the data loader to be used in validation.

Parameters

val_data_layer_config – validation data layer parameters.

test_dataloader()[source]
test_step(batch, batch_idx, dataloader_idx: int = 0)[source]
training_step(batch, batch_idx)[source]
validation_step(batch, batch_idx, dataloader_idx: int = 0)[source]
verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)

Verify whether two audio files are from the same speaker.

Parameters
  • path2audio_file1 – path to audio wav file of speaker 1

  • path2audio_file2 – path to audio wav file of speaker 2

  • threshold – cosine similarity score used as the threshold for deciding whether two embeddings are from the same speaker (default: 0.7)

Returns

True if both audio files are from the same speaker, False otherwise.
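The decision reduces to comparing the cosine similarity of the two files' embeddings against the threshold. A numpy stand-in for that decision rule (the embeddings below are synthetic vectors, not real model outputs, and same_speaker is a hypothetical helper, not part of the NeMo API):

```python
import numpy as np

def same_speaker(emb1: np.ndarray, emb2: np.ndarray, threshold: float = 0.7) -> bool:
    """Return True when the cosine similarity of two embeddings exceeds threshold."""
    sim = float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
    return sim > threshold

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.9, 0.1, 1.1])   # nearly parallel to a -> same speaker
c = np.array([-1.0, 1.0, 0.0])  # dissimilar to a -> different speaker
print(same_speaker(a, b), same_speaker(a, c))  # → True False
```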