NeMo Speaker Recognition API#

Model Classes#

class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)[source]#

Bases: nemo.core.classes.modelPT.ModelPT, nemo.collections.asr.models.asr_model.ExportableEncDecModel

Encoder-decoder class for speaker label models. The model class creates the training and validation methods for setting up data and performing the model forward pass. It expects a config dict for the components listed below; a loading sketch follows the list.

  • preprocessor

  • Jasper/QuartzNet Encoder

  • Speaker Decoder
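A minimal loading sketch (the model name below is an assumption; any name returned by list_available_models() works):

import nemo.collections.asr as nemo_asr

# Load a pretrained speaker model from NVIDIA NGC. "titanet_large" is one
# commonly published speaker model name; substitute any available name.
speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    model_name="titanet_large"
)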

static extract_labels(data_layer_config)[source]#
forward(input_signal, input_signal_length)[source]#
forward_for_export(processed_signal, processed_signal_len)[source]#

This forward is used when the model needs to be exported to ONNX format. The inputs cache_last_channel and cache_last_time must be passed when exporting streaming models; when they are passed, the inputs are simply run through the encoder part, and the ONNX conversion currently does not fully work for this case. An export sketch follows the parameter list below.

Parameters
  • input – Tensor that represents a batch of raw audio signals, of shape [B, T]. T here represents timesteps.

  • length – Vector of length B that contains the individual lengths of the audio sequences.

  • cache_last_channel – Tensor of shape [N, B, T, H] which contains the cache for last channel layers.

  • cache_last_time – Tensor of shape [N, B, H, T] which contains the cache for last time layers. N is the number of such layers which need caching, B is the batch size, H is the hidden size of activations, and T is the length of the cache.

Returns

the output of the model
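Since the class also inherits from ExportableEncDecModel, a typical (non-streaming) export call looks like the sketch below; the output file name is illustrative, and forward_for_export is invoked internally by the export machinery.

# Export the model to ONNX format.
speaker_model.export("speaker_model.onnx")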

static get_batch_embeddings(speaker_model, manifest_filepath, batch_size=32, sample_rate=16000, device='cuda')#
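A hedged usage sketch for get_batch_embeddings; the manifest is assumed to follow the usual NeMo speaker-recognition format (one JSON object per line with audio_filepath, duration, and label fields), and the return value is left unpacked here since its contents are not documented above.

from nemo.collections.asr.models import EncDecSpeakerLabelModel

model = EncDecSpeakerLabelModel.from_pretrained(model_name="titanet_large")

# "manifest.json" is assumed to contain lines such as:
# {"audio_filepath": "a.wav", "duration": 3.0, "label": "spk_0"}
batch_outputs = EncDecSpeakerLabelModel.get_batch_embeddings(
    model, "manifest.json", batch_size=32, sample_rate=16000, device="cuda"
)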
get_embedding(path2audio_file)#

Returns the speaker embeddings for a provided audio file.

Parameters

path2audio_file – path to audio wav file

Returns

speaker embeddings

Return type

embs
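For example (the file path is illustrative):

# Extract the embedding for a single utterance; the exact shape of the
# returned tensor depends on the decoder's embedding size.
embs = speaker_model.get_embedding("speaker1_utt1.wav")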

property input_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]#

Define these to enable input neural type checks

classmethod list_available_models() List[nemo.core.classes.common.PretrainedModelInfo][source]#

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA's NGC cloud.

Returns

List of available pre-trained models.
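For example:

from nemo.collections.asr.models import EncDecSpeakerLabelModel

# Print the names of all pretrained speaker models available on NGC.
for model_info in EncDecSpeakerLabelModel.list_available_models():
    print(model_info.pretrained_model_name)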

multi_test_epoch_end(outputs, dataloader_idx: int = 0)[source]#

Adds support for multiple test datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.

Parameters
  • outputs – Same as that provided by LightningModule.test_epoch_end() for a single dataloader.

  • dataloader_idx – int representing the index of the dataloader.

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended with the dataloader prefix.

multi_validation_epoch_end(outputs, dataloader_idx: int = 0)[source]#

Adds support for multiple validation datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.

Parameters
  • outputs – Same as that provided by LightningModule.validation_epoch_end() for a single dataloader.

  • dataloader_idx – int representing the index of the dataloader.

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended with the dataloader prefix.

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]#

Define these to enable output neural type checks

setup_test_data(test_data_layer_params: Optional[Union[omegaconf.DictConfig, Dict]])[source]#

(Optionally) Sets up the data loader to be used in testing.

Parameters

test_data_layer_params – test data layer parameters.

setup_training_data(train_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]#

Sets up the data loader to be used in training.

Parameters

train_data_layer_config – training data layer parameters.
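A hedged sketch of a training data-layer config; the keys follow common NeMo speaker-recognition dataset configs and may vary with the config version you use.

from omegaconf import OmegaConf

train_config = OmegaConf.create({
    "manifest_filepath": "train_manifest.json",  # one JSON object per line
    "sample_rate": 16000,
    "batch_size": 32,
    "shuffle": True,
})
speaker_model.setup_training_data(train_data_layer_config=train_config)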

setup_validation_data(val_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]#

Sets up the data loader to be used in validation.

Parameters

val_data_layer_config – validation data layer parameters.

test_dataloader()[source]#
test_step(batch, batch_idx, dataloader_idx: int = 0)[source]#
training_step(batch, batch_idx)[source]#
validation_step(batch, batch_idx, dataloader_idx: int = 0)[source]#
verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)#

Verify if two audio files are from the same speaker or not.

Parameters
  • path2audio_file1 – path to audio wav file of speaker 1

  • path2audio_file2 – path to audio wav file of speaker 2

  • threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

Returns

True if both audio files are from the same speaker, False otherwise.
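For example (paths are illustrative):

# Returns True when the cosine similarity of the two embeddings
# exceeds the threshold.
same_speaker = speaker_model.verify_speakers(
    "speaker1_utt1.wav", "speaker2_utt1.wav", threshold=0.7
)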