NeMo Speaker Recognition API#
Model Classes#
- class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)[source]#
Bases: nemo.core.classes.modelPT.ModelPT, nemo.collections.asr.models.asr_model.ExportableEncDecModel
Encoder-decoder class for speaker label models. The model class creates training and validation methods for setting up the data and performing the model forward pass. Expects a config dict for:
preprocessor
Jasper/Quartznet Encoder
Speaker Decoder
- static get_batch_embeddings(speaker_model, manifest_filepath, batch_size=32, sample_rate=16000, device='cuda')#
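As a usage sketch, batch extraction might look like the following; the pretrained model name, the manifest layout, and the structure of the returned value are assumptions, since this entry documents only the signature.
```python
from nemo.collections.asr.models import EncDecSpeakerLabelModel

# "titanet_large" is a placeholder checkpoint name.
speaker_model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")

# Each manifest line is assumed to be a JSON record such as:
# {"audio_filepath": "/data/spk1_utt1.wav", "duration": 3.2, "label": "spk1"}
outputs = EncDecSpeakerLabelModel.get_batch_embeddings(
    speaker_model,
    manifest_filepath="train_manifest.json",
    batch_size=32,
    sample_rate=16000,
    device="cuda",
)
# The exact contents of `outputs` (embeddings plus labels/logits) are not
# documented in this entry; inspect the return value before relying on it.
```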
- get_embedding(path2audio_file)#
Returns the speaker embeddings for a provided audio file.
- Parameters
path2audio_file – path to audio wav file
- Returns
speaker embeddings
- Return type
embs
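A minimal single-file sketch, assuming a pretrained checkpoint named "titanet_large" (any EncDecSpeakerLabelModel checkpoint should work):
```python
from nemo.collections.asr.models import EncDecSpeakerLabelModel

# Model name is an assumption; substitute any speaker-label checkpoint.
model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")
embs = model.get_embedding("speaker1_utt1.wav")  # embedding tensor for this file
print(embs.shape)
```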
- property input_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]#
Define these to enable input neural type checks
- classmethod list_available_models() List[nemo.core.classes.common.PretrainedModelInfo] [source]#
This method returns a list of pre-trained models which can be instantiated directly from NVIDIA’s NGC cloud.
- Returns
List of available pre-trained models.
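For example, discovering and loading a checkpoint might look like this (the pretrained_model_name attribute and the model name are assumptions):
```python
from nemo.collections.asr.models import EncDecSpeakerLabelModel

for info in EncDecSpeakerLabelModel.list_available_models():
    # Each entry describes a checkpoint hosted on NGC.
    print(info.pretrained_model_name)

# Load one of the listed checkpoints; "titanet_large" is a placeholder name.
model = EncDecSpeakerLabelModel.from_pretrained(model_name="titanet_large")
```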
- multi_test_epoch_end(outputs, dataloader_idx: int = 0)[source]#
Adds support for multiple test datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.
- Parameters
outputs – Same as that provided by LightningModule.validation_epoch_end() for a single dataloader.
dataloader_idx – int representing the index of the dataloader.
- Returns
A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended with the dataloader prefix.
- multi_validation_epoch_end(outputs, dataloader_idx: int = 0)[source]#
Adds support for multiple validation datasets. Should be overridden by subclasses to obtain appropriate logs for each of the dataloaders.
- Parameters
outputs – Same as that provided by LightningModule.validation_epoch_end() for a single dataloader.
dataloader_idx – int representing the index of the dataloader.
- Returns
A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended with the dataloader prefix.
- property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]#
Define these to enable output neural type checks
- setup_finetune_model(model_config: omegaconf.DictConfig)[source]#
Sets up training, validation, and test data with the newly provided config. It checks for the labels set up during training from scratch; if there are none, it sets up labels for the provided fine-tuning data from the manifest files.
- Parameters
model_config – config which has train_ds, optional validation_ds, optional test_ds, and mandatory encoder and decoder model params. Make sure you set num_classes correctly for the fine-tuning data.
- Returns
None
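A hedged sketch of the fine-tuning flow: the model name, paths, and values below are placeholders, and real configs carry full preprocessor/encoder/decoder sections.
```python
from omegaconf import OmegaConf

from nemo.collections.asr.models import EncDecSpeakerLabelModel

model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")  # placeholder name

# Minimal placeholder config; the field set is an assumption based on the
# description above (train_ds, optional validation_ds/test_ds, decoder params).
finetune_config = OmegaConf.create({
    "train_ds": {"manifest_filepath": "train_manifest.json", "batch_size": 32},
    "validation_ds": {"manifest_filepath": "val_manifest.json", "batch_size": 32},
    "decoder": {"num_classes": 10},  # must match the number of speakers in the new data
})
model.setup_finetune_model(finetune_config)
```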
- setup_test_data(test_data_layer_params: Optional[Union[omegaconf.DictConfig, Dict]])[source]#
(Optionally) Sets up the data loader to be used in testing.
- Parameters
test_data_layer_params – test data layer parameters.
- setup_training_data(train_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]#
Sets up the data loader to be used in training.
- Parameters
train_data_layer_config – training data layer parameters.
- setup_validation_data(val_data_layer_config: Optional[Union[omegaconf.DictConfig, Dict]])[source]#
Sets up the data loader to be used in validation.
- Parameters
val_data_layer_config – validation data layer parameters.
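The three setup_*_data methods above accept a DictConfig or a plain dict. A sketch with placeholder manifests follows; the exact field set is an assumption based on typical NeMo dataset configs.
```python
from omegaconf import OmegaConf

from nemo.collections.asr.models import EncDecSpeakerLabelModel

model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")  # placeholder name

model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 32,
    "shuffle": True,
}))

# Validation/test configs have the same shape; shuffle is typically disabled.
model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "val_manifest.json",
    "sample_rate": 16000,
    "batch_size": 32,
    "shuffle": False,
}))
```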
- verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)#
Verify if two audio files are from the same speaker or not.
- Parameters
path2audio_file1 – path to audio wav file of speaker 1
path2audio_file2 – path to audio wav file of speaker 2
threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)
- Returns
True if both audio files are from the same speaker, False otherwise.
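A minimal verification sketch; the model name and file paths are placeholders.
```python
from nemo.collections.asr.models import EncDecSpeakerLabelModel

model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")  # placeholder name

# Compares cosine similarity of the two file embeddings against the threshold.
same = model.verify_speakers("utt_a.wav", "utt_b.wav", threshold=0.7)
print("same speaker" if same else "different speakers")
```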