NeMo Speaker Recognition API

class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)

Bases: nemo.core.classes.modelPT.ModelPT, nemo.collections.asr.models.asr_model.ExportableEncDecModel

Encoder-decoder class for speaker label models. The model class creates the training and validation methods used to set up data and perform the model forward pass. It expects a config dict with sections for:

  • preprocessor

  • Jasper/QuartzNet Encoder

  • Speaker Decoder
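The config requirement above can be pictured as a dictionary with one section per component. The sketch below is illustrative only: the section names `preprocessor`, `encoder` and `decoder` and the `_target_` class paths are assumptions modeled on typical NeMo configs, and `missing_sections` is a hypothetical helper, not part of the NeMo API. Real configs carry many more fields in each section.

```python
# Hypothetical skeleton of a speaker model config; only the three
# top-level section names are checked here.
REQUIRED_SECTIONS = ("preprocessor", "encoder", "decoder")

model_cfg = {
    "preprocessor": {"_target_": "nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor"},
    "encoder": {"_target_": "nemo.collections.asr.modules.ConvASREncoder"},
    "decoder": {"_target_": "nemo.collections.asr.modules.SpeakerDecoder"},
}


def missing_sections(cfg: dict) -> list:
    """Return the required top-level sections absent from cfg (hypothetical helper)."""
    return [name for name in REQUIRED_SECTIONS if name not in cfg]


print(missing_sections(model_cfg))  # prints []
```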

get_embedding(path2audio_file)

Returns the speaker embeddings for a provided audio file.

Parameters

path2audio_file – path to an audio wav file

Returns

speaker embeddings (audio representations)

Return type

emb
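A minimal sketch of calling `get_embedding` over several files. It assumes `model` is any object exposing `get_embedding(path)`, such as an `EncDecSpeakerLabelModel` instance, and that the returned embedding is either a PyTorch tensor or array-like. `to_vector`, `embed_files` and `FakeModel` are hypothetical names used only for illustration; `FakeModel` stands in for a real model so the sketch runs without NeMo.

```python
import numpy as np


def to_vector(emb) -> np.ndarray:
    """Flatten one embedding into a 1-D NumPy array.

    Assumption: emb is a PyTorch tensor (possibly on GPU) or array-like.
    """
    if hasattr(emb, "detach"):  # torch.Tensor: drop the autograd graph, move to CPU
        emb = emb.detach().cpu().numpy()
    return np.asarray(emb, dtype=float).reshape(-1)


def embed_files(model, paths):
    """Map each audio path to its flattened speaker embedding (hypothetical helper)."""
    return {path: to_vector(model.get_embedding(path)) for path in paths}


class FakeModel:
    """Stand-in for a loaded speaker model that returns fixed embeddings."""

    def get_embedding(self, path2audio_file):
        return [[1.0, 2.0, 3.0]] if path2audio_file == "a.wav" else [[0.0, 1.0, 0.0]]


embeddings = embed_files(FakeModel(), ["a.wav", "b.wav"])
print(embeddings["a.wav"])  # prints [1. 2. 3.]
```

With NeMo installed, `FakeModel()` would be replaced by a real model, for example one returned by `EncDecSpeakerLabelModel.from_pretrained(...)`; the available pretrained names can be listed with `EncDecSpeakerLabelModel.list_available_models()`.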

verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)

Verify whether two audio files are from the same speaker.

Parameters

  • path2audio_file1 – path to audio wav file of speaker 1

  • path2audio_file2 – path to audio wav file of speaker 2

  • threshold – threshold on the cosine similarity score, used to decide whether the two embeddings belong to the same speaker (default = 0.7)

Returns

True if both audio files are from the same speaker, False otherwise
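The decision rule described above can be sketched as follows, assuming the two embeddings are compared by plain cosine similarity and the files are declared the same speaker when the score reaches `threshold`. This illustrates the technique, not NeMo's implementation: the library may normalize or rescale the score before comparing it, and whether the comparison is `>=` or `>` should be checked against the installed source. `cosine_similarity` and `same_speaker` are hypothetical names.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float).reshape(-1)
    b = np.asarray(b, dtype=float).reshape(-1)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_speaker(emb1, emb2, threshold: float = 0.7) -> bool:
    """Assumed rule: same speaker if raw cosine similarity >= threshold."""
    return cosine_similarity(emb1, emb2) >= threshold


print(same_speaker([1.0, 1.0], [1.0, 0.0]))  # prints True (similarity ~0.707)
print(same_speaker([1.0, 0.0], [0.0, 1.0]))  # prints False (similarity 0.0)
```

Raising `threshold` makes the check stricter: fewer different-speaker pairs are accepted, at the cost of rejecting more same-speaker pairs.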

© Copyright 2023-2024, NVIDIA. Last updated on Apr 12, 2024.