NeMo Speaker Recognition API

Model Classes

class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)

Bases: ModelPT, ExportableEncDecModel, VerificationMixin

Encoder-decoder class for speaker label models. The model class creates the training and validation methods for setting up data and performing the model forward pass. It expects a config dict with entries for:

  • preprocessor

  • Jasper/Quartznet Encoder

  • Speaker Decoder
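As a rough illustration of the three sections listed above, the following sketch checks that a config dict contains them before a model is built. The key names `preprocessor`, `encoder`, and `decoder` and the helper `missing_sections` are assumptions made for this example; they are not taken from this page.

```python
from typing import Any, List, Mapping

# Assumed top-level section names; check them against your own config file.
REQUIRED_SECTIONS = ("preprocessor", "encoder", "decoder")


def missing_sections(cfg: Mapping[str, Any]) -> List[str]:
    """Return the required sections that are absent from cfg, in a fixed order."""
    return [name for name in REQUIRED_SECTIONS if name not in cfg]


# A config that lacks the decoder section is reported as incomplete.
partial_cfg = {"preprocessor": {}, "encoder": {}}
print(missing_sections(partial_cfg))
```

A check like this only confirms that the sections exist; it says nothing about whether their contents are valid.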

get_embedding(path2audio_file)

Returns the speaker embeddings for a provided audio file.

Parameters:

path2audio_file – path to an audio wav file

Returns:

emb – speaker embeddings (audio representations)
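A minimal usage sketch for `get_embedding` follows. The helpers `load_model` and `embed_files` are hypothetical names introduced here, and the checkpoint name `titanet_large` is an assumption; substitute whichever pretrained or local checkpoint you actually use. The NeMo import is done lazily so the helper that only calls `get_embedding` works with any object exposing that method.

```python
from typing import Any, Dict, Iterable


def load_model(name: str = "titanet_large") -> Any:
    """Load a speaker model by name (checkpoint name is an assumption)."""
    # Imported inside the function so the rest of this file runs without NeMo.
    from nemo.collections.asr.models.label_models import EncDecSpeakerLabelModel

    return EncDecSpeakerLabelModel.from_pretrained(model_name=name)


def embed_files(model: Any, paths: Iterable[str]) -> Dict[str, Any]:
    """Map each wav path to the embedding returned by model.get_embedding."""
    return {path: model.get_embedding(path) for path in paths}
```

With NeMo installed, something like `embed_files(load_model(), ["speaker1.wav"])` would return one embedding per file; the file name is a placeholder.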

verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)

Verify if two audio files are from the same speaker or not.

Parameters:
  • path2audio_file1 – path to audio wav file of speaker 1

  • path2audio_file2 – path to audio wav file of speaker 2

  • threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

Returns:

True if both audio files are from the same speaker, False otherwise
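The thresholding idea behind `verify_speakers` can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it compares the raw cosine similarity of two embeddings against the threshold, and NeMo may normalize or rescale the score before comparing. The function names `cosine_similarity` and `same_speaker` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(
    emb1: Sequence[float], emb2: Sequence[float], threshold: float = 0.7
) -> bool:
    """Accept the pair when the similarity reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Raising the threshold makes the decision stricter (fewer false accepts, more false rejects); a suitable value depends on the model and the data.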

verify_speakers_batch(audio_files_pairs, threshold=0.7, batch_size=32, sample_rate=16000, device='cuda')

Verify whether the two audio files in each pair are from the same speaker.

Parameters:
  • audio_files_pairs – list of tuples with audio_files pairs to be verified

  • threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

  • batch_size – batch size to perform batch inference

  • sample_rate – sample rate of audio files in manifest file

  • device – compute device to perform operations.

Returns:

True if both audio files in a pair are from the same speaker, False otherwise
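A sketch of preparing input for `verify_speakers_batch` follows. The helpers `all_pairs` and `verify_pairs` are hypothetical; they only build the list of tuples described above and forward the documented keyword arguments. The default `device="cpu"` in the wrapper is a choice made for this example (the documented default is `'cuda'`), and the result is returned unchanged because its exact type is not specified on this page.

```python
import itertools
from typing import Any, List, Sequence, Tuple


def all_pairs(
    enrollment_files: Sequence[str], test_files: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair every enrollment file with every test file."""
    return list(itertools.product(enrollment_files, test_files))


def verify_pairs(
    model: Any,
    pairs: List[Tuple[str, str]],
    threshold: float = 0.7,
    batch_size: int = 32,
    sample_rate: int = 16000,
    device: str = "cpu",
) -> Any:
    """Forward the pairs and documented options to verify_speakers_batch."""
    return model.verify_speakers_batch(
        pairs,
        threshold=threshold,
        batch_size=batch_size,
        sample_rate=sample_rate,
        device=device,
    )
```

Pairing every enrollment file with every test file grows quadratically, so for large sets it may be preferable to build only the pairs listed in a trial file.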