Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

NeMo Speaker Recognition API

Model Classes

class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)

Bases: ModelPT, ExportableEncDecModel, VerificationMixin

Encoder-decoder class for speaker label models. The model class creates the training and validation methods used to set up data and perform the model forward pass. It expects a config dict with entries for:

  • preprocessor

  • Jasper/QuartzNet Encoder

  • Speaker Decoder

get_embedding(path2audio_file)

Returns the speaker embeddings for a provided audio file.

Parameters:

path2audio_file – path to an audio wav file

Returns:

emb – speaker embeddings (audio representations)

verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)

Verify if two audio files are from the same speaker or not.

Parameters:
  • path2audio_file1 – path to audio wav file of speaker 1

  • path2audio_file2 – path to audio wav file of speaker 2

  • threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

Returns:

True if both audio files are from the same speaker, False otherwise
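
The threshold decision described above can be sketched in plain Python. This is a minimal illustration, not NeMo's implementation: `cosine_similarity` and `same_speaker` are hypothetical helper names, the vectors are made-up toy values standing in for the output of `get_embedding`, and the library may normalize or rescale scores before comparing them with `threshold`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb1: Sequence[float], emb2: Sequence[float], threshold: float = 0.7) -> bool:
    """Hypothetical helper: accept the pair when the score reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold


# Toy vectors standing in for speaker embeddings (assumed values, not model output).
emb_a = [0.9, 0.1, 0.3]
emb_b = [0.8, 0.2, 0.4]    # points in nearly the same direction as emb_a
emb_c = [-0.5, 0.9, -0.1]  # points in a very different direction

print(same_speaker(emb_a, emb_b))
print(same_speaker(emb_a, emb_c))
```

Raising `threshold` makes the check stricter (fewer pairs accepted as the same speaker); lowering it makes the check more permissive.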

verify_speakers_batch(audio_files_pairs, threshold=0.7, batch_size=32, sample_rate=16000, device='cuda')

Verify whether the first and second audio files in each pair are from the same speaker.

Parameters:
  • audio_files_pairs – list of tuples with audio_files pairs to be verified

  • threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

  • batch_size – batch size used for batch inference (default = 32)

  • sample_rate – sample rate of the audio files (default = 16000)

  • device – compute device on which to perform operations (default = 'cuda')

Returns:

True if both audio files in a pair are from the same speaker, False otherwise
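
Batched verification over a list of pairs can be sketched as follows. This is an illustrative outline under stated assumptions, not the library's code: `verify_pairs`, `embed_batch`, and `toy_embed_batch` are hypothetical names, the lookup table stands in for a real model forward pass, and returning one boolean per pair is an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = Sequence[float]


def _cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify_pairs(
    pairs: List[Tuple[str, str]],
    embed_batch: Callable[[List[str]], List[Embedding]],
    threshold: float = 0.7,
    batch_size: int = 32,
) -> List[bool]:
    """Hypothetical helper: one same-speaker decision per (file1, file2) pair."""
    # Embed each distinct file once, at most batch_size files per call.
    files = sorted({path for pair in pairs for path in pair})
    cache: Dict[str, Embedding] = {}
    for start in range(0, len(files), batch_size):
        chunk = files[start:start + batch_size]
        for path, emb in zip(chunk, embed_batch(chunk)):
            cache[path] = emb
    # Score every pair against the threshold.
    return [_cosine(cache[a], cache[b]) >= threshold for a, b in pairs]


# Toy stand-in for a model: a fixed table of made-up embeddings (assumed values).
_TOY_EMBEDDINGS = {
    "alice_1.wav": [0.9, 0.1, 0.3],
    "alice_2.wav": [0.8, 0.2, 0.4],
    "bob_1.wav": [-0.5, 0.9, -0.1],
}


def toy_embed_batch(paths: List[str]) -> List[Embedding]:
    return [_TOY_EMBEDDINGS[p] for p in paths]


pairs = [("alice_1.wav", "alice_2.wav"), ("alice_1.wav", "bob_1.wav")]
decisions = verify_pairs(pairs, toy_embed_batch, batch_size=2)
print(decisions)
```

Caching embeddings per file means a file that appears in several pairs is embedded only once, which is the main saving batching offers over calling a pairwise check in a loop.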