NeMo Speaker Recognition API
Model Classes
- class nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)
Bases: ModelPT, ExportableEncDecModel, VerificationMixin
Encoder-decoder class for speaker label models. The model class creates the training and validation methods for setting up data and performing the model forward pass. It expects a config dict with entries for:
- preprocessor
- Jasper/QuartzNet encoder
- speaker decoder
- get_embedding(path2audio_file)
Returns the speaker embeddings for a provided audio file.
- Parameters:
path2audio_file – path to an audio wav file
- Returns:
emb – speaker embeddings (audio representations)
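As a rough usage sketch, the helper below calls `get_embedding` once per file and collects the results. The names `load_speaker_model` and `embed_files` are hypothetical helpers introduced here for illustration, and the checkpoint name `titanet_large` is an assumption about which pretrained model is available; substitute whichever checkpoint you actually use.

```python
from typing import Dict, Iterable


def load_speaker_model(name: str = "titanet_large"):
    """Load a pretrained speaker model (requires NeMo to be installed).

    The default checkpoint name is an assumption made for this sketch.
    """
    # Imported lazily so the rest of this sketch can run without NeMo.
    from nemo.collections.asr.models.label_models import EncDecSpeakerLabelModel

    return EncDecSpeakerLabelModel.from_pretrained(model_name=name)


def embed_files(model, paths: Iterable[str]) -> Dict[str, object]:
    """Map each wav path to the value returned by model.get_embedding."""
    return {path: model.get_embedding(path) for path in paths}
```

With NeMo installed, something like `embed_files(load_speaker_model(), ["spk1.wav", "spk2.wav"])` would return one embedding per path; the file names here are placeholders.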
- verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)
Verify if two audio files are from the same speaker or not.
- Parameters:
path2audio_file1 – path to audio wav file of speaker 1
path2audio_file2 – path to audio wav file of speaker 2
threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)
- Returns:
True if both audio files are from the same speaker, False otherwise
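To make the role of `threshold` concrete, here is a minimal pure-Python sketch of a cosine-similarity decision between two embeddings. It is an illustration under stated assumptions, not the library's implementation: the functions `cosine_similarity` and `same_speaker` are hypothetical, the comparison is assumed to be `>=`, and the library may normalize or rescale the score before comparing it with the threshold.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb1: Sequence[float], emb2: Sequence[float],
                 threshold: float = 0.7) -> bool:
    """Assumed decision rule: accept when the score reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Raising the threshold makes the decision stricter (fewer false accepts, more false rejects), so it is usually tuned on held-out trial pairs rather than left at the default.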
- verify_speakers_batch(audio_files_pairs, threshold=0.7, batch_size=32, sample_rate=16000, device='cuda')
Verify whether the two audio files in each pair are from the same speaker.
- Parameters:
audio_files_pairs – list of tuples, each holding a pair of audio file paths to be verified
threshold – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)
batch_size – batch size used for batch inference (default = 32)
sample_rate – sample rate of the audio files (default = 16000)
device – compute device on which to perform operations (default = 'cuda')
- Returns:
True if both audio files in a pair are from the same speaker, False otherwise
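The sketch below shows one way to assemble the list of tuples that `audio_files_pairs` expects and pass it on with the documented keyword arguments. `build_pairs` and `verify_pairs` are hypothetical helpers, and pairing every enrollment file with every test file is just one possible trial design. The exact shape of the returned decisions is not specified above, so the sketch returns whatever the model returns.

```python
from typing import List, Sequence, Tuple


def build_pairs(enrollment: Sequence[str],
                test: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every enrollment file with every test file."""
    return [(e, t) for e in enrollment for t in test]


def verify_pairs(model, pairs: List[Tuple[str, str]],
                 threshold: float = 0.7, batch_size: int = 32,
                 sample_rate: int = 16000, device: str = "cpu"):
    """Forward the pairs to model.verify_speakers_batch.

    device defaults to "cpu" here (an assumption for portability);
    the documented default of the method itself is 'cuda'.
    """
    return model.verify_speakers_batch(
        pairs,
        threshold=threshold,
        batch_size=batch_size,
        sample_rate=sample_rate,
        device=device,
    )
```

Passing `device="cuda"` is the natural choice when a GPU is available; the CPU default in the wrapper only keeps the sketch usable on machines without one.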