Checkpoints

There are two main ways to load pretrained checkpoints in NeMo, as introduced in the section on loading ASR checkpoints. In speaker diarization, the diarizer loads the checkpoints that are passed through the config file. For example:

Load VAD models


pretrained_vad_model='/path/to/vad_multilingual_marblenet.nemo' # local .nemo or pretrained vad model name
...
# pass with hydra config
config.diarizer.vad.model_path=pretrained_vad_model

Load speaker embedding models


pretrained_speaker_model='/path/to/titanet-l.nemo' # local .nemo or pretrained speaker embedding model name
...
# pass with hydra config
config.diarizer.speaker_embeddings.model_path=pretrained_speaker_model

Load neural diarizer models


pretrained_neural_diarizer_model='/path/to/diarizer_msdd_telephonic.nemo' # local .nemo or pretrained neural diarizer model name
...
# pass with hydra config
config.diarizer.msdd_model.model_path=pretrained_neural_diarizer_model
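Taken together, the three overrides above populate sibling fields of the diarizer section of the config. A sketch of how the corresponding YAML might look, assuming the key names shown above (the paths are placeholders, not real files):

```yaml
# Hypothetical excerpt of a diarization config; the key names follow
# the hydra overrides above, and the paths are placeholders.
diarizer:
  vad:
    model_path: /path/to/vad_multilingual_marblenet.nemo
  speaker_embeddings:
    model_path: /path/to/titanet-l.nemo
  msdd_model:
    model_path: /path/to/diarizer_msdd_telephonic.nemo
```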

NeMo automatically saves checkpoints of the model you are training in the .nemo format. You can also manually save your model at any point using model.save_to(<checkpoint_path>.nemo).
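As the comments in the snippets above note, each model_path accepts either a local .nemo file or a pretrained model name. A minimal stdlib sketch of that distinction (classify_model_path is a hypothetical helper for illustration, not part of NeMo):

```python
import os

def classify_model_path(model_path: str) -> str:
    """Hypothetical helper: decide whether a model_path string refers to a
    local .nemo checkpoint or a pretrained model name from NGC."""
    if model_path.endswith(".nemo"):
        # Treated as a local checkpoint file; it should exist on disk.
        if not os.path.isfile(model_path):
            raise FileNotFoundError(f"Local checkpoint not found: {model_path}")
        return "local"
    # Otherwise interpreted as a pretrained model name, e.g. 'titanet_large'.
    return "pretrained"
```

For example, classify_model_path('titanet_large') returns 'pretrained', while a path ending in .nemo is checked on disk.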

Note

For details and a deeper understanding, please refer to <NeMo_git_root>/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb.

Check out Datasets for preparing audio files and optional label files.

Run and evaluate the speaker diarizer with the command below:


# Have a look at the instructions inside the script and pass the arguments you might need.
python <NeMo_git_root>/examples/speaker_tasks/diarization/offline_diarization.py

The ASR collection has checkpoints of several models trained on various datasets for a variety of tasks. These checkpoints are available via the NGC NeMo Automatic Speech Recognition collection. The model cards on NGC contain more information about each of the available checkpoints.

In general, you can load a model by its model name in the following format:


pretrained_vad_model='vad_multilingual_marblenet'
pretrained_speaker_model='titanet_large'
pretrained_neural_diarizer_model='diar_msdd_telephonic'
...
config.diarizer.vad.model_path=pretrained_vad_model \
config.diarizer.speaker_embeddings.model_path=pretrained_speaker_model \
config.diarizer.msdd_model.model_path=pretrained_neural_diarizer_model

where the model name is the value in the "Model Name" column of the table below.
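The dotted overrides above follow Hydra's key=value syntax, where each dot descends one level into the nested config. A small stdlib sketch of that mapping (apply_override is a hypothetical illustration of the key traversal only, not Hydra itself, which also handles types, interpolation, and validation):

```python
def apply_override(config: dict, override: str) -> None:
    """Hypothetical illustration of a Hydra-style dotted override:
    'diarizer.vad.model_path=vad_multilingual_marblenet' walks the
    nested dict and sets the leaf value."""
    dotted_key, value = override.split("=", 1)
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:          # descend to the parent of the leaf
        node = node.setdefault(key, {})
    node[keys[-1]] = value         # set the leaf value

config = {"diarizer": {"vad": {"model_path": None}}}
apply_override(config, "diarizer.vad.model_path=vad_multilingual_marblenet")
```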

Models for Speaker Diarization Pipeline

Model Name                 | Model Base Class          | Model Card
vad_multilingual_marblenet | EncDecClassificationModel | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_multilingual_marblenet
vad_marblenet              | EncDecClassificationModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_marblenet
vad_telephony_marblenet    | EncDecClassificationModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_telephony_marblenet
titanet_large              | EncDecSpeakerLabelModel   | https://ngc.nvidia.com/catalog/models/nvidia:nemo:titanet_large
ecapa_tdnn                 | EncDecSpeakerLabelModel   | https://ngc.nvidia.com/catalog/models/nvidia:nemo:ecapa_tdnn
diar_msdd_telephonic       | EncDecDiarLabelModel      | https://ngc.nvidia.com/catalog/models/nvidia:diar_msdd_telephonic
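For reference, the table above can also be expressed as a simple lookup, e.g. to sanity-check a model name before putting it in the config (an illustrative sketch transcribed from the table, not a NeMo API):

```python
# Model name -> base class name, transcribed from the table above.
DIARIZATION_MODELS = {
    "vad_multilingual_marblenet": "EncDecClassificationModel",
    "vad_marblenet": "EncDecClassificationModel",
    "vad_telephony_marblenet": "EncDecClassificationModel",
    "titanet_large": "EncDecSpeakerLabelModel",
    "ecapa_tdnn": "EncDecSpeakerLabelModel",
    "diar_msdd_telephonic": "EncDecDiarLabelModel",
}

def base_class_for(model_name: str) -> str:
    """Return the base class name for a known pretrained model name,
    raising a clear error for unknown names."""
    try:
        return DIARIZATION_MODELS[model_name]
    except KeyError:
        raise ValueError(f"Unknown diarization model name: {model_name}")
```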
© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.