Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Checkpoints#
There are two main ways to load pretrained checkpoints in NeMo, as introduced in Loading ASR Checkpoints. In speaker diarization, the diarizer loads the checkpoints that are passed through the config file. For example:
Loading Local Checkpoints#
Load VAD models
pretrained_vad_model='/path/to/vad_multilingual_marblenet.nemo' # local .nemo or pretrained vad model name
...
# pass with hydra config
config.diarizer.vad.model_path=pretrained_vad_model
Load speaker embedding models
pretrained_speaker_model='/path/to/titanet-l.nemo' # local .nemo or pretrained speaker embedding model name
...
# pass with hydra config
config.diarizer.speaker_embeddings.model_path=pretrained_speaker_model
Load neural diarizer models
pretrained_neural_diarizer_model='/path/to/diarizer_msdd_telephonic.nemo' # local .nemo or pretrained neural diarizer model name
...
# pass with hydra config
config.diarizer.msdd_model.model_path=pretrained_neural_diarizer_model
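The three `model_path` assignments above are hydra-style dotted overrides into a nested config. As a minimal sketch (using a plain dict and a hypothetical `set_override` helper, not a NeMo or Hydra API), this is how such dotted keys map onto the nested diarizer config:

```python
# Hypothetical sketch: how dotted hydra-style overrides land in the nested
# diarizer config. `set_override` is an illustrative helper, not a NeMo API.

def set_override(cfg: dict, dotted_key: str, value) -> None:
    """Walk the nested dict along the dotted key, creating levels as needed,
    and set the leaf value."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

# Minimal stand-in for the diarizer config tree.
config = {"diarizer": {}}

set_override(config, "diarizer.vad.model_path", "/path/to/vad_multilingual_marblenet.nemo")
set_override(config, "diarizer.speaker_embeddings.model_path", "/path/to/titanet-l.nemo")
set_override(config, "diarizer.msdd_model.model_path", "/path/to/diar_msdd_telephonic.nemo")

print(config["diarizer"]["vad"]["model_path"])  # /path/to/vad_multilingual_marblenet.nemo
```

In the real pipeline the config object is an OmegaConf structure loaded from the YAML config file, and the same dotted keys can be set either in Python (as shown on this page) or on the command line.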
NeMo automatically saves checkpoints of a model you are training in the .nemo format.
You can also manually save your model at any point using model.save_to(<checkpoint_path>.nemo).
Inference#
Note
For a detailed walkthrough, please refer to <NeMo_git_root>/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb.
Check out Datasets for preparing audio files and optional label files.
Run and evaluate the speaker diarizer with the command below:
# Have a look at the instruction inside the script and pass the arguments you might need.
python <NeMo_git_root>/examples/speaker_tasks/diarization/offline_diarization.py
NGC Pretrained Checkpoints#
The ASR collection has checkpoints of several models trained on various datasets for a variety of tasks. These checkpoints are obtainable via NGC NeMo Automatic Speech Recognition collection. The model cards on NGC contain more information about each of the checkpoints available.
In general, you can load models by model name in the following format:
pretrained_vad_model='vad_multilingual_marblenet'
pretrained_speaker_model='titanet_large'
pretrained_neural_diarizer_model='diar_msdd_telephonic'
...
config.diarizer.vad.model_path=pretrained_vad_model \
config.diarizer.speaker_embeddings.model_path=pretrained_speaker_model \
config.diarizer.msdd_model.model_path=pretrained_neural_diarizer_model
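As the snippets on this page suggest, a `model_path` value ending in `.nemo` is a local checkpoint file, while a bare model name refers to an NGC pretrained checkpoint. The sketch below is a hypothetical helper (not part of NeMo) illustrating that convention:

```python
# Hypothetical helper illustrating the convention used throughout this page:
# a value ending in ".nemo" points at a local checkpoint file, while anything
# else is treated as a pretrained model name to fetch from NGC.
# This is an illustration, not a NeMo API.

def classify_model_path(model_path: str) -> str:
    """Return how a diarizer model_path value would be interpreted."""
    if model_path.endswith(".nemo"):
        return "local_checkpoint"
    return "ngc_pretrained_name"

print(classify_model_path("/path/to/titanet-l.nemo"))  # local_checkpoint
print(classify_model_path("titanet_large"))            # ngc_pretrained_name
```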
where the model name is the value under the "Model Name" entry in the table below.
Models for Speaker Diarization Pipeline#
| Model Name | Model Base Class | Model Card |
|---|---|---|
| vad_multilingual_marblenet | EncDecClassificationModel | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_multilingual_marblenet |
| vad_marblenet | EncDecClassificationModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_marblenet |
| vad_telephony_marblenet | EncDecClassificationModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:vad_telephony_marblenet |
| titanet_large | EncDecSpeakerLabelModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:titanet_large |
| ecapa_tdnn | EncDecSpeakerLabelModel | https://ngc.nvidia.com/catalog/models/nvidia:nemo:ecapa_tdnn |
| diar_msdd_telephonic | EncDecDiarLabelModel | https://ngc.nvidia.com/catalog/models/nvidia:diar_msdd_telephonic |