Models

Currently, NeMo’s Speaker Diarization pipeline uses the MarbleNet model for Voice Activity Detection (VAD) and the SpeakerNet and ECAPA_TDNN models for Speaker Embedding Extraction.