Models

Currently, the speaker diarization pipeline in NeMo uses the MarbleNet model for voice activity detection (VAD) and the SpeakerNet, ECAPA_TDNN, and TitaNet models for speaker embedding extraction.
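
As a rough illustration of the two stages named above, the sketch below maps each pipeline stage to its model families. The checkpoint names (`vad_marblenet`, `speakerverification_speakernet`, `ecapa_tdnn`, `titanet_large`) and the helper `models_for_stage` are assumptions made for this example and are not confirmed by this section; verify the actual pretrained names against the NeMo model catalog before using them.

```python
# Hypothetical mapping from diarization stage to candidate model families.
# The checkpoint names on the right are assumptions, not taken from this page.
DIARIZATION_MODELS = {
    "vad": {"MarbleNet": "vad_marblenet"},
    "speaker_embedding": {
        "SpeakerNet": "speakerverification_speakernet",
        "ECAPA_TDNN": "ecapa_tdnn",
        "TitaNet": "titanet_large",
    },
}


def models_for_stage(stage: str) -> list[str]:
    """Return the model family names listed for a pipeline stage."""
    if stage not in DIARIZATION_MODELS:
        raise KeyError(f"unknown stage: {stage!r}")
    return list(DIARIZATION_MODELS[stage])


if __name__ == "__main__":
    # Print each stage followed by the model families that can serve it.
    for stage in DIARIZATION_MODELS:
        print(stage, "->", ", ".join(models_for_stage(stage)))
```

Keeping the stage-to-model mapping in one place makes it clear that VAD has a single model family here, while speaker embedding extraction can use any of three.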