NVIDIA NeMo Framework supports training and customizing Speech AI models that enable voice-based interfaces for conversational AI applications. For more information on each speech subdomain, refer to the following sections in the NeMo Developer Documentation:

* Automatic Speech Recognition
* Speech Classification
* Speaker Recognition
* Speaker Diarization
* Text To Speech

NeMo Framework also includes a large set of :doc:`Speech AI tools <../NeMoToolKit/tools/intro>` for dataset preparation, model evaluation, and text normalization.