Use NeMo Framework’s automatic speech recognition models for transcription in your audio curation pipelines. This guide covers basic usage and configuration.
NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.
Models are automatically downloaded and cached the first time they are loaded, so later loads reuse the local copy.
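The original example for this step is not present here, so the following is a minimal sketch of loading a pre-trained model by name using NeMo's `ASRModel.from_pretrained`. The model name and the `load_asr_model` helper are illustrative assumptions, not values taken from this guide; substitute any model listed in the NeMo Framework ASR documentation.

```python
# Assumed example model name; replace it with the model you actually use.
DEFAULT_MODEL = "nvidia/stt_en_fastconformer_hybrid_large_pc"


def load_asr_model(model_name: str = DEFAULT_MODEL):
    """Load a pre-trained NeMo ASR model by name (hypothetical helper).

    The first call for a given name downloads the checkpoint; later calls
    reuse the locally cached copy.
    """
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)


if __name__ == "__main__":
    model = load_asr_model()
    # "sample.wav" is a placeholder path to a local audio file.
    print(model.transcribe(["sample.wav"]))
```

The lazy import keeps the helper cheap to import in pipeline code that only sometimes runs transcription.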
Configure GPU and CPU resources based on your hardware.
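The original configuration example is not present here. As a rough sketch under stated assumptions, the code below picks a device, batch size, and CPU thread count from the detected hardware using plain PyTorch calls, then passes them to the model's `transcribe` method. The `AsrResources` class, the helper names, and the specific batch sizes are hypothetical and are not part of NeMo; your pipeline framework may expose its own resource settings instead.

```python
import os
from dataclasses import dataclass


@dataclass
class AsrResources:
    """Hypothetical settings container for illustration; not a NeMo class."""
    device: str       # "cuda" or "cpu"
    batch_size: int   # audio files per transcribe() batch
    cpu_threads: int  # threads PyTorch may use for CPU-side work


def choose_resources(gpu_count: int, cpu_count: int) -> AsrResources:
    """Pick starting values; the numbers are placeholders to tune per model."""
    if gpu_count > 0:
        # With a GPU, use larger batches and leave some CPU cores free.
        return AsrResources("cuda", batch_size=16, cpu_threads=max(1, cpu_count // 2))
    # Without a GPU, transcribe one file at a time on all available cores.
    return AsrResources("cpu", batch_size=1, cpu_threads=max(1, cpu_count))


def detect_resources() -> AsrResources:
    """Inspect the current machine with PyTorch and the standard library."""
    import torch  # imported lazily so the helpers above work without PyTorch

    gpu_count = torch.cuda.device_count() if torch.cuda.is_available() else 0
    return choose_resources(gpu_count, os.cpu_count() or 1)


def transcribe_with(model, audio_paths, resources: AsrResources):
    """Move a loaded NeMo ASR model to the chosen device and transcribe."""
    import torch

    torch.set_num_threads(resources.cpu_threads)
    model = model.to(resources.device)
    return model.transcribe(audio_paths, batch_size=resources.batch_size)
```

Keeping the selection logic in a pure function (`choose_resources`) makes it easy to adjust the starting values after profiling a specific model.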
Resource requirements vary by model, so test with your specific model and a representative sample of your audio to determine the best settings.