Process audio data you’ve loaded into AudioBatch objects using NeMo Curator’s comprehensive audio processing capabilities.
NeMo Curator provides a specialized suite of tools for processing speech and audio data as part of the AI training pipeline. These tools help you transcribe, analyze, filter, and integrate audio datasets to ensure high-quality input for ASR model training and multimodal applications.
NeMo Curator’s audio processing capabilities are organized into four main categories: ASR inference, quality assessment, audio analysis, and text integration.
Each category targets a different speech curation need, and ASR inference in particular runs with GPU acceleration. The result is a cleaned, filtered audio dataset with high-quality transcriptions, ready for model training.
Transcribe audio files using NeMo Framework’s state-of-the-art ASR models with GPU acceleration.
Use pretrained NeMo ASR models for accurate speech recognition
Efficiently process large audio datasets with configurable batch sizes
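The batching idea above can be sketched in plain Python. This is a minimal illustration, not NeMo Curator's implementation: `iter_batches`, `transcribe_all`, and the `transcribe_fn` callable are hypothetical names, and the output keys `audio_filepath` and `pred_text` are assumed to follow NeMo-style manifest conventions. In a real setup, `transcribe_fn` might wrap a pretrained model loaded with `nemo.collections.asr.models.ASRModel.from_pretrained(...)` and its `transcribe` method, whose exact arguments and return types vary across NeMo versions.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def transcribe_all(
    audio_paths: Sequence[str],
    transcribe_fn: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[dict]:
    """Run a batch transcription callable over every path.

    `transcribe_fn` is a hypothetical stand-in: it takes a list of file paths
    and returns one transcription string per path, in the same order.
    """
    results: List[dict] = []
    for batch in iter_batches(audio_paths, batch_size):
        predictions = transcribe_fn(batch)
        if len(predictions) != len(batch):
            raise RuntimeError("transcribe_fn must return one result per input")
        # Pair each input path with its transcription (assumed manifest keys).
        for path, pred in zip(batch, predictions):
            results.append({"audio_filepath": path, "pred_text": pred})
    return results
```

Larger batches generally improve GPU utilization up to the limit of available memory, which is why the batch size is left as a parameter.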
Evaluate and filter audio quality using transcription accuracy and audio characteristics.
Filter audio samples based on Word Error Rate (WER) thresholds
Filter audio samples by duration ranges and speech rate metrics
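A minimal sketch of the two filters above, assuming each sample is a dict with a reference `text`, a model transcription `pred_text`, and a `duration` in seconds. `word_error_rate` and `keep_sample` are hypothetical helpers, the threshold defaults are illustrative rather than recommended values, and speech rate is measured here as reference words per second, which is only one possible definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein distance over lowercased, whitespace-split
    tokens. Production pipelines usually apply richer text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # WER is undefined for an empty reference; this sketch returns inf for
        # a non-empty hypothesis so such samples fail any threshold.
        return 0.0 if not hyp else float("inf")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def keep_sample(
    entry: dict,
    max_wer: float = 0.3,
    min_duration: float = 1.0,
    max_duration: float = 20.0,
    min_words_per_sec: float = 0.5,
    max_words_per_sec: float = 5.0,
) -> bool:
    """Return True if the sample passes duration, WER, and speech-rate checks."""
    duration = entry["duration"]
    if duration <= 0 or not (min_duration <= duration <= max_duration):
        return False
    if word_error_rate(entry["text"], entry["pred_text"]) > max_wer:
        return False
    words_per_sec = len(entry["text"].split()) / duration
    return min_words_per_sec <= words_per_sec <= max_words_per_sec
```

Filtering a list of samples is then `[e for e in entries if keep_sample(e)]`.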
Extract and analyze audio file characteristics for quality control and metadata generation.
Calculate precise audio duration using the soundfile library
Validate audio file formats and detect corrupted files
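The sketch below shows duration calculation combined with basic corruption detection. To stay dependency-free it swaps in Python's standard-library `wave` module, which only reads WAV files; the soundfile library mentioned above covers more formats, and `soundfile.info(path).duration` reports the equivalent value. `wav_duration_seconds` is a hypothetical helper name.

```python
import wave
from typing import Optional


def wav_duration_seconds(path: str) -> Optional[float]:
    """Return the duration in seconds of a readable WAV file, or None.

    Stand-in for a soundfile-based check: the standard-library `wave` module
    is used here so the sketch runs without extra dependencies.
    """
    try:
        with wave.open(path, "rb") as wf:
            frames = wf.getnframes()
            rate = wf.getframerate()
    except (wave.Error, EOFError, OSError):
        # Missing file, non-WAV data, or a damaged header: flag as unusable.
        return None
    if rate <= 0:
        return None
    # Duration is frame count divided by sample rate.
    return frames / rate
```

This check only reads the file header, so a file whose sample data is damaged behind a valid header would still pass; fully decoding each file is a stricter, slower alternative.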
Pass processed audio data, such as transcriptions, into text processing workflows for multimodal applications.
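How processed audio reaches a text workflow depends on the pipeline, but one common pattern is to emit one text record per transcribed clip while keeping a pointer back to the source audio. The sketch below assumes NeMo-style manifest keys (`audio_filepath`, `duration`, `pred_text`) and writes JSON Lines; the function names and the output schema (`text`, `source_audio`, `duration`) are hypothetical, not a NeMo Curator format.

```python
import json
from typing import Iterable, List


def audio_entries_to_text_records(
    entries: Iterable[dict], text_field: str = "pred_text"
) -> List[dict]:
    """Turn audio manifest entries into plain text records (assumed schema)."""
    records: List[dict] = []
    for entry in entries:
        text = (entry.get(text_field) or "").strip()
        if not text:
            continue  # nothing for a text pipeline to work with
        records.append({
            "text": text,
            "source_audio": entry["audio_filepath"],  # keep link to the audio
            "duration": entry.get("duration"),
        })
    return records


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Keeping `source_audio` in each record lets downstream text filters be traced back to, or joined with, the original clips.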