NeMo Curator provides comprehensive audio curation capabilities to prepare high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, performing ASR inference, assessing transcription quality, and integrating with text curation workflows.
Master the fundamentals of NeMo Curator and set up your audio processing environment.
Learn about AudioTask, ASR pipelines, and other core data structures for efficient audio curation. Tags: data-structures, asr-pipeline, quality-metrics
Learn prerequisites, setup instructions, and initial configuration for audio curation. Tags: setup, configuration, quickstart
Import your audio data from various sources into NeMo Curator’s processing pipeline.
Load audio files from local directories and file systems. Tags: local-storage, file-discovery, batch-processing
Create and load custom audio dataset manifests with metadata. Tags: manifests, metadata, custom-formats
Load and process the multilingual FLEURS speech dataset. Tags: fleurs, multilingual, benchmarks
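To make the manifest idea concrete, here is a minimal sketch that discovers WAV files in a local directory, measures their duration, and writes a JSON Lines manifest. It uses only the Python standard library and does not call any NeMo Curator API. The field names `audio_filepath`, `duration`, and `text` follow the common NeMo ASR manifest convention; the helper names and the `transcripts` lookup are hypothetical and exist only for illustration.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_entries(audio_dir: Path, transcripts: dict[str, str]) -> list[dict]:
    """Build one manifest entry per WAV file found under audio_dir.

    `transcripts` maps a file name to its reference text (hypothetical
    input; files without a transcript get an empty string).
    """
    entries = []
    for wav_path in sorted(audio_dir.rglob("*.wav")):
        entries.append(
            {
                "audio_filepath": str(wav_path),
                "duration": wav_duration_seconds(wav_path),
                "text": transcripts.get(wav_path.name, ""),
            }
        )
    return entries


def write_manifest(entries: list[dict], manifest_path: Path) -> None:
    """Write entries as JSON Lines: one JSON object per line."""
    with manifest_path.open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(manifest_path: Path) -> list[dict]:
    """Read a JSON Lines manifest back into a list of dicts."""
    with manifest_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The `wave` module only reads uncompressed WAV files, so other formats such as FLAC or MP3 would need a different reader. One JSON object per line keeps the manifest easy to stream and to split across workers.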
Transform and enhance your audio data through ASR inference, quality assessment, and analysis.
Generate transcriptions using NVIDIA NeMo ASR models. Tags: nemo-models, transcription, gpu-accelerated
Assess transcription quality using word error rate (WER) and character error rate (CER). Tags: wer-filtering, duration-filtering
Analyze audio characteristics, including duration calculation and format validation. Tags: duration-calculation, format-validation, metadata-extraction
Integrate audio processing results with text curation workflows. Tags: multimodal, text-filtering, pipeline-integration
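As a rough illustration of the WER and CER quality assessment mentioned above, the sketch below computes both metrics from a Levenshtein edit distance and applies a simple WER and duration filter. It is a standalone example under stated assumptions, not the NeMo Curator implementation: the entry keys `text`, `pred_text`, and `duration`, the percentage scaling, and the default thresholds are all illustrative choices.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def keep_sample(
    entry: dict,
    max_wer: float = 30.0,       # illustrative threshold, in percent
    min_duration: float = 1.0,   # illustrative bounds, in seconds
    max_duration: float = 30.0,
) -> bool:
    """Keep an entry only if its WER and duration fall within the bounds."""
    if not min_duration <= entry["duration"] <= max_duration:
        return False
    return wer(entry["text"], entry["pred_text"]) <= max_wer
```

Because WER is normalized by the reference length, it can exceed 100 when the hypothesis contains many insertions. Suitable thresholds depend on the dataset and the ASR model, so the defaults here should be treated as placeholders.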
Save processed audio data and transcriptions in formats suitable for downstream training and analysis.
Build practical experience with step-by-step guides for common audio curation workflows.