About Audio Curation

NeMo Curator provides audio curation tools for preparing high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, running ASR inference, assessing transcription quality, and integrating with text curation workflows.

Use Cases

  • Process and curate large-scale speech datasets for ASR model training

  • Perform quality assessment and filtering based on transcription accuracy metrics

  • Generate transcriptions using state-of-the-art NVIDIA NeMo ASR models

  • Integrate audio processing with text curation pipelines for multi-modal workflows

  • Scale audio processing across GPU clusters efficiently


Introduction

Master the fundamentals of NeMo Curator and set up your audio processing environment.

Concepts

Learn about AudioBatch, ASR pipelines, and other core concepts and data structures for audio curation

Audio Curation Concepts

Get Started

Learn prerequisites, setup instructions, and initial configuration for audio curation

Get Started with Audio Curation

Curation Tasks

Load Data

Import your audio data from various sources into NeMo Curator’s processing pipeline.

Local Files

Load audio files from local directories and file systems

Load Local Audio Files

Custom Manifests

Create and load custom audio dataset manifests with metadata

Create and Load Custom Audio Manifests

FLEURS Dataset

Load and process the multilingual FLEURS speech dataset

Load FLEURS Dataset
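
To make the idea of a custom manifest concrete, here is a minimal sketch that writes and reads one using only the Python standard library. It assumes the JSON Lines layout commonly used for NeMo ASR manifests, with one JSON object per line and the keys audio_filepath, duration, and text. The helper names, file paths, durations, and transcripts are hypothetical placeholders, and this is not NeMo Curator's own loader; the Custom Manifests guide describes the fields the toolkit actually expects.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(entries, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSON Lines manifest into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries: the paths, durations, and transcripts are placeholders.
entries = [
    {"audio_filepath": "audio/sample_0001.wav", "duration": 3.2, "text": "hello world"},
    {"audio_filepath": "audio/sample_0002.wav", "duration": 5.75, "text": "good morning"},
]

# Write to a temporary directory so the sketch leaves no files in the working tree.
manifest_path = Path(tempfile.mkdtemp()) / "manifest.jsonl"
write_manifest(entries, manifest_path)
loaded = read_manifest(manifest_path)

for entry in loaded:
    print(entry["audio_filepath"], entry["duration"], entry["text"])
```

Keeping one record per line means a large manifest can be streamed or split across workers without parsing the whole file at once.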

Process Data

Transform and enhance your audio data through ASR inference, quality assessment, and analysis.

ASR Inference

Generate transcriptions using NVIDIA NeMo ASR models

ASR Inference

Quality Assessment

Assess transcription quality using word error rate (WER) and character error rate (CER)

Quality Assessment for Audio Data

Audio Analysis

Analyze audio characteristics including duration and format validation

Audio Analysis

Text Integration

Integrate audio processing results with text curation workflows

Text Integration for Audio Data
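
As an illustration of the quality metrics mentioned above, the following sketch computes WER and CER from a reference transcript and an ASR hypothesis. It uses the standard definition (Levenshtein edit distance divided by the reference length, at the word level for WER and the character level for CER) in plain Python. The function names are hypothetical and this is not NeMo Curator's implementation; it also skips the text normalization (casing, punctuation) that a real pipeline would usually apply first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction of the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction of the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"

# One substituted word and one deleted word out of six reference words.
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both functions return fractions rather than percentages, and either value can exceed 1.0 when the hypothesis contains many insertions. A quality filter would then keep only the samples whose error rate falls below a chosen threshold.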

Save & Export

Save processed audio data and transcriptions in formats suitable for downstream training and analysis.

Save & Export

Export curated audio datasets with transcriptions and quality metrics

Save & Export Audio Data

Tutorials

Build practical experience with step-by-step guides for common audio curation workflows.

Beginner Tutorial

Learn the basics of audio loading, ASR inference, and quality filtering

Beginner Audio Processing Tutorial