Curate Audio

About Audio Curation


NeMo Curator provides audio curation capabilities for preparing high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, running ASR inference, assessing transcription quality, and integrating with text curation workflows.

Use Cases

  • Process and curate large-scale speech datasets for ASR model training
  • Perform quality assessment and filtering based on transcription accuracy metrics
  • Generate transcriptions using pretrained NVIDIA NeMo ASR models
  • Integrate audio processing with text curation pipelines for multi-modal workflows
  • Scale audio processing efficiently across GPU clusters
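Filtering on transcription accuracy usually relies on word error rate (WER): the word-level edit distance between a reference transcript and an ASR prediction, divided by the number of reference words. The following is a minimal, standalone sketch of that idea, not NeMo Curator's own API. The record fields `audio_filepath`, `text`, and `pred_text`, the helper names `word_error_rate` and `filter_by_wer`, and the `max_wer=0.5` threshold are hypothetical choices for illustration. The sketch applies no text normalization (casing, punctuation), which real pipelines typically do first.

```python
from typing import Any, Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention assumed here: an empty reference scores 0.0 only if the hypothesis is also empty.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def filter_by_wer(records: List[Dict[str, Any]], max_wer: float = 0.5) -> List[Dict[str, Any]]:
    """Keep records whose ASR prediction is close to the reference; attach the WER score."""
    kept = []
    for record in records:
        wer = word_error_rate(record["text"], record["pred_text"])  # hypothetical field names
        if wer <= max_wer:
            kept.append({**record, "wer": wer})
    return kept


if __name__ == "__main__":
    samples = [
        {"audio_filepath": "a.wav", "text": "the cat sat", "pred_text": "the cat sat"},
        {"audio_filepath": "b.wav", "text": "hello world", "pred_text": "yellow word"},
    ]
    print([r["audio_filepath"] for r in filter_by_wer(samples)])  # → ['a.wav']
```

Here `b.wav` has two substitutions out of two reference words (WER 1.0), so it exceeds the threshold and is dropped. The threshold is a tunable trade-off between dataset size and label quality.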

Introduction

Master the fundamentals of NeMo Curator and set up your audio processing environment.

Curation Tasks

Load Data

Import your audio data from various sources into NeMo Curator’s processing pipeline.

Process Data

Transform and enhance your audio data through ASR inference, quality assessment, and analysis.

Save & Export

Save processed audio data and transcriptions in formats suitable for downstream training and analysis.


Tutorials

Build practical experience with step-by-step guides for common audio curation workflows.