
Copyright © 2026, NVIDIA Corporation.


About Audio Curation


NeMo Curator provides audio curation tools for preparing high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, running ASR inference, assessing transcription quality, and integrating with text curation workflows.

Use Cases

  • Process and curate large-scale speech datasets for ASR model training
  • Perform quality assessment and filtering based on transcription accuracy metrics
  • Generate transcriptions using state-of-the-art NVIDIA NeMo ASR models
  • Integrate audio processing with text curation pipelines for multi-modal workflows
  • Scale audio processing across GPU clusters efficiently

Introduction

Master the fundamentals of NeMo Curator and set up your audio processing environment.

Concepts

Learn about AudioTask, ASR pipelines, and other core data structures for efficient audio curation. Tags: data-structures, asr-pipeline, quality-metrics.

Get Started

Learn prerequisites, setup instructions, and initial configuration for audio curation. Tags: setup, configuration, quickstart.

Curation Tasks

Load Data

Import your audio data from various sources into NeMo Curator’s processing pipeline.

Local Files

Load audio files from local directories and file systems. Tags: local-storage, file-discovery, batch-processing.
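
File discovery and batching can be illustrated without any NeMo Curator code. The sketch below uses only the Python standard library; the helper names `discover_audio_files` and `batched` and the extension set are hypothetical and are not part of the NeMo Curator API.

```python
from pathlib import Path

# Hypothetical extension set; adjust it to the formats your pipeline accepts.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def discover_audio_files(root, extensions=AUDIO_EXTENSIONS):
    """Return a sorted list of audio file paths found anywhere under `root`."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Sorting the discovered paths keeps batch contents reproducible between runs, which helps when a job is resumed or compared against an earlier one.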

Custom Manifests

Create and load custom audio dataset manifests with metadata. Tags: manifests, metadata, custom-formats.
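
As a rough illustration of what a manifest might look like, the sketch below assumes the JSON Lines layout commonly used for NeMo ASR datasets: one JSON object per line with fields such as `audio_filepath`, `duration`, and `text`. The field names and the helpers `write_manifest` and `read_manifest` are assumptions for this example, not the NeMo Curator API; check the Custom Manifests page for the fields your pipeline expects.

```python
import json

# Assumption: every entry must at least point to an audio file.
REQUIRED_KEYS = ("audio_filepath",)


def write_manifest(entries, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path, required_keys=REQUIRED_KEYS):
    """Read a JSON Lines manifest, skipping blank lines and checking required keys."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            missing = [key for key in required_keys if key not in entry]
            if missing:
                raise ValueError(f"line {line_number}: missing keys {missing}")
            entries.append(entry)
    return entries
```

Validating required keys at load time surfaces malformed entries before any GPU work starts.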

FLEURS Dataset

Load and process the multilingual FLEURS speech dataset. Tags: fleurs, multilingual, benchmarks.

Process Data

Transform and enhance your audio data through ASR inference, quality assessment, and analysis.

ASR Inference

Generate transcriptions using NVIDIA NeMo ASR models. Tags: nemo-models, transcription, gpu-accelerated.

Quality Assessment

Assess transcription quality using word error rate (WER) and character error rate (CER). Tags: wer-filtering, duration-filtering.
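
To make the two metrics concrete: WER is the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words; CER is the same ratio computed over characters. The following is a minimal standard-library sketch of that definition. The function names are hypothetical, the values are returned as fractions (multiply by 100 for a percentage), and NeMo Curator's own stages may normalize text or scale results differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # delete a reference item
                current[j - 1] + 1,      # insert a hypothesis item
                previous[j - 1] + cost,  # substitute, or match at zero cost
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction of the reference character count."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A WER-based filter then reduces to keeping the samples whose `wer(reference, hypothesis)` falls at or below a chosen threshold. Because insertions count as errors, both metrics can exceed 1.0 when the hypothesis is much longer than the reference.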

Audio Analysis

Analyze audio characteristics, including duration and format validation. Tags: duration-calculation, format-validation, metadata-extraction.
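
For uncompressed PCM WAV files, duration is the frame count divided by the sample rate. The sketch below shows that calculation and a basic validity check using Python's standard `wave` module. The helper names are hypothetical, and this covers WAV only; it is an illustration of the idea, not how NeMo Curator's audio stages are implemented.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return frames / sample_rate


def is_valid_wav(path):
    """Return True if the file parses as a WAV with at least one frame."""
    try:
        with wave.open(str(path), "rb") as wav_file:
            return wav_file.getnframes() > 0 and wav_file.getframerate() > 0
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or unreadable.
        return False
```

Compressed formats such as FLAC or MP3 need a third-party decoder to report duration, so a real pipeline would typically rely on an audio library instead of `wave`.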

Text Integration

Integrate audio processing results with text curation workflows. Tags: multimodal, text-filtering, pipeline-integration.

Save & Export

Save processed audio data and transcriptions in formats suitable for downstream training and analysis.

Save & Export

Export curated audio datasets with transcriptions and quality metrics. Tags: manifests, parquet, metadata.


Tutorials

Build practical experience with step-by-step guides for common audio curation workflows.

Beginner Tutorial

Learn the basics of audio loading, ASR inference, and quality filtering. Tags: asr-inference, quality-filtering, basic-workflow.