> Comprehensive audio curation capabilities for speech data processing including ASR inference, quality assessment, and text integration workflows

# About Audio Curation

NeMo Curator provides audio curation tools for preparing high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, performing ASR inference, assessing transcription quality, and integrating with text curation workflows.

## Use Cases

* Process and curate large-scale speech datasets for ASR model training
* Perform quality assessment and filtering based on transcription accuracy metrics
* Generate transcriptions using state-of-the-art NVIDIA NeMo ASR models
* Integrate audio processing with text curation pipelines for multi-modal workflows
* Scale audio processing across GPU clusters efficiently

***

## Introduction

Master the fundamentals of NeMo Curator and set up your audio processing environment.

Learn about AudioTask, ASR pipelines, and other core data structures for efficient audio curation. Tags: `data-structures`, `asr-pipeline`, `quality-metrics`

Learn prerequisites, setup instructions, and initial configuration for audio curation. Tags: `setup`, `configuration`, `quickstart`

## Curation Tasks

### Load Data

Import your audio data from various sources into NeMo Curator's processing pipeline.

Load audio files from local directories and file systems. Tags: `local-storage`, `file-discovery`, `batch-processing`

Create and load custom audio dataset manifests with metadata. Tags: `manifests`, `metadata`, `custom-formats`
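
As a rough illustration of what a custom manifest can look like, the sketch below writes and reads a JSON Lines file with one utterance per line. The field names `audio_filepath`, `duration`, and `text` follow a convention commonly seen in NeMo-style ASR manifests, but they are assumptions here, as are the sample paths and the helper names `write_manifest` and `read_manifest`; none of this is NeMo Curator's own API.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSON Lines manifest, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries: paths and field names are illustrative only.
entries = [
    {"audio_filepath": "audio/sample_0001.wav", "duration": 3.2, "text": "hello world"},
    {"audio_filepath": "audio/sample_0002.wav", "duration": 5.7, "text": "good morning"},
]

manifest_path = os.path.join(tempfile.mkdtemp(), "manifest.jsonl")
write_manifest(entries, manifest_path)
loaded = read_manifest(manifest_path)
print(f"Loaded {len(loaded)} entries from {manifest_path}")
```

Keeping one record per line means a manifest can be streamed and split across workers without parsing the whole file first.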

Load and process the multilingual FLEURS speech dataset. Tags: `fleurs`, `multilingual`, `benchmarks`

### Process Data

Transform and enhance your audio data through ASR inference, quality assessment, and analysis.

Generate transcriptions using NVIDIA NeMo ASR models. Tags: `nemo-models`, `transcription`, `gpu-accelerated`

Assess transcription quality using word error rate (WER) and character error rate (CER). Tags: `wer-filtering`, `duration-filtering`
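
To make the two metrics concrete, here is a minimal pure-Python sketch, not the toolkit's implementation. WER is the word-level edit distance between reference and hypothesis divided by the number of reference words; CER is the same ratio over characters. Both are expressed as percentages here. The field names `text`, `pred_text`, and `duration`, and the thresholds, are assumptions chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical samples and thresholds, for illustration only.
samples = [
    {"text": "the cat sat on the mat", "pred_text": "the cat sat on mat", "duration": 2.5},
    {"text": "hello world", "pred_text": "yellow word", "duration": 1.8},
    {"text": "good morning", "pred_text": "good morning", "duration": 0.4},
]
MAX_WER = 20.0                       # percent
MIN_DURATION, MAX_DURATION = 1.0, 20.0  # seconds

kept = [
    s for s in samples
    if wer(s["text"], s["pred_text"]) <= MAX_WER
    and MIN_DURATION <= s["duration"] <= MAX_DURATION
]
print(f"Kept {len(kept)} of {len(samples)} samples")
```

Only the first sample survives: the second has too many word errors, and the third transcribes perfectly but falls below the minimum duration.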

Analyze audio characteristics, including duration calculation and format validation. Tags: `duration-calculation`, `format-validation`, `metadata-extraction`
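
The idea behind duration calculation and format validation can be sketched with Python's standard `wave` module: duration is the frame count divided by the sample rate, and a file that cannot be opened as audio is treated as invalid. This is an illustrative stand-in that handles uncompressed WAV only; the helper names are hypothetical, and other formats would need a third-party decoder.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a WAV file: number of frames divided by the sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_valid_wav(path):
    """Return True if the file opens as a WAV and contains at least one frame."""
    try:
        with wave.open(path, "rb") as wf:
            return wf.getnframes() > 0 and wf.getframerate() > 0
    except (wave.Error, EOFError, OSError):
        return False


workdir = tempfile.mkdtemp()

# Create half a second of 16 kHz, mono, 16-bit silence to test against.
wav_path = os.path.join(workdir, "silence.wav")
with wave.open(wav_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 2 bytes per sample = 16-bit
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 8000)

# Create a file that has a .wav name but is not audio.
bad_path = os.path.join(workdir, "broken.wav")
with open(bad_path, "wb") as f:
    f.write(b"this is not a wav file")

duration = wav_duration_seconds(wav_path)
print(f"silence.wav: {duration:.2f} s, valid={is_valid_wav(wav_path)}")
print(f"broken.wav: valid={is_valid_wav(bad_path)}")
```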

Integrate audio processing results with text curation workflows. Tags: `multimodal`, `text-filtering`, `pipeline-integration`

### Save & Export

Save processed audio data and transcriptions in formats suitable for downstream training and analysis.

Export curated audio datasets with transcriptions and quality metrics. Tags: `manifests`, `parquet`, `metadata`

***

## Tutorials

Build practical experience with step-by-step guides for common audio curation workflows.

Learn the basics of audio loading, ASR inference, and quality filtering. Tags: `asr-inference`, `quality-filtering`, `basic-workflow`