---
description: >-
  Comprehensive audio curation capabilities for speech data processing including
  ASR inference, quality assessment, and text integration workflows
categories:
  - workflows
tags:
  - audio-curation
  - asr-inference
  - speech-processing
  - quality-metrics
  - manifests
  - text-integration
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: workflow
modality: audio-only
---
# About Audio Curation
NeMo Curator provides comprehensive audio curation capabilities to prepare high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, performing ASR inference, assessing transcription quality, and integrating with text curation workflows.
## Use Cases
* Process and curate large-scale speech datasets for ASR model training
* Perform quality assessment and filtering based on transcription accuracy metrics
* Generate transcriptions using state-of-the-art NVIDIA NeMo ASR models
* Integrate audio processing with text curation pipelines for multi-modal workflows
* Scale audio processing across GPU clusters efficiently
***
## Introduction
Master the fundamentals of NeMo Curator and set up your audio processing environment.
* Learn about AudioBatch, ASR pipelines, and other core data structures for efficient audio curation. Tags: `data-structures`, `asr-pipeline`, `quality-metrics`
* Learn prerequisites, setup instructions, and initial configuration for audio curation. Tags: `setup`, `configuration`, `quickstart`
## Curation Tasks
### Load Data
Import your audio data from various sources into NeMo Curator's processing pipeline.
* Load audio files from local directories and file systems. Tags: `local-storage`, `file-discovery`, `batch-processing`
* Create and load custom audio dataset manifests with metadata. Tags: `manifests`, `metadata`, `custom-formats`
* Load and process the multilingual FLEURS speech dataset. Tags: `fleurs`, `multilingual`, `benchmarks`
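As a rough illustration of file discovery and manifest creation, the sketch below scans a directory for WAV files, measures each file's duration with the standard-library `wave` module, and writes one JSON object per line. It does not use NeMo Curator's own loaders. The field names `audio_filepath`, `duration`, and `text` follow a JSON Lines convention commonly used for ASR manifests and are assumptions here, as are the helper names `build_manifest` and `load_manifest`.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(audio_dir: Path, manifest_path: Path) -> int:
    """Write one JSON line per WAV file under audio_dir and return the count."""
    count = 0
    with manifest_path.open("w", encoding="utf-8") as out:
        # sorted() keeps the manifest order stable across runs
        for wav_path in sorted(audio_dir.rglob("*.wav")):
            entry = {
                "audio_filepath": str(wav_path),  # assumed field name
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": "",  # reference transcript, if one is available
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_manifest(manifest_path: Path) -> list[dict]:
    """Read a JSON Lines manifest back into a list of dictionaries."""
    with manifest_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `build_manifest(Path("audio"), Path("manifest.jsonl"))` on a hypothetical `audio` directory would produce a manifest that `load_manifest` can read back for later processing steps.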
### Process Data
Transform and enhance your audio data through ASR inference, quality assessment, and analysis.
* Generate transcriptions using NVIDIA NeMo ASR models. Tags: `nemo-models`, `transcription`, `gpu-accelerated`
* Assess transcription quality using word error rate (WER) and character error rate (CER). Tags: `wer-filtering`, `duration-filtering`
* Analyze audio characteristics, including duration calculation and format validation. Tags: `duration-calculation`, `format-validation`, `metadata-extraction`
* Integrate audio processing results with text curation workflows. Tags: `multimodal`, `text-filtering`, `pipeline-integration`
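To make the quality metrics concrete, here is a minimal pure-Python sketch of WER and CER based on Levenshtein edit distance, followed by a filter that combines a WER threshold with a duration range. It is not NeMo Curator's implementation. The entry fields `text`, `pred_text`, and `duration`, the function `keep_entry`, and the default thresholds are all illustrative assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, as a percentage of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def keep_entry(entry: dict, max_wer: float = 30.0,
               min_duration: float = 1.0, max_duration: float = 30.0) -> bool:
    """Keep an entry if its WER and duration fall within the (assumed) limits."""
    score = wer(entry["text"], entry["pred_text"])
    return score <= max_wer and min_duration <= entry["duration"] <= max_duration
```

Both metrics can exceed 100 when the hypothesis is much longer than the reference, so thresholds are best chosen by inspecting the score distribution of the dataset being curated.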
### Save & Export
Save processed audio data and transcriptions in formats suitable for downstream training and analysis.
* Export curated audio datasets with transcriptions and quality metrics. Tags: `manifests`, `parquet`, `metadata`
***
## Tutorials
Build practical experience with step-by-step guides for common audio curation workflows.
* Learn the basics of audio loading, ASR inference, and quality filtering. Tags: `asr-inference`, `quality-filtering`, `basic-workflow`