> Essential concepts for audio data curation including ASR inference, quality assessment, and speech processing workflows

# Audio Curation Concepts

This guide covers the essential concepts for audio data curation in NVIDIA NeMo Curator. It assumes basic familiarity with speech processing and machine learning.

## Core Concept Areas

Audio curation in NVIDIA NeMo Curator focuses on these key areas:

An overview maps audio curation at the modality level: ingest, validation, optional ASR, quality metrics, filtering, and export.
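As a rough illustration of the filtering step, the sketch below drops entries whose duration falls outside a range or whose word error rate (WER) exceeds a threshold. It is not NeMo Curator's filtering API: the `filter_entries` function, the `duration` and `wer` field names, and the default thresholds are all hypothetical.

```python
from typing import Iterable, Iterator


def filter_entries(
    entries: Iterable[dict],
    max_wer: float = 0.3,       # hypothetical threshold
    min_duration: float = 1.0,  # seconds, hypothetical
    max_duration: float = 30.0, # seconds, hypothetical
) -> Iterator[dict]:
    """Keep entries whose duration is in range and whose WER does not exceed max_wer.

    Field names are hypothetical. Entries without a "wer" value (for example,
    when the optional ASR step was skipped) are filtered on duration only.
    """
    for entry in entries:
        if not (min_duration <= entry["duration"] <= max_duration):
            continue
        wer = entry.get("wer")
        if wer is not None and wer > max_wer:
            continue
        yield entry
```

Because the function is a generator, it can sit between ingestion and export without holding the whole dataset in memory.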

Data structure concepts explain the AudioTask data structure, audio file management, and validation.
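A minimal validation sketch, assuming each sample is described by a file path and a duration. The `AudioRecord` dataclass and `validate_record` function are hypothetical stand-ins, not the actual AudioTask API; they only show the kind of checks (file present, positive duration) a validation step might apply.

```python
import os
from dataclasses import dataclass, field


@dataclass
class AudioRecord:
    """Hypothetical stand-in for one audio sample; not NeMo Curator's AudioTask."""

    audio_filepath: str
    duration: float  # seconds
    text: str = ""
    errors: list[str] = field(default_factory=list)


def validate_record(record: AudioRecord) -> bool:
    """Record validation errors on the record and return True if it is usable."""
    record.errors.clear()
    if not os.path.isfile(record.audio_filepath):
        record.errors.append("missing audio file")
    if record.duration <= 0:
        record.errors.append("non-positive duration")
    return not record.errors
```

Collecting every error instead of stopping at the first one makes it easier to report why a sample was rejected.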

ASR pipeline concepts describe the architecture and workflow of the automatic speech recognition pipeline.

Quality assessment concepts cover how to evaluate transcription quality and audio characteristics, using metrics such as word error rate (WER) and character error rate (CER).
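WER and CER share one formula: the edit distance between reference and hypothesis (substitutions S, deletions D, insertions I) divided by the number of reference tokens N, that is (S + D + I) / N. WER tokenizes on whitespace, while CER uses individual characters. The sketch below is a plain-Python implementation of that standard definition, not NeMo Curator's metric code. It counts spaces as characters for CER and divides by at least 1 to guard against an empty reference; other tools may make different choices on both points.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: drop reference token r
                curr[j - 1] + 1,         # insertion: extra hypothesis token h
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters in this sketch."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

For example, `wer("the cat sat", "the cat sit")` is one substitution over three reference words, about 0.333.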

Manifest and ingestion concepts explain how to construct manifests and ingest audio datasets.
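Audio manifests in the NeMo ecosystem are commonly JSON Lines files with one object per utterance, typically carrying `audio_filepath`, `duration`, and `text`. The sketch below writes and lazily reads such a file with the standard library. The helper names are hypothetical, and the exact fields NeMo Curator expects should be checked against its ingestion documentation.

```python
import json
from typing import Iterable, Iterator


def write_manifest(path: str, entries: Iterable[dict]) -> int:
    """Write one JSON object per line (JSON Lines); return the number of entries written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_manifest(path: str) -> Iterator[dict]:
    """Yield entries one at a time so large manifests never need to fit in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

For example, `write_manifest("train.jsonl", [{"audio_filepath": "audio/utt_0001.wav", "duration": 3.2, "text": "hello world"}])` produces a one-line manifest.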

The Audio Language Model (ALM) data curation pipeline extracts training windows from segments produced by speaker diarization.
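The exact windowing rules of the ALM pipeline are not described here, so the following is only a hypothetical illustration of the idea: consecutive diarized segments are packed greedily into windows whose span does not exceed a maximum duration. The `Segment` dataclass, the `build_windows` function, and the policy of skipping over-long segments are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # diarization label


def build_windows(segments: list[Segment], max_duration: float) -> list[list[Segment]]:
    """Greedily pack consecutive diarized segments into windows no longer than max_duration.

    Hypothetical policy: a window spans from its first segment's start to its last
    segment's end; a segment longer than max_duration on its own is skipped.
    """
    windows: list[list[Segment]] = []
    current: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.end - seg.start > max_duration:
            # Too long for any window: close the open window and skip this segment.
            if current:
                windows.append(current)
                current = []
            continue
        if current and seg.end - current[0].start > max_duration:
            # Adding this segment would overflow the window, so start a new one.
            windows.append(current)
            current = []
        current.append(seg)
    if current:
        windows.append(current)
    return windows
```

Closing the open window when a segment is skipped keeps every window contiguous in time, which preserves the turn-taking structure between speakers.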

Multimodal integration concepts describe how to connect audio processing with text curation workflows.

## Infrastructure Components

The audio curation concepts build on NVIDIA NeMo Curator's core infrastructure components, which are shared across all modalities. These components include:

Memory management keeps memory usage under control when processing large audio datasets, through partitioning, batching, and monitoring.
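Batching bounds memory by holding only one batch of items at a time instead of the whole dataset. A minimal standard-library sketch follows; the `batched` helper here is hypothetical (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield lists of up to batch_size items without materializing the whole input."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    # islice pulls at most batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Applied to a lazily read manifest, this lets an ASR model consume a large dataset one batch at a time.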

GPU acceleration uses NVIDIA GPUs, through CUDA and the NVIDIA NeMo toolkit, to speed up ASR inference and audio processing.

Resumable processing uses checkpoints and batching to continue interrupted operations across large audio datasets and recover from failures.
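One common way to make a long run resumable is to record the ID of every completed item and skip those IDs on restart. The sketch below assumes each item has a unique `audio_filepath` and that progress is kept in a plain-text checkpoint file; `process_with_checkpoint` is a hypothetical helper, not NeMo Curator's checkpointing mechanism.

```python
import os
from typing import Callable, Iterable


def process_with_checkpoint(
    items: Iterable[dict],
    process: Callable[[dict], None],
    checkpoint_path: str,
    key: str = "audio_filepath",
) -> int:
    """Process items, skipping any whose key was recorded by a previous run.

    Hypothetical sketch: completed keys are appended to checkpoint_path, one per
    line. Returns the number of items processed in this run.
    """
    done: set[str] = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            done = {line.rstrip("\n") for line in f if line.strip()}
    processed = 0
    with open(checkpoint_path, "a", encoding="utf-8") as ckpt:
        for item in items:
            item_key = item[key]
            if item_key in done:
                continue  # finished in an earlier run
            process(item)
            ckpt.write(item_key + "\n")
            ckpt.flush()  # after a crash, at most the in-flight item is redone
            done.add(item_key)
            processed += 1
    return processed
```

If a run is interrupted, calling the function again with the same checkpoint file processes only the items not yet listed there. Because a key is written only after its item finishes, the item in flight during a crash is processed again, so `process` should be safe to repeat.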