Copyright © 2026, NVIDIA Corporation.

Audio Curation Concepts

This guide covers the essential concepts for audio data curation in NVIDIA NeMo Curator. It assumes basic familiarity with speech processing and machine learning principles.

Core Concept Areas

Audio curation in NVIDIA NeMo Curator focuses on these key areas:

Audio Curation Pipeline

Modality-level overview of ingest, validation, optional ASR, metrics, filtering, and export. Tags: overview, map

AudioTask Structure

Understanding the AudioTask data structure and audio file management. Tags: data-structures, validation

ASR Pipeline

Comprehensive overview of the automatic speech recognition (ASR) pipeline and workflow. Tags: overview, architecture

Quality Metrics

Core concepts for evaluating speech transcription quality and audio characteristics. Tags: wer, cer, metrics

Dataset Manifests and Ingest

Concepts for constructing manifests and ingesting audio datasets. Tags: manifests, ingest

ALM Pipeline

Audio Language Model (ALM) data curation pipeline for extracting training windows from diarized segments. Tags: alm, windowing, speaker-diarization

Text Integration

Concepts for integrating audio processing with text curation workflows. Tags: multimodal, integration
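
To make the manifest, quality-metric, and filtering concepts above concrete, here is a minimal, library-independent sketch. It does not use the NeMo Curator API. The manifest field names (`audio_filepath`, `duration`, `text`, `pred_text`) follow the JSON Lines convention common in NeMo ASR manifests and are an assumption here, as are the helper names `wer`, `cer`, and `filter_by_wer` and the 0.5 threshold.

```python
import json


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction: word-level edits / reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        # Convention chosen for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty too.
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction: character-level edits / reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def filter_by_wer(manifest_lines, max_wer):
    """Parse JSON Lines manifest entries and keep those with WER <= max_wer."""
    kept = []
    for line in manifest_lines:
        entry = json.loads(line)
        entry["wer"] = wer(entry["text"], entry["pred_text"])
        if entry["wer"] <= max_wer:
            kept.append(entry)
    return kept


# Hypothetical two-entry manifest: one JSON object per line.
MANIFEST_LINES = [
    '{"audio_filepath": "a.wav", "duration": 2.1, '
    '"text": "the cat sat on the mat", "pred_text": "the cat sat on the mat"}',
    '{"audio_filepath": "b.wav", "duration": 3.4, '
    '"text": "hello world again", "pred_text": "hello word"}',
]

kept_entries = filter_by_wer(MANIFEST_LINES, max_wer=0.5)
```

In this sketch, `b.wav` has one substituted word and one deleted word against a three-word reference, so its WER is 2/3 and it falls above the threshold; only `a.wav` is kept.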

Infrastructure Components

The audio curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which are shared across all modalities. These components include:

Memory Management

Optimize memory usage when processing large audio datasets. Tags: partitioning, batching, monitoring

GPU Acceleration

Leverage NVIDIA GPUs for faster ASR inference and audio processing. Tags: cuda, nemo-toolkit, performance

Resumable Processing

Continue interrupted operations across large audio datasets. Tags: checkpoints, recovery, batching
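
The general idea behind resumable, batched processing can be sketched in a few lines of plain Python. This is an illustration of the checkpoint-and-resume pattern only, not NeMo Curator's own mechanism; the function name `process_resumable`, the JSON checkpoint layout, and the batch size are all hypothetical.

```python
import json
import os


def process_resumable(items, process_fn, checkpoint_path, batch_size=2):
    """Process items in batches, recording progress after every batch.

    If a checkpoint file exists, processing resumes after the last
    completed batch. Returns only the results produced in this call.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    results = []
    for start in range(done, len(items), batch_size):
        batch = items[start:start + batch_size]
        # If process_fn raises here, the checkpoint still points at the
        # last fully completed batch, so a later call retries this batch.
        batch_results = [process_fn(item) for item in batch]
        results.extend(batch_results)

        # Write to a temporary file, then rename, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": start + len(batch)}, f)
        os.replace(tmp_path, checkpoint_path)
    return results
```

Calling the function again with the same `checkpoint_path` after an interruption skips the batches already recorded as done. A real pipeline would also need to persist the per-batch outputs, which this sketch leaves out.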