Audio Curation Tutorials

Use the tutorials in this section to learn audio curation with NeMo Curator.

Tutorials are organized by complexity and typically build on one another.


Beginner Tutorial

Run your first audio processing pipeline using the FLEURS dataset, including ASR inference and basic quality filtering.

Tags: fleurs-dataset, asr-inference, wer-filtering
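
To give a feel for the quality-filtering step, here is a minimal sketch of word error rate (WER) filtering in plain Python. It does not use the NeMo Curator API. The entry fields `text` and `pred_text`, the `max_wer` threshold, and both function names are assumptions made for illustration; the tutorial itself may structure this differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: empty reference is perfect
        # only when the hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def filter_by_wer(entries, max_wer=0.5):
    """Keep entries whose ASR prediction is close enough to the reference.

    `text` and `pred_text` are hypothetical field names for this sketch.
    """
    return [e for e in entries
            if word_error_rate(e["text"], e["pred_text"]) <= max_wer]
```

Entries whose transcription differs from the reference by more than `max_wer` are dropped, which is the basic idea behind filtering on ASR quality.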

ALM Tutorial

Curate training data for audio language models by extracting fixed-duration windows from diarized audio segments.

Tags: alm, windowing, speaker-diarization
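
As a rough illustration of fixed-duration windowing, the sketch below cuts each diarized segment into back-to-back windows of equal length and discards any remainder. This is one simple interpretation, not the NeMo Curator implementation; the `Segment` type, the `fixed_windows` function, and the `window_s` parameter are hypothetical names, and the tutorial's actual windowing logic may differ (for example, by spanning several segments).

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A span of audio attributed to one speaker (hypothetical type)."""
    start: float   # seconds
    end: float     # seconds
    speaker: str


def fixed_windows(segments: List[Segment], window_s: float = 10.0) -> List[Segment]:
    """Cut each segment into non-overlapping windows of exactly window_s seconds.

    Any tail shorter than window_s is dropped, so every returned window
    has the same duration.
    """
    windows = []
    for seg in segments:
        # Number of whole windows that fit inside this segment.
        count = int((seg.end - seg.start) // window_s)
        for k in range(count):
            w_start = seg.start + k * window_s
            windows.append(Segment(w_start, w_start + window_s, seg.speaker))
    return windows
```

With a 25-second segment and `window_s=10.0`, this yields two 10-second windows and drops the final 5 seconds; a 5-second segment yields none.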


Copyright © 2026, NVIDIA Corporation.