
Copyright © 2026, NVIDIA Corporation.

Get Started with Audio Curation


This guide helps you set up and get started with NeMo Curator’s audio curation capabilities. Follow these steps to prepare your environment and run your first audio curation pipeline using the FLEURS dataset.

Prerequisites

To use NeMo Curator’s audio curation modules, ensure you meet the following requirements:

  • Python 3.10, 3.11, or 3.12
    • packaging >= 22.0
  • uv (for package management and installation)
  • Ubuntu 22.04/20.04
  • NVIDIA GPU (recommended for ASR inference)
    • Volta™ or higher (compute capability 7.0+)
    • CUDA 12 (or above)
  • Audio processing libraries (automatically installed with audio extras)
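
The Python and uv requirements above can be checked before installing. The following is a minimal sketch using only the standard library. The helper names `python_version_supported` and `uv_available` are illustrative and not part of NeMo Curator, and the sketch does not check the GPU, CUDA, or operating system requirements.

```python
import shutil
import sys

# Supported minor versions, taken from the prerequisites list above.
SUPPORTED_MINORS = {10, 11, 12}


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.10, 3.11, or 3.12."""
    return version_info[0] == 3 and version_info[1] in SUPPORTED_MINORS


def uv_available() -> bool:
    """Return True if a `uv` executable is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    print("uv found on PATH:", uv_available())
```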

If you don’t have uv installed, refer to the Installation Guide for setup instructions, or install it quickly with:

$ curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
$ source $HOME/.local/bin/env

Installation Options

You can install NeMo Curator with audio support in three ways:

  • PyPI Installation
  • Source Installation
  • NeMo Curator Container

PyPI installation is the simplest way to install NeMo Curator with audio support:

$ echo "transformers==4.55.2" > override.txt
$ uv pip install "nemo-curator[audio_cuda12]" --override override.txt

The audio extras include NeMo Toolkit with ASR models. Additional audio processing libraries (soundfile, editdistance) are installed automatically as NeMo Toolkit dependencies.
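
After installation, you may want to confirm that the packages are importable. The sketch below is one way to do that with the standard library. It assumes the import names `nemo_curator`, `soundfile`, and `editdistance` match the packages mentioned above, and `find_missing` is a hypothetical helper, not a NeMo Curator function.

```python
import importlib.util

# Import names assumed to correspond to the packages mentioned above.
EXPECTED_MODULES = ["nemo_curator", "soundfile", "editdistance"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    print("Missing modules:", missing or "none")
```

An empty result only shows that the modules can be found. It does not verify that a GPU is visible or that an ASR model can be loaded.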

Download Sample Configuration

NeMo Curator provides a sample FLEURS configuration for audio curation. You can download and customize it:

$ mkdir -p ~/nemo_curator/configs
$ wget -O ~/nemo_curator/configs/fleurs_pipeline.yaml https://raw.githubusercontent.com/NVIDIA/NeMo-Curator/main/tutorials/audio/fleurs/pipeline.yaml

This configuration file contains a complete audio curation pipeline for the FLEURS dataset, including ASR inference, quality assessment, and filtering.

Set Up Data Directory

Create a directory to store your audio datasets:

$ mkdir -p ~/nemo_curator/audio_data
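
If you prefer to prepare the workspace from Python, the same directories can be created with `pathlib`. This is a minimal sketch. `prepare_workspace` is a hypothetical helper, and the default base path simply mirrors the shell commands in this guide.

```python
from pathlib import Path


def prepare_workspace(base="~/nemo_curator"):
    """Create configs/ and audio_data/ under base and return both paths."""
    root = Path(base).expanduser()  # resolve "~" to the home directory
    config_dir = root / "configs"
    data_dir = root / "audio_data"
    for directory in (config_dir, data_dir):
        # Same effect as `mkdir -p`: create parents, ignore existing dirs.
        directory.mkdir(parents=True, exist_ok=True)
    return config_dir, data_dir
```

Calling `prepare_workspace()` with no arguments creates `~/nemo_curator/configs` and `~/nemo_curator/audio_data`. Calling it again is harmless.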

Basic Audio Curation Example

Here’s a simple example to get started with audio curation using the FLEURS dataset:

import os

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.datasets.fleurs.create_initial_manifest import CreateInitialManifestFleursStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.resources import Resources

# Resolve "~" up front, since a plain string path may not be expanded by the stage
raw_data_dir = os.path.expanduser("~/nemo_curator/audio_data")

# Create audio curation pipeline
pipeline = Pipeline(name="audio_curation", description="FLEURS audio curation with ASR and WER filtering")

# 1. Load FLEURS dataset (Armenian development set)
pipeline.add_stage(
    CreateInitialManifestFleursStage(
        lang="hy_am",
        split="dev",
        raw_data_dir=raw_data_dir,
    ).with_(batch_size=4)
)

# 2. Perform ASR inference using NeMo model
pipeline.add_stage(
    InferenceAsrNemoStage(
        model_name="nvidia/stt_hy_fastconformer_hybrid_large_pc"
    ).with_(resources=Resources(gpus=1.0))
)

# 3. Calculate Word Error Rate (WER) between "text" and "pred_text"
pipeline.add_stage(
    GetPairwiseWerStage(
        text_key="text",
        pred_text_key="pred_text",
        wer_key="wer",
    )
)

# 4. Calculate audio duration
pipeline.add_stage(
    GetAudioDurationStage(
        audio_filepath_key="audio_filepath",
        duration_key="duration",
    )
)

# 5. Filter by WER threshold (keep samples with WER <= 75%)
pipeline.add_stage(
    PreserveByValueStage(
        input_value_key="wer",
        target_value=75.0,
        operator="le",  # less than or equal
    )
)

# Execute the pipeline
pipeline.run()
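
To make steps 3 and 5 concrete, the sketch below computes a word-level WER as a percentage and applies a "less than or equal" threshold. It is an independent illustration in plain Python, not NeMo Curator's implementation, which may tokenize or normalize text differently. The names `edit_distance`, `wer_percent`, and `keep_entry` are hypothetical.

```python
import operator


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        current = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer_percent(text, pred_text):
    """Word error rate as a percentage of the reference word count."""
    ref_words, hyp_words = text.split(), pred_text.split()
    if not ref_words:
        raise ValueError("reference transcription is empty")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def keep_entry(entry, key="wer", target=75.0, op="le"):
    """Keep the entry if entry[key] <op> target, using the operator module."""
    return getattr(operator, op)(entry[key], target)


if __name__ == "__main__":
    entry = {"text": "the cat sat down", "pred_text": "the cat sit down"}
    entry["wer"] = wer_percent(entry["text"], entry["pred_text"])
    print(entry["wer"], keep_entry(entry))
```

Because the error count is divided by the number of reference words, WER can exceed 100 when the prediction contains many extra words. A threshold of 75.0 is therefore fairly permissive.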

Alternative: Configuration-Based Approach

You can also run the pipeline using the downloaded configuration:

$ cd ~/nemo_curator
$ python tutorials/audio/fleurs/run.py \
    --config-path ~/nemo_curator/configs \
    --config-name fleurs_pipeline.yaml \
    raw_data_dir=~/nemo_curator/audio_data

Expected Output

After running the pipeline, you’ll have:

~/nemo_curator/audio_data/
└── hy_am/              # Armenian language data
    ├── dev.tsv         # Transcription metadata
    ├── dev.tar.gz      # Audio archive
    ├── dev/            # Extracted audio files
    └── result/         # Filtered results
        └── *.jsonl     # High-quality audio-text pairs

Each output entry contains:

{
  "audio_filepath": "/absolute/path/to/audio.wav",
  "text": "ground truth transcription",
  "pred_text": "asr model prediction",
  "wer": 12.5,
  "duration": 4.2
}
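
Entries in this format are straightforward to inspect with the standard library. The sketch below loads every JSONL file from a results directory and reports a few summary numbers. It assumes one JSON object per line and that `duration` is measured in seconds, which this guide does not state explicitly. `load_entries` and `summarize` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_entries(result_dir):
    """Read every *.jsonl file in result_dir into a list of dicts."""
    entries = []
    for path in sorted(Path(result_dir).expanduser().glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            # Skip blank lines; each remaining line is one JSON object.
            entries.extend(json.loads(line) for line in handle if line.strip())
    return entries


def summarize(entries):
    """Return entry count, total duration in hours, and mean WER."""
    if not entries:
        return {"count": 0, "hours": 0.0, "mean_wer": None}
    total_seconds = sum(entry["duration"] for entry in entries)  # assumed seconds
    mean_wer = sum(entry["wer"] for entry in entries) / len(entries)
    return {"count": len(entries), "hours": total_seconds / 3600.0, "mean_wer": mean_wer}


if __name__ == "__main__":
    results = load_entries("~/nemo_curator/audio_data/hy_am/result")
    print(summarize(results))
```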

Next Steps

Explore the Audio Curation documentation for more advanced processing techniques and customization options.

Key areas to explore next:

  • Custom Audio Manifests - Load your own audio datasets
  • Quality Assessment - Advanced filtering and quality metrics
  • Text Integration - Combine with text processing workflows