Get Started with Audio Curation
This guide helps you set up and get started with NeMo Curator’s audio curation capabilities. Follow these steps to prepare your environment and run your first audio curation pipeline using the FLEURS dataset.
Prerequisites
To use NeMo Curator’s audio curation modules, ensure you meet the following requirements:
- Python 3.10, 3.11, or 3.12
- packaging >= 22.0
- uv (for package management and installation)
- Ubuntu 22.04/20.04
- NVIDIA GPU (recommended for ASR inference)
  - Volta™ or higher (compute capability 7.0+)
  - CUDA 12 (or above)
- Audio processing libraries (automatically installed with audio extras)
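As a minimal sketch, the software prerequisites above can be checked from Python before installing anything. The helper names (`python_supported`, `packaging_ok`, `uv_available`) are hypothetical and not part of NeMo Curator. The GPU and CUDA requirements are not checked here.

```python
import shutil
import sys
from importlib import metadata
from typing import Optional

# Python versions listed in the prerequisites.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.10, 3.11, or 3.12."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def installed_packaging_version() -> Optional[str]:
    """Return the installed version of `packaging`, or None if it is missing."""
    try:
        return metadata.version("packaging")
    except metadata.PackageNotFoundError:
        return None


def packaging_ok(version: Optional[str]) -> bool:
    """Return True if the version string satisfies packaging >= 22.0."""
    if version is None:
        return False
    # The requirement is >= 22.0, so comparing the major component is enough.
    return int(version.split(".")[0]) >= 22


def uv_available() -> bool:
    """Return True if a `uv` executable is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("packaging >= 22.0:", packaging_ok(installed_packaging_version()))
    print("uv on PATH:", uv_available())
```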
If you don’t have uv installed, refer to the Installation Guide for setup instructions, or install it quickly with:
Installation Options
You can install NeMo Curator with audio support in three ways:
- PyPI Installation
- Source Installation
- NeMo Curator Container
PyPI installation is the simplest way to install NeMo Curator with audio support:
The audio extras include NeMo Toolkit with ASR models. Additional audio processing libraries (soundfile, editdistance) are installed automatically as NeMo Toolkit dependencies.
Download Sample Configuration
NeMo Curator provides a sample FLEURS configuration for audio curation. You can download and customize it:
This configuration file contains a complete audio curation pipeline for the FLEURS dataset, including ASR inference, quality assessment, and filtering.
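To make the "quality assessment and filtering" step more concrete, here is a library-free sketch of one common quality signal for ASR-based curation: word error rate (WER) between a reference transcript and an ASR prediction, followed by a threshold filter. This is an illustration of the idea only, not NeMo Curator's API, and this guide does not confirm which metric the sample configuration uses. The field names `text`, `pred_text`, and `wer`, and the threshold value, are assumptions.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def filter_by_wer(entries: List[Dict], max_wer: float) -> List[Dict]:
    """Keep entries whose WER is at most max_wer, adding a 'wer' field to each."""
    kept = []
    for entry in entries:
        wer = word_error_rate(entry["text"], entry["pred_text"])
        if wer <= max_wer:
            kept.append({**entry, "wer": wer})
    return kept


if __name__ == "__main__":
    # Hypothetical manifest entries for illustration.
    samples = [
        {"text": "hello world", "pred_text": "hello world"},
        {"text": "hello world", "pred_text": "goodbye moon"},
    ]
    for item in filter_by_wer(samples, max_wer=50.0):
        print(item)
```

A lower threshold keeps only utterances where the transcript and the ASR prediction agree closely, at the cost of discarding more data.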
Set Up Data Directory
Create a directory to store your audio datasets:
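One way to do this from Python is sketched below. The helper `make_data_dir`, the root path `~/audio_data`, and the `fleurs` subdirectory name are hypothetical; use whatever location suits your environment.

```python
from pathlib import Path


def make_data_dir(root: Path, name: str = "fleurs") -> Path:
    """Create root/name (and any missing parents) and return the path."""
    data_dir = root / name
    # exist_ok=True makes the call safe to repeat.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


if __name__ == "__main__":
    # Hypothetical location under the user's home directory.
    print(make_data_dir(Path.home() / "audio_data"))
```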
Basic Audio Curation Example
Here’s a simple example to get started with audio curation using the FLEURS dataset:
Alternative: Configuration-Based Approach
You can also run the pipeline using the downloaded configuration:
Expected Output
After running the pipeline, you’ll have:
Each output entry contains:
Next Steps
Explore the Audio Curation documentation for more advanced processing techniques and customization options.
Key areas to explore next:
- Custom Audio Manifests - Load your own audio datasets
- Quality Assessment - Advanced filtering and quality metrics
- Text Integration - Combine with text processing workflows