Load Audio Data

Import audio datasets from various sources into NeMo Curator’s audio processing pipeline. Audio data loading supports manifest files, direct file paths, and automated dataset downloads.

How it Works

Audio data loading in NeMo Curator centers around the AudioBatch data structure, which contains:

  • Audio file paths: References to audio files (.wav, .mp3, .flac, etc.)

  • Transcriptions: Ground truth or reference text for speech content

  • Metadata: Duration, language, speaker information, and quality metrics

The loading process validates audio file existence and formats data for downstream ASR inference and quality assessment stages.
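The validation step described above can be sketched in plain Python. This is a minimal illustration, not NeMo Curator's implementation: `AudioEntry` and `load_manifest` are hypothetical names, and the JSON Lines layout with `audio_filepath` and `text` keys is an assumption about the manifest format.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class AudioEntry:
    """Hypothetical record mirroring the three kinds of data listed above."""
    audio_filepath: str                                       # reference to the audio file
    text: str = ""                                            # reference transcription
    metadata: dict[str, Any] = field(default_factory=dict)    # duration, language, ...


def load_manifest(manifest_path: str) -> tuple[list[AudioEntry], list[str]]:
    """Parse a JSON Lines manifest and return (valid entries, missing file paths)."""
    entries: list[AudioEntry] = []
    missing: list[str] = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            path = record.pop("audio_filepath")  # assumed key name
            text = record.pop("text", "")        # assumed key name
            if Path(path).is_file():
                # Any remaining keys are kept as free-form metadata.
                entries.append(AudioEntry(path, text, record))
            else:
                missing.append(path)
    return entries, missing
```

Returning the missing paths separately, rather than raising on the first one, lets a caller decide whether to skip or fail on incomplete datasets.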


Loading Methods

Choose the appropriate loading method based on your data source and format:

  • FLEURS Dataset: Automated download and processing of the multilingual FLEURS speech dataset. See "Load FLEURS Dataset".

  • Custom Manifests: Create and load custom audio manifests with file paths and transcriptions. See "Create and Load Custom Audio Manifests".

  • Local Files: Load audio files directly from local directories and file systems. See "Load Local Audio Files".
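For the custom-manifest and local-file cases, a manifest can be produced by scanning a directory. The sketch below uses only the Python standard library; `write_manifest`, the `audio_filepath` and `text` keys, and the convention of reading each transcription from a sidecar `.txt` file are assumptions made for illustration, not part of NeMo Curator's API.

```python
import json
from pathlib import Path

# Extensions named in this page; extend as needed.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def write_manifest(audio_dir: str, manifest_path: str) -> int:
    """Write one JSON line per audio file under audio_dir and return the count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(audio_dir).rglob("*")):
            if not (path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS):
                continue
            # Assumed layout: "clip.wav" may have its transcription in "clip.txt".
            sidecar = path.with_suffix(".txt")
            text = sidecar.read_text(encoding="utf-8").strip() if sidecar.is_file() else ""
            entry = {"audio_filepath": str(path), "text": text}
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Sorting the paths keeps the manifest order stable across runs, which makes diffs between manifest versions easier to read.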