Load Audio Data

Import audio datasets from various sources into NeMo Curator’s audio processing pipeline. Audio data loading supports manifest files, direct file paths, and automated dataset downloads.

How it Works

Audio data loading in NeMo Curator centers on the AudioBatch data structure, which contains:

Audio file paths: References to audio files (.wav, .mp3, .flac, etc.)
Transcriptions: Ground truth or reference text for speech content
Metadata: Duration, language, speaker information, and quality metrics

The loading process checks that each referenced audio file exists and formats the data for the downstream ASR inference and quality assessment stages.
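The existence check described above can be sketched in plain Python. This is a minimal illustration, not NeMo Curator's implementation: `AudioEntry` and `split_by_existence` are hypothetical names, and the fields (`audio_filepath`, `text`, `metadata`) are assumptions chosen to mirror the three kinds of content listed above.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class AudioEntry:
    """Hypothetical stand-in for one record in a batch (not the AudioBatch class)."""
    audio_filepath: str                           # reference to the audio file
    text: str = ""                                # ground truth or reference transcription
    metadata: dict = field(default_factory=dict)  # e.g. duration, language, speaker


def split_by_existence(entries):
    """Return (found, missing) lists depending on whether each audio file exists."""
    found, missing = [], []
    for entry in entries:
        target = found if os.path.isfile(entry.audio_filepath) else missing
        target.append(entry)
    return found, missing


# Demo with one real (empty placeholder) file and one path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    real_path = os.path.join(tmp, "sample.wav")
    with open(real_path, "wb") as handle:
        handle.write(b"")  # placeholder bytes, not valid audio

    entries = [
        AudioEntry(real_path, "hello world", {"duration": 1.2, "language": "en"}),
        AudioEntry(os.path.join(tmp, "absent.wav"), "missing file"),
    ]
    found, missing = split_by_existence(entries)
    demo_counts = (len(found), len(missing))
```

Separating missing entries instead of raising immediately makes it possible to report every broken path in one pass before any inference stage runs.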

Loading Methods

Choose the appropriate loading method based on your data source and format:

FLEURS Dataset

Automated download and processing of the multilingual FLEURS speech dataset

Tags: automated, multilingual, 102-languages

Load FLEURS Dataset

Custom Manifests

Create and load custom audio manifests with file paths and transcriptions

Tags: jsonl, tsv, custom-format

Create and Load Custom Audio Manifests
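As a rough illustration of a JSONL manifest, the sketch below writes one JSON object per line and reads the entries back using only the standard library. The field names `audio_filepath`, `text`, and `duration` are assumptions made for this example, and `write_manifest` and `read_manifest` are hypothetical helpers, not NeMo Curator functions; the linked page defines the format the pipeline actually expects.

```python
import json
import os
import tempfile


def write_manifest(path, entries):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSONL manifest, skipping blank lines."""
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


# Illustrative entries; the paths and values are made up for the demo.
sample_entries = [
    {"audio_filepath": "/data/audio/clip_0001.wav", "text": "hello world", "duration": 1.8},
    {"audio_filepath": "/data/audio/clip_0002.wav", "text": "good morning", "duration": 2.4},
]

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = os.path.join(tmp, "manifest.jsonl")
    write_manifest(manifest_path, sample_entries)
    loaded_entries = read_manifest(manifest_path)
```

Keeping one record per line means a large manifest can be streamed or split across workers without parsing the whole file at once.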

Local Files

Load audio files directly from local directories and file systems

Tags: local-storage, batch-processing, file-discovery

Load Local Audio Files
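File discovery and batching over a local directory might look like the following sketch. It uses only `pathlib` and is not NeMo Curator's loader: `discover_audio_files`, `batched`, and the `AUDIO_EXTENSIONS` set are hypothetical, and the extension list simply reuses the formats named earlier on this page.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions for this example, matched case-insensitively.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def discover_audio_files(root):
    """Recursively collect audio files under root, sorted for reproducibility."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Demo: two audio files (one nested, one with an upper-case extension) and one text file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.wav").write_bytes(b"")
    (root / "sub" / "b.FLAC").write_bytes(b"")
    (root / "notes.txt").write_text("not audio")

    discovered = discover_audio_files(root)
    discovered_names = [path.name for path in discovered]
    batch_sizes = [len(batch) for batch in batched(discovered, 1)]
```

Sorting the discovered paths keeps batch contents stable across runs, which helps when comparing results between pipeline executions.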