
Copyright © 2026, NVIDIA Corporation.


Load Audio Data


Import audio datasets from various sources into NeMo Curator’s audio processing pipeline. Audio data loading supports manifest files, direct file paths, and automated dataset downloads.

How it Works

Audio data loading in NeMo Curator centers around the AudioTask data structure, which contains:

  • Audio file paths: References to audio files (.wav, .mp3, .flac, and so on)
  • Transcriptions: Ground truth or reference text for speech content
  • Metadata: Duration, language, speaker information, and quality metrics

The loading process validates audio file existence and formats data for downstream ASR inference and quality assessment stages.
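As a rough illustration of the validation step, the sketch below represents each record as a plain dictionary and separates entries whose audio file exists from those whose file is missing. It does not use NeMo Curator's AudioTask class. The function name `split_by_existence` and the field names `audio_filepath`, `text`, and `duration` are assumptions chosen for this example, not confirmed parts of the library's API.

```python
import os
import tempfile


def split_by_existence(entries, path_key="audio_filepath"):
    """Separate entries whose audio file exists from those whose file is missing.

    `entries` is a list of dicts; `path_key` is an assumed field name.
    """
    found, missing = [], []
    for entry in entries:
        path = entry.get(path_key)
        if path and os.path.isfile(path):
            found.append(entry)
        else:
            missing.append(entry)
    return found, missing


if __name__ == "__main__":
    # Build one real (empty) file and one dangling path for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        real_path = os.path.join(tmp, "sample.wav")
        open(real_path, "wb").close()
        entries = [
            {"audio_filepath": real_path, "text": "hello world", "duration": 1.2},
            {"audio_filepath": os.path.join(tmp, "gone.wav"), "text": "missing"},
        ]
        found, missing = split_by_existence(entries)
        print(len(found), "entries found;", len(missing), "missing")
```

Checking only for file existence keeps the sketch cheap. A fuller check might also open each file to confirm that it decodes, at the cost of extra I/O.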


Loading Methods

Choose the appropriate loading method based on your data source and format:

FLEURS Dataset

Automated download and processing of the multilingual FLEURS speech dataset, which covers 102 languages.

Custom Manifests

Create and load custom audio manifests with file paths and transcriptions (JSONL, TSV, or a custom format).

Local Files

Load audio files directly from local directories and file systems, with file discovery and batch processing.
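To make the manifest and local-file options more concrete, here is a minimal sketch that walks a directory for audio files and writes one JSON object per line to a manifest. It uses only the Python standard library. The helper names `discover_audio_files` and `write_manifest`, the extension list, and the `audio_filepath` and `text` field names are assumptions for illustration and may differ from what NeMo Curator expects.

```python
import json
import tempfile
from pathlib import Path

# Assumed set of extensions for this sketch; extend as needed.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def discover_audio_files(root):
    """Recursively collect audio files under `root`, sorted for stable output."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def write_manifest(paths, manifest_path, text_by_stem=None):
    """Write a JSONL manifest with one entry per audio file.

    `text_by_stem` optionally maps a file stem to its transcription;
    files without a transcription get an empty string.
    """
    text_by_stem = text_by_stem or {}
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for p in paths:
            entry = {"audio_filepath": str(p), "text": text_by_stem.get(p.stem, "")}
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "clip1.wav").touch()
        (root / "notes.txt").touch()  # ignored: not an audio extension
        files = discover_audio_files(root)
        written = write_manifest(files, root / "manifest.jsonl", {"clip1": "hello"})
        print("wrote", written, "manifest entries")
```

One object per line lets a manifest be streamed and split into batches without parsing the whole file at once.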
