
Copyright © 2026, NVIDIA Corporation.

Dataset Manifests and Ingest

This guide covers the core concepts for ingesting audio data into NeMo Curator using consistent manifests and validation workflows.

Manifest Structure

Audio manifests in NeMo Curator are JSONL files: one JSON object per line, each describing a single audio sample with a consistent set of fields.

Required Fields:

  • audio_filepath: Path to the audio file (absolute or relative)

Common Optional Fields:

  • text: Ground-truth transcription, or a previously generated one
  • duration: Audio length in seconds
  • language: Language code (such as “en”, “es”, “fr”)
  • speaker_id: Speaker identifier for multi-speaker datasets
  • Custom metadata fields for domain-specific information
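The field list above can be turned into a small schema check. The following is a minimal sketch in plain Python and is not part of NeMo Curator: the helper name check_entry and the expected types (for example, treating speaker_id as a string) are assumptions made for illustration.

```python
from numbers import Real

# Field names come from the manifest description above; the expected
# types are assumptions for this sketch, not a NeMo Curator contract.
REQUIRED_FIELDS = {"audio_filepath": str}
OPTIONAL_FIELDS = {"text": str, "duration": Real, "language": str, "speaker_id": str}


def check_entry(entry: dict) -> list:
    """Return a list of problems found in one manifest entry (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing required field: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        # Optional fields are only checked when present.
        if name in entry and not isinstance(entry[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


good = {"audio_filepath": "audio/sample1.wav", "text": "Hello world", "duration": 1.5}
bad = {"text": "no path here", "duration": "1.5"}
print(check_entry(good))
print(check_entry(bad))
```

Fields that are not listed, such as custom metadata, pass through unchecked, which matches the manifest's allowance for domain-specific fields.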

Creation Methods:

  • Programmatic Generation: Use dataset-specific stages like CreateInitialManifestFleursStage
  • Custom Scripts: Generate JSONL files with consistent field naming
  • Manual Creation: Create JSONL manifests for small datasets or specialized use cases
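As an illustration of the custom-script route, the sketch below walks a directory of WAV files and writes one manifest line per file. It uses only the Python standard library; build_manifest and wav_duration_seconds are hypothetical helper names, and the approach assumes uncompressed PCM WAV input that the wave module can read.

```python
import json
import tempfile
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Compute the duration of a PCM WAV file from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(audio_dir: Path, manifest_path: Path, language: str) -> int:
    """Write one JSONL line per .wav file in audio_dir; return the entry count."""
    count = 0
    with manifest_path.open("w", encoding="utf-8") as out:
        for wav_path in sorted(audio_dir.glob("*.wav")):
            entry = {
                "audio_filepath": str(wav_path.resolve()),
                "duration": round(wav_duration_seconds(wav_path), 3),
                "language": language,
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo on a throwaway directory holding one second of 16 kHz mono silence.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    with wave.open(str(tmp_dir / "sample.wav"), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    manifest_file = tmp_dir / "manifest.jsonl"
    written = build_manifest(tmp_dir, manifest_file, language="en")
    lines = manifest_file.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines]
    print(written, entries[0]["duration"])
```

Sorting the file list keeps the manifest order reproducible between runs, which makes diffs between manifest versions easier to read.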

Data Ingestion and Validation

NeMo Curator validates audio data during ingestion, primarily by checking that the referenced files exist:

File Existence Validation:

  • AudioBatch automatically validates file paths during creation
  • Use validate() for batch-level validation
  • Use validate_item() for individual file validation
  • Missing files generate warnings but do not stop processing

Validation Strategy:

  • Check file existence at the start of the pipeline
  • Add metadata fields (duration, format) in downstream processing stages
  • Use non-blocking validation to maintain processing throughput
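The non-blocking strategy described above can be sketched without the library. This is an illustration of the idea only, not AudioBatch's actual implementation: flag_missing_files and the logger name are hypothetical, and the function warns about missing files rather than raising.

```python
import logging
import os
import tempfile

logger = logging.getLogger("manifest_validation")


def flag_missing_files(entries: list, filepath_key: str = "audio_filepath") -> list:
    """Return one flag per entry: True if the file exists, False otherwise.

    Missing files are logged as warnings instead of raising, so processing
    continues and the caller decides what to do with the flags.
    """
    flags = []
    for entry in entries:
        path = entry.get(filepath_key)
        exists = bool(path) and os.path.exists(path)
        if not exists:
            logger.warning("File %s does not exist", path)
        flags.append(exists)
    return flags


# Demo: one real temporary file and one path that is assumed not to exist.
with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
    entries = [
        {"audio_filepath": tmp.name},
        {"audio_filepath": os.path.join(tmp.name + "_missing", "nothing.wav")},
    ]
    flags = flag_missing_files(entries)
    kept = [entry for entry, ok in zip(entries, flags) if ok]
    print(flags, len(kept))
```

Returning flags instead of filtering inside the function keeps the policy decision (drop, retry, or report) with the caller.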

Field Recommendations

Essential for All Workflows:

  • audio_filepath: File path validation and processing

Recommended for ASR Workflows:

  • text: Ground truth for WER calculation and quality assessment
  • language: Language-specific model selection and validation

Recommended for Quality Assessment:

  • duration: Duration-based filtering and speech rate analysis
  • speaker_id: Speaker consistency and diversity analysis

Domain-Specific Fields:

  • Recording quality indicators (studio, phone, outdoor)
  • Content type tags (conversational, broadcast, lecture)
  • Noise level indicators for quality assessment
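To show why duration and text are useful together, here is a minimal sketch of duration-based filtering and a simple speech-rate estimate in words per second. The function names and all thresholds (1 to 30 seconds, 5 words per second) are hypothetical values chosen for the example, not NeMo Curator defaults.

```python
from typing import Optional


def words_per_second(entry: dict) -> Optional[float]:
    """Estimate speech rate from the text and duration fields, if both exist."""
    text, duration = entry.get("text"), entry.get("duration")
    if not text or not duration or duration <= 0:
        return None
    return len(text.split()) / duration


def keep_entry(entry: dict, min_duration: float = 1.0,
               max_duration: float = 30.0, max_rate: float = 5.0) -> bool:
    """Keep entries whose duration is in range and whose speech rate is plausible."""
    duration = entry.get("duration")
    if duration is None or not (min_duration <= duration <= max_duration):
        return False
    rate = words_per_second(entry)
    # Entries without text cannot be rate-checked, so they pass this test.
    return rate is None or rate <= max_rate


manifest = [
    {"audio_filepath": "a.wav", "text": "Hello world", "duration": 1.5},
    {"audio_filepath": "b.wav", "text": "Good morning", "duration": 2.1},
    {"audio_filepath": "c.wav", "text": "too short", "duration": 0.4},
    {"audio_filepath": "d.wav", "text": "one two three four five six seven eight", "duration": 1.0},
]
kept = [entry for entry in manifest if keep_entry(entry)]
print([entry["audio_filepath"] for entry in kept])
```

An implausibly high speech rate often signals a transcript that does not match its audio, which is why it is checked alongside duration here.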

Implementation Examples

Basic Manifest Creation:

import json

# Create simple manifest
manifest_data = [
    {
        "audio_filepath": "/path/to/audio1.wav",
        "text": "Hello world",
        "duration": 1.5,
        "language": "en"
    },
    {
        "audio_filepath": "/path/to/audio2.wav",
        "text": "Good morning",
        "duration": 2.1,
        "language": "en"
    }
]

# Save as JSONL: one JSON object per line
with open("manifest.jsonl", "w") as f:
    for item in manifest_data:
        f.write(json.dumps(item) + "\n")

AudioBatch Validation:

from nemo_curator.tasks import AudioBatch

# manifest_data is the list of manifest entries (dicts) built in the
# Basic Manifest Creation example.

# Create AudioBatch with validation
audio_batch = AudioBatch(
    data=manifest_data,
    filepath_key="audio_filepath"
)

# Validate file existence
is_valid = audio_batch.validate()
print(f"Batch validation: {is_valid}")

Pipeline Integration

ASR Workflow Preparation:

  • Ensure audio_filepath points to valid audio files
  • ASR stages automatically add pred_text field with predictions
  • Include text field for WER calculation and quality assessment
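To make the role of text and pred_text concrete, the sketch below computes word error rate (WER) as the word-level edit distance divided by the number of reference words. It is a plain-Python illustration, not the library's own WER implementation. The function name word_error_rate is hypothetical, and text normalization (case, punctuation) is deliberately left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# A manifest entry after an ASR stage has added pred_text.
entry = {
    "audio_filepath": "/path/to/audio2.wav",
    "text": "good morning everyone",
    "pred_text": "good morning",
}
wer = word_error_rate(entry["text"], entry["pred_text"])
print(f"WER: {wer:.3f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the prediction contains many inserted words.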

Quality Assessment Preparation:

  • Use GetAudioDurationStage to add duration information
  • Include existing transcriptions for WER-based filtering
  • Add metadata fields for comprehensive quality analysis

Format Conversion Readiness:

  • Standardize field names across different data sources
  • Ensure consistent audio file formats and sample rates
  • Validate encoding and accessibility of all audio files
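Standardizing field names across sources can be as simple as an alias table applied before ingestion. The sketch below is an assumption-laden illustration: the alias names (path, transcript, lang, and so on) are invented examples of what other datasets might use, and normalize_entry is a hypothetical helper, not a NeMo Curator function.

```python
# Canonical manifest field -> names it might appear under in other sources.
# The alias names are hypothetical examples, not taken from real datasets.
FIELD_ALIASES = {
    "audio_filepath": ("audio_filepath", "path", "wav"),
    "text": ("text", "transcript", "sentence"),
    "duration": ("duration", "length_sec"),
    "language": ("language", "lang", "locale"),
}


def normalize_entry(raw: dict) -> dict:
    """Rename known aliases to canonical field names and keep everything else."""
    normalized = {}
    consumed = set()
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                normalized[canonical] = raw[alias]
                consumed.add(alias)
                break  # first matching alias wins
    # Unknown fields are preserved as custom metadata.
    for key, value in raw.items():
        if key not in consumed and key not in normalized:
            normalized[key] = value
    if "audio_filepath" not in normalized:
        raise KeyError("no audio path field found in entry")
    return normalized


source_entry = {"path": "clips/0001.wav", "transcript": "Hello world", "lang": "en", "snr_db": 20}
print(normalize_entry(source_entry))
```

Raising on a missing audio path mirrors the manifest's single required field, while every other field stays optional.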