
Copyright © 2026, NVIDIA Corporation.


Save & Export Audio Data


Export processed audio data and transcriptions in formats optimized for ASR model training, audio-and-text applications, and downstream analysis workflows.

Output Formats

NeMo Curator’s audio curation pipeline supports several output formats tailored for different use cases:

JSONL Manifests

The primary output format for audio curation is a JSONL (JSON Lines) manifest, with one JSON object per audio sample on each line:

{"audio_filepath": "/data/audio/sample_001.wav", "text": "hello world", "pred_text": "hello world", "wer": 0.0, "duration": 2.1}
{"audio_filepath": "/data/audio/sample_002.wav", "text": "good morning", "pred_text": "good morning", "wer": 0.0, "duration": 1.8}
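To make the format concrete, here is a minimal sketch of writing and reading such a manifest with only the Python standard library. The helper names `write_manifest` and `read_manifest` are hypothetical and are not part of NeMo Curator; an in-memory buffer stands in for a manifest file.

```python
import io
import json


def write_manifest(records, fp):
    """Write one JSON object per line to a text file object."""
    for record in records:
        # ensure_ascii=False keeps non-ASCII transcripts human-readable
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_manifest(fp):
    """Parse a JSONL stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]


records = [
    {"audio_filepath": "/data/audio/sample_001.wav", "text": "hello world",
     "pred_text": "hello world", "wer": 0.0, "duration": 2.1},
    {"audio_filepath": "/data/audio/sample_002.wav", "text": "good morning",
     "pred_text": "good morning", "wer": 0.0, "duration": 1.8},
]

buffer = io.StringIO()  # stands in for an open manifest file
write_manifest(records, buffer)
buffer.seek(0)
loaded = read_manifest(buffer)

# Total audio duration across the manifest, in seconds
total_seconds = sum(r["duration"] for r in loaded)
```

Because each line is an independent JSON object, a manifest can be streamed, split, or concatenated without parsing the whole file.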

Metadata Fields

Standard fields included in audio manifests:

| Field | Type | Description |
| --- | --- | --- |
| audio_filepath | string | Absolute path to audio file |
| text | string | Ground truth transcription |
| pred_text | string | ASR model prediction |
| wer | float | Word Error Rate (percentage) |
| duration | float | Audio duration in seconds |
| language | string | Language identifier (optional) |
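The field list above can be turned into a simple per-record check. The sketch below is illustrative only: `validate_record` is a hypothetical helper, not a NeMo Curator API, and it assumes every field except `language` is required, which may not hold for manifests produced before ASR inference has run.

```python
# Field names and types taken from the table; the required/optional split
# is an assumption made for this sketch.
REQUIRED_FIELDS = {
    "audio_filepath": str,
    "text": str,
    "pred_text": str,
    "wer": float,
    "duration": float,
}
OPTIONAL_FIELDS = {"language": str}


def _has_type(value, expected):
    """Check a value against a field type; ints are accepted for float fields."""
    if expected is float:
        # JSON does not distinguish 2 from 2.0, and bool is a subclass of int
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def validate_record(record):
    """Return a list of problems for one manifest entry (empty when valid)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not _has_type(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in record and not _has_type(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


good = {"audio_filepath": "/data/audio/sample_001.wav", "text": "hello world",
        "pred_text": "hello world", "wer": 0.0, "duration": 2.1}
bad = {"audio_filepath": "/data/audio/sample_003.wav", "text": "hi",
       "duration": "2.1"}  # missing pred_text and wer; duration is a string

good_problems = validate_record(good)
bad_problems = validate_record(bad)
```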

Export Configuration

Using JsonlWriter
    from nemo_curator.stages.text.io.writer import JsonlWriter
    from nemo_curator.stages.audio.io.convert import AudioToDocumentStage

    # `pipeline` is assumed to be an existing pipeline with the audio stages already added

    # Convert AudioTask to DocumentBatch for text writer
    pipeline.add_stage(AudioToDocumentStage())

    # Configure JSONL export
    pipeline.add_stage(
        JsonlWriter(
            path="/output/audio_manifests",
            write_kwargs={"force_ascii": False},  # Support Unicode characters
        )
    )
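The effect of disabling ASCII-only output can be illustrated with the standard library. This is an analogy rather than the writer's own code path: the sketch uses `json.dumps` and its `ensure_ascii` flag, and assumes that `force_ascii` in `write_kwargs` behaves in a comparable way.

```python
import json

record = {"text": "grüße"}  # a transcript containing non-ASCII characters

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(record)

# With ASCII forcing disabled, the characters are written as-is
readable = json.dumps(record, ensure_ascii=False)

# Both forms decode to the same data; only the on-disk text differs
same_data = json.loads(escaped) == json.loads(readable)
```

Either form is valid JSON. Unescaped output is easier to inspect by eye for transcripts in languages that use non-ASCII characters.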

Directory Structure

Standard Output Layout

When source_files metadata exists, the writer generates deterministic hashed file names. Otherwise, it generates UUID-based names.

/output/audio_manifests/
├── <hash>.jsonl # Deterministic hash if metadata.source_files present, else UUID
├── <hash>.jsonl
└── ...
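Since the output is a directory of shards rather than a single file, downstream code often needs to read them back as one manifest. The following is a minimal sketch using only the standard library; `load_manifest_dir` is a hypothetical helper, and the shard names in the demo are placeholders rather than names the writer would actually generate.

```python
import json
import tempfile
from pathlib import Path


def load_manifest_dir(output_dir):
    """Read every *.jsonl shard in output_dir into one list of records."""
    records = []
    # sorted() makes the read order reproducible across runs
    for shard in sorted(Path(output_dir).glob("*.jsonl")):
        with shard.open(encoding="utf-8") as fp:
            records.extend(json.loads(line) for line in fp if line.strip())
    return records


# Demo: a temporary directory stands in for /output/audio_manifests
with tempfile.TemporaryDirectory() as tmp:
    shards = {
        "a1b2.jsonl": {"audio_filepath": "/data/audio/sample_001.wav", "duration": 2.1},
        "c3d4.jsonl": {"audio_filepath": "/data/audio/sample_002.wav", "duration": 1.8},
    }
    for name, row in shards.items():
        (Path(tmp) / name).write_text(json.dumps(row) + "\n", encoding="utf-8")
    merged = load_manifest_dir(tmp)
```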

Quality Control

Validation Checks

Before export, filter the processed data against quality thresholds such as WER and duration:

    from nemo_curator.stages.audio.common import PreserveByValueStage

    # Filter by quality thresholds
    quality_filters = [
        # Keep samples with WER <= 50%
        PreserveByValueStage(
            input_value_key="wer",
            target_value=50.0,
            operator="le"
        ),
        # Keep samples with duration 1-30 seconds
        PreserveByValueStage(
            input_value_key="duration",
            target_value=1.0,
            operator="ge"
        ),
        PreserveByValueStage(
            input_value_key="duration",
            target_value=30.0,
            operator="le"
        )
    ]

    for filter_stage in quality_filters:
        pipeline.add_stage(filter_stage)
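To see what these three thresholds keep and drop, the same comparisons can be reproduced on plain dictionaries. This is a sketch of the filtering logic only, not the implementation of `PreserveByValueStage`; the `keep` and `passes_all` helpers and the sample records are made up for illustration.

```python
import operator

# Map the operator names used above to Python comparison functions
COMPARATORS = {"le": operator.le, "ge": operator.ge}

# (key, target value, operator name), mirroring the three stages above
THRESHOLDS = [
    ("wer", 50.0, "le"),
    ("duration", 1.0, "ge"),
    ("duration", 30.0, "le"),
]


def keep(record, key, target, op):
    """Return True when record[key] satisfies the comparison against target."""
    return COMPARATORS[op](record[key], target)


def passes_all(record):
    """A record survives only if it satisfies every threshold."""
    return all(keep(record, key, target, op) for key, target, op in THRESHOLDS)


samples = [
    {"audio_filepath": "a.wav", "wer": 12.5, "duration": 4.2},
    {"audio_filepath": "b.wav", "wer": 75.0, "duration": 4.2},   # WER above 50%
    {"audio_filepath": "c.wav", "wer": 0.0, "duration": 0.4},    # shorter than 1 s
    {"audio_filepath": "d.wav", "wer": 50.0, "duration": 30.0},  # on both boundaries
]

kept = [s["audio_filepath"] for s in samples if passes_all(s)]
```

Because both comparisons are inclusive, a sample sitting exactly on a boundary (WER of 50.0 or a duration of 30.0 seconds) passes in this sketch.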