Save & Export Audio Data
Export processed audio data and transcriptions in formats optimized for ASR model training, audio-and-text applications, and downstream analysis workflows.
Output Formats
NeMo Curator’s audio curation pipeline supports several output formats tailored for different use cases:
JSONL Manifests
The primary output format for audio curation is JSONL (JSON Lines):
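As a minimal illustration of the format, the sketch below writes and reads a JSONL manifest using only the Python standard library. It does not use NeMo Curator's own writer. The field names (audio_filepath, text, duration) follow the common NeMo-style manifest convention and are assumed here, and the sample paths and helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample entries; the field names are an assumption based on
# the common NeMo-style manifest convention, not taken from this page.
entries = [
    {"audio_filepath": "/data/audio/sample_001.wav", "text": "hello world", "duration": 2.5},
    {"audio_filepath": "/data/audio/sample_002.wav", "text": "good morning", "duration": 1.8},
]


def write_manifest(path, records):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSONL manifest back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


manifest_path = Path(tempfile.mkdtemp()) / "manifest.jsonl"
write_manifest(manifest_path, entries)
loaded = read_manifest(manifest_path)
```

Because each line is an independent JSON object, a manifest can be streamed, split, or concatenated without parsing the whole file.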
Metadata Fields
Standard fields included in audio manifests:
Export Configuration
Using JsonlWriter
Directory Structure
Standard Output Layout
When source_files metadata is present, the writer generates deterministic, hash-based file names, so the same source files map to the same output names across runs. Otherwise, it falls back to UUID-based names.
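The naming behavior described above can be sketched as follows. This is an illustration of the idea, not the writer's actual implementation: the function name output_file_name, the choice of SHA-256, the 12-character truncation, and the sorting of inputs are all assumptions.

```python
import hashlib
import uuid


def output_file_name(source_files=None, extension="jsonl"):
    """Return a deterministic name when source files are known, else a random one.

    Hypothetical helper: the hashing scheme is an assumption for illustration.
    """
    if source_files:
        # Sort so that the order of the inputs does not change the result.
        joined = "|".join(sorted(source_files))
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]
        return f"{digest}.{extension}"
    # No source information available: fall back to a random UUID-based name.
    return f"{uuid.uuid4().hex}.{extension}"
```

With a deterministic name, rerunning the pipeline on the same inputs overwrites the earlier output rather than accumulating duplicates, whereas UUID-based names are unique on every run.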
Quality Control
Validation Checks
Before export, check your processed data:
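One possible pre-export check is sketched below in plain Python. The required field names and the specific rules (positive finite duration, non-empty text) are assumptions chosen for illustration; adjust them to the fields your pipeline actually produces. The helper validate_entry is hypothetical and not part of NeMo Curator.

```python
import math

# Assumed required fields, following the common NeMo-style manifest convention.
REQUIRED_FIELDS = ("audio_filepath", "text", "duration")


def validate_entry(entry):
    """Return a list of human-readable problems found in one manifest entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")

    duration = entry.get("duration")
    if duration is not None:
        is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
        if not is_number or not math.isfinite(duration) or duration <= 0:
            problems.append("duration must be a positive number")

    text = entry.get("text")
    if text is not None and (not isinstance(text, str) or not text.strip()):
        problems.append("text must be a non-empty string")

    return problems
```

For example, validate_entry({"audio_filepath": "a.wav", "text": "hi", "duration": 1.2}) returns an empty list, while an entry with a negative duration or blank text returns one message per problem, which makes it easy to log or filter bad entries before writing the manifest.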