Audio Curation Concepts
This guide covers the essential concepts for audio data curation in NVIDIA NeMo Curator. It assumes basic familiarity with speech processing and machine learning principles.
Core Concept Areas
Audio curation in NVIDIA NeMo Curator focuses on these key areas:
Modality-level overview of ingest, validation, optional automatic speech recognition (ASR), metrics, filtering, and export
Understanding the AudioBatch data structure and audio file management
Comprehensive overview of the automatic speech recognition pipeline and workflow
Core concepts for evaluating speech transcription quality and audio characteristics
Concepts for constructing manifests and ingesting audio datasets
Concepts for integrating audio processing with text curation workflows
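To make two of the areas above more concrete (manifests and transcription quality), the following is a minimal sketch in plain Python. It does not use any NeMo Curator API. The field names `audio_filepath`, `text`, and `pred_text` are assumed here, following the JSON Lines manifest convention common in NeMo tooling, and the helpers `write_manifest`, `read_manifest`, `word_error_rate`, and `filter_by_wer` are hypothetical names introduced only for illustration.

```python
import json


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def write_manifest(entries, path):
    """Write one JSON object per line (assumed manifest layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


def read_manifest(path):
    """Read a JSON Lines manifest back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def filter_by_wer(entries, max_wer):
    """Keep entries whose predicted text is within max_wer of the reference."""
    return [e for e in entries
            if word_error_rate(e["text"], e["pred_text"]) <= max_wer]
```

In this sketch, each manifest entry pairs an audio file path with a reference transcript and an ASR prediction, and filtering keeps only the entries whose word error rate falls at or below a chosen threshold. A real pipeline would typically rely on the library's own stages rather than hand-written helpers like these.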
Infrastructure Components
The audio curation concepts build on NVIDIA NeMo Curator’s core infrastructure components, which are shared across all modalities. These components include: