This guide covers the core concepts for ingesting audio data into NeMo Curator using consistent manifests and validation workflows.
Audio manifests in NeMo Curator follow a standardized format for consistent data processing:
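As an illustration of the format, the sketch below builds one manifest entry from the fields described in this guide and serializes it as a single JSON line. It assumes the common NeMo convention of one JSON object per line. The file path and all field values are hypothetical.

```python
import json

# Hypothetical entry: the path and values are illustrative only.
entry = {
    "audio_filepath": "audio/sample_0001.wav",  # required
    "text": "hello world",                      # optional: transcription
    "duration": 2.5,                            # optional: length in seconds
    "language": "en",                           # optional: language code
    "speaker_id": "spk_01",                     # optional: speaker identifier
}

# Assumption: manifests are stored as JSON Lines, one object per line.
manifest_line = json.dumps(entry, ensure_ascii=False)
print(manifest_line)
```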
Required Fields:
audio_filepath: Path to the audio file (absolute or relative)

Common Optional Fields:
text: Ground truth transcription or existing transcription
duration: Audio length in seconds
language: Language code (such as “en”, “es”, “fr”)
speaker_id: Speaker identifier for multi-speaker datasets

Creation Methods:
CreateInitialManifestFleursStage

NeMo Curator provides robust validation mechanisms for audio data ingestion:
File Existence Validation:
AudioTask automatically validates file paths during creation
validate() to check whether the audio file for this task exists on disk
validate_item() for individual file validation

Validation Strategy:
Essential for All Workflows:
audio_filepath: File path validation and processing

Recommended for ASR Workflows:
text: Ground truth for WER calculation and quality assessment
language: Language-specific model selection and validation

Recommended for Quality Assessment:
duration: Duration-based filtering and speech rate analysis
speaker_id: Speaker consistency and diversity analysis

Domain-Specific Fields:
Basic Manifest Creation:
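A minimal sketch of manifest creation using only the Python standard library, assuming a flat directory of `.wav` files and a JSON Lines output file. The helper name `build_manifest` and its parameters are hypothetical and are not part of NeMo Curator.

```python
import json
from pathlib import Path


def build_manifest(audio_dir, manifest_path, language=None):
    """Write one JSON line per .wav file found directly under audio_dir.

    Hypothetical helper for illustration, not a NeMo Curator function.
    """
    entries = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        entry = {"audio_filepath": str(wav.resolve())}  # required field
        if language is not None:
            entry["language"] = language  # optional field
        entries.append(entry)

    # Assumption: one JSON object per line (JSON Lines).
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entries
```

Calling `build_manifest("data/audio", "manifest.jsonl", language="en")` would write one entry per `.wav` file, each carrying only `audio_filepath` and `language`. Fields such as `text` or `duration` would need to be added from transcripts or a later processing step.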
AudioTask Validation:
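The AudioTask constructor is not shown in this guide, so the sketch below does not call NeMo Curator. It is a plain-Python stand-in for the same idea: checking that each entry's `audio_filepath` exists on disk before processing. The helper name `find_missing_audio` and the `base_dir` parameter are hypothetical.

```python
import os


def find_missing_audio(entries, base_dir="."):
    """Return the entries whose audio_filepath is absent or not a file on disk.

    Hypothetical stand-in for a file existence check; it does not use
    AudioTask.validate() or validate_item().
    """
    missing = []
    for entry in entries:
        path = entry.get("audio_filepath")
        if not path:
            missing.append(entry)
            continue
        # Relative paths are resolved against base_dir; absolute paths are kept.
        full_path = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.isfile(full_path):
            missing.append(entry)
    return missing
```

Running a check like this over a manifest before building tasks makes it easier to report all broken paths at once instead of stopping at the first failure.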
ASR Workflow Preparation:
audio_filepath points to valid audio files
pred_text field with predictions
text field for WER calculation and quality assessment

Quality Assessment Preparation:
GetAudioDurationStage to add duration information

Format Conversion Readiness: