Audio Analysis
Extract and analyze audio file characteristics for quality control, metadata generation, and dataset validation. Audio analysis provides essential information about audio files before and during processing.
How It Works
Audio analysis in NeMo Curator examines audio files to extract:
- Duration Information: Precise timing measurements using soundfile
- Format Characteristics: Sample rate, bit depth, channels, and format
- Quality Indicators: File integrity, format compliance, technical quality
- Metadata Extraction: Embedded metadata and file properties
NeMo Curator provides duration extraction as a built-in stage (GetAudioDurationStage). The format and metadata examples below show how to build custom stages and are not built-in.
Input Requirements
Each audio data entry must include the path to the file:
Use audio_filepath_key to customize the key name when constructing GetAudioDurationStage.
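As a minimal sketch of what such an entry can look like, the snippet below uses plain Python dictionaries. The key name audio_filepath follows the common NeMo manifest convention and is an assumption here, as are the file path and the helper get_audio_path, which is hypothetical and only illustrates how a configurable key name behaves.

```python
# One entry per audio file. "audio_filepath" is an assumed key name and the
# path is a placeholder, not a file shipped with NeMo Curator.
entry = {"audio_filepath": "/data/audio/sample_001.wav"}

# A dataset that stores the path under a different key would pass that key
# name as audio_filepath_key instead.
custom_entry = {"path": "/data/audio/sample_001.wav"}


def get_audio_path(entry, audio_filepath_key="audio_filepath"):
    """Hypothetical helper: look up the file path under a configurable key."""
    if audio_filepath_key not in entry:
        raise KeyError(f"entry has no '{audio_filepath_key}' field")
    return entry[audio_filepath_key]


default_path = get_audio_path(entry)
custom_path = get_audio_path(custom_entry, audio_filepath_key="path")
```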
Duration Analysis
Precise Duration Calculation
The duration calculation:
- Uses the soundfile library; computes duration as frames ÷ sample rate
- Handles formats supported by soundfile (libsndfile)
- Returns -1.0 for corrupted or unreadable files
- Calculates: duration = sample_count / sample_rate
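The arithmetic in the list above can be sketched in a few lines. To stay dependency-free, this example swaps in Python's standard-library wave module for soundfile, so it only reads PCM WAV files; it is an illustration of the frames ÷ sample rate calculation and the -1.0 sentinel, not the code inside GetAudioDurationStage. The function name wav_duration_seconds is hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration in seconds, or -1.0 if the file cannot be read.

    Uses the stdlib wave module (PCM WAV only) as a stand-in for soundfile.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            frames = wav_file.getnframes()
            sample_rate = wav_file.getframerate()
    except (wave.Error, EOFError, OSError):
        # Missing, truncated, or non-WAV files all map to the sentinel value.
        return -1.0
    if sample_rate <= 0:
        # A zero sample rate in the header would make the division undefined.
        return -1.0
    return frames / sample_rate
```

Returning a sentinel instead of raising keeps a batch job running when a single file is damaged; the failed entries can be filtered out afterwards.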
Duration-Based Quality Assessment
After calculating durations, you can analyze the results:
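One way to analyze the results is to summarize the duration values while keeping unreadable files (duration -1.0) out of the statistics. The sketch below assumes the durations have already been collected into a plain list of floats; the function name summarize_durations and the keys of the returned dictionary are hypothetical.

```python
from statistics import mean, median


def summarize_durations(durations):
    """Summarize durations in seconds, treating negative values as failures."""
    valid = [d for d in durations if d >= 0]
    summary = {
        "total_files": len(durations),
        "unreadable_files": len(durations) - len(valid),
        "total_hours": sum(valid) / 3600.0,
    }
    if valid:
        # Statistics are only defined when at least one file was readable.
        summary.update(
            min_s=min(valid),
            max_s=max(valid),
            mean_s=mean(valid),
            median_s=median(valid),
        )
    return summary


summary = summarize_durations([1.0, 2.0, 3.0, -1.0])
```

A high unreadable_files count relative to total_files usually points at a format or path problem rather than at genuinely damaged audio, so it is worth checking before tuning any duration thresholds.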
Duration Filtering Example
Refer to Duration Filtering for end-to-end examples.
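As a short illustration of the idea, the sketch below keeps only entries whose duration falls inside a range. It operates on plain dictionaries rather than on NeMo Curator stages; the function name filter_by_duration, the duration key, and the 1 to 30 second thresholds are assumptions chosen for the example.

```python
def filter_by_duration(entries, min_duration=1.0, max_duration=30.0,
                       duration_key="duration"):
    """Keep entries whose duration lies in [min_duration, max_duration].

    Entries without a duration, or with the -1.0 failure sentinel, are dropped
    because -1.0 is below any non-negative minimum.
    """
    return [
        entry for entry in entries
        if min_duration <= entry.get(duration_key, -1.0) <= max_duration
    ]


entries = [
    {"audio_filepath": "a.wav", "duration": 0.4},    # too short
    {"audio_filepath": "b.wav", "duration": 12.5},   # kept
    {"audio_filepath": "c.wav", "duration": -1.0},   # unreadable
    {"audio_filepath": "d.wav", "duration": 95.0},   # too long
]
kept = filter_by_duration(entries)
```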
Format Validation
NeMo Curator infers basic format validity during duration extraction using soundfile.read. If soundfile/libsndfile cannot read a file, GetAudioDurationStage sets duration = -1.0, which you can filter out. Refer to Format Validation for behavior and supported formats.
Basic Format Check
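A basic check can be sketched as a custom function that reads the header fields and compares them with what the dataset is expected to contain. This example uses the standard-library wave module instead of soundfile, so it covers PCM WAV files only. The function name check_wav_format and the expected values (16 kHz, mono) are assumptions for illustration, not NeMo Curator defaults.

```python
import wave


def check_wav_format(path, expected_sample_rate=16000, expected_channels=1):
    """Read basic format fields from a WAV header and compare with a target."""
    try:
        with wave.open(path, "rb") as wav_file:
            info = {
                "sample_rate": wav_file.getframerate(),
                "channels": wav_file.getnchannels(),
                "bit_depth": wav_file.getsampwidth() * 8,  # bytes to bits
                "frames": wav_file.getnframes(),
            }
    except (wave.Error, EOFError, OSError) as exc:
        # Unreadable files are reported rather than raised.
        return {"readable": False, "error": str(exc)}
    info["readable"] = True
    info["matches_expected"] = (
        info["sample_rate"] == expected_sample_rate
        and info["channels"] == expected_channels
    )
    return info
```

Keeping "readable" separate from "matches_expected" distinguishes files that are damaged from files that are intact but need resampling or downmixing.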
Complete Analysis Pipeline
Here is a complete working pipeline for audio analysis:
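The following is a stand-in sketch in plain Python that chains the steps described on this page: duration extraction, removal of unreadable files, duration filtering, and a short report. It assumes PCM WAV input read with the standard-library wave module and does not use NeMo Curator's stage classes; the function names, key names, and thresholds are hypothetical.

```python
import wave


def add_duration(entry, audio_filepath_key="audio_filepath",
                 duration_key="duration"):
    """Return a copy of the entry with its duration, or -1.0 on failure."""
    try:
        with wave.open(entry[audio_filepath_key], "rb") as wav_file:
            rate = wav_file.getframerate()
            duration = wav_file.getnframes() / rate if rate > 0 else -1.0
    except (wave.Error, EOFError, OSError):
        duration = -1.0
    return {**entry, duration_key: duration}


def run_analysis(entries, min_duration=1.0, max_duration=30.0):
    """Analyze entries and return (kept_entries, report)."""
    analyzed = [add_duration(entry) for entry in entries]
    # Step 1: drop files that could not be read (duration == -1.0).
    readable = [e for e in analyzed if e["duration"] >= 0]
    # Step 2: keep only files inside the accepted duration range.
    kept = [e for e in readable
            if min_duration <= e["duration"] <= max_duration]
    report = {
        "total": len(analyzed),
        "unreadable": len(analyzed) - len(readable),
        "kept": len(kept),
        "kept_seconds": sum(e["duration"] for e in kept),
    }
    return kept, report
```

Calling run_analysis on a list of entries, each holding an audio_filepath, returns the surviving entries with a duration field added, plus a report that separates unreadable files from files rejected only for their length.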