Audio Format Support
NeMo Curator audio processing stages use the soundfile library for audio file handling. Built-in error handling surfaces unreadable or unsupported files during duration calculation.
Supported Formats
Audio stages support formats compatible with the soundfile library (backed by libsndfile):
- WAV: Uncompressed audio (recommended for high quality)
- FLAC: Lossless compression with metadata support
- OGG: Open-source compressed format
- MP3: Compressed format (availability depends on your system’s libsndfile build)
- AIFF: Apple uncompressed format
Note: AAC/M4A is not supported by default by soundfile/libsndfile. Prefer WAV or FLAC for consistent cross-platform behavior.
Built-in Error Handling
Duration Calculation with Error Handling
The GetAudioDurationStage automatically handles corrupted or unreadable files:
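The following is a minimal sketch of the behavior described here, written against soundfile directly. It is not the stage's actual implementation. The helper name `audio_duration` and its `probe` parameter are hypothetical, introduced only for illustration; `soundfile.info` and its `duration` attribute are part of the soundfile library.

```python
from typing import Callable, Optional


def _soundfile_duration(path: str) -> float:
    """Read the duration in seconds from the file header using soundfile."""
    import soundfile as sf  # third-party; imported lazily so the helper stays importable

    return sf.info(path).duration


def audio_duration(path: str, probe: Optional[Callable[[str], float]] = None) -> float:
    """Return the duration in seconds, or -1.0 if the file cannot be read.

    `probe` is a hypothetical hook that lets you swap in another reader;
    by default the duration comes from soundfile.
    """
    if probe is None:
        probe = _soundfile_duration
    try:
        return float(probe(path))
    except (RuntimeError, OSError):
        # soundfile reports unreadable or unsupported files with RuntimeError
        # subclasses; map them to the -1.0 sentinel described in this section.
        return -1.0
```

Calling `audio_duration("sample.m4a")` on a build without AAC support would be expected to return -1.0 rather than raise, which mirrors the sentinel the stage is described as producing.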
Error Handling Behavior
When soundfile/libsndfile cannot read audio files:
- Duration Calculation: Returns -1.0 for corrupted/unreadable files
- ASR Inference: Fails with an error message for unsupported formats
- File Validation: Use duration = -1.0 as an indicator of file issues
Working Example
Here is a complete pipeline that handles format validation through built-in error handling:
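As a stand-in that runs without NeMo Curator installed, the sketch below reproduces the same flow in plain Python: annotate each manifest entry with a duration, then separate the entries whose duration is the -1.0 sentinel. It does not use NeMo Curator's pipeline API. The function names and the `audio_filepath` and `duration` keys are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

UNREADABLE = -1.0  # sentinel used for corrupted or unreadable files


def soundfile_duration(path: str) -> float:
    """Duration in seconds via soundfile, or -1.0 if the file cannot be read."""
    import soundfile as sf  # third-party; imported lazily

    try:
        return float(sf.info(path).duration)
    except (RuntimeError, OSError):
        return UNREADABLE


def validate_manifest(
    entries: Iterable[Dict],
    duration_fn: Callable[[str], float] = soundfile_duration,
    path_key: str = "audio_filepath",  # assumed manifest key
    duration_key: str = "duration",    # assumed manifest key
) -> Tuple[List[Dict], List[Dict]]:
    """Annotate entries with a duration and split them into (valid, rejected)."""
    valid: List[Dict] = []
    rejected: List[Dict] = []
    for entry in entries:
        annotated = dict(entry)  # copy so the caller's entries are not mutated
        annotated[duration_key] = duration_fn(entry[path_key])
        if annotated[duration_key] == UNREADABLE:
            rejected.append(annotated)
        else:
            valid.append(annotated)
    return valid, rejected
```

Passing only the `valid` list on to later steps such as ASR inference keeps unsupported files from failing there, and the `rejected` list records which paths need conversion to WAV or FLAC.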
Format Support Check
To check supported formats on your system:
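One way to do this is to query soundfile for the formats its libsndfile build exposes and compare them with the formats listed in this section. In the sketch below, `soundfile.available_formats()` is the library call; the helper names `format_support` and `print_format_support` are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def format_support(
    available: Mapping[str, str],
    wanted: Iterable[str] = ("WAV", "FLAC", "OGG", "MP3", "AIFF"),
) -> Dict[str, bool]:
    """Map each wanted format name to whether it appears in `available`."""
    names = {name.upper() for name in available}
    return {fmt: fmt.upper() in names for fmt in wanted}


def print_format_support() -> None:
    """Print which of the wanted formats this system's libsndfile can handle."""
    import soundfile as sf  # third-party; imported lazily

    # available_formats() returns a dict of format name -> description.
    for fmt, ok in format_support(sf.available_formats()).items():
        status = "supported" if ok else "not available in this libsndfile build"
        print(f"{fmt}: {status}")
```

Run `print_format_support()` on the machine that will process the audio, since the result depends on the installed libsndfile build.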
This approach leverages the built-in error handling of NeMo Curator’s audio stages rather than requiring extra format validation steps.