Extract and analyze audio file characteristics for quality control, metadata generation, and dataset validation. Audio analysis provides essential information about audio files before and during processing.
Audio analysis in NeMo Curator examines audio files to extract duration, format, and metadata information.
NeMo Curator provides duration extraction as a built-in stage (GetAudioDurationStage). The format and metadata examples below show how to build custom stages and are not built-in.
Each audio data entry must include the path to the file:
Use audio_filepath_key to customize the key name when constructing GetAudioDurationStage.
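As an illustration of the entry requirement above, the following minimal sketch checks that every manifest entry carries a file path under a configurable key. The default key name "audio_filepath", the "text" field, the file paths, and the helper missing_path_entries are assumptions made for this example; they are not part of the NeMo Curator API.

```python
import json


def missing_path_entries(entries, audio_filepath_key="audio_filepath"):
    """Return the indices of entries that lack a non-empty path string.

    `audio_filepath_key` mirrors the stage parameter of the same name;
    this helper itself is hypothetical and not part of NeMo Curator.
    """
    return [
        index
        for index, entry in enumerate(entries)
        if not isinstance(entry.get(audio_filepath_key), str)
        or not entry[audio_filepath_key]
    ]


# Hypothetical JSONL manifest lines; the paths are placeholders.
manifest_lines = [
    '{"audio_filepath": "/data/audio/sample_0001.wav", "text": "hello world"}',
    '{"text": "entry without a path"}',
]
entries = [json.loads(line) for line in manifest_lines]

# Indices of entries that would need fixing before duration extraction.
print(missing_path_entries(entries))
```

If your manifests store the path under a different key, pass that name to the check and use the same name for audio_filepath_key on the stage so the two stay consistent.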
The duration calculation uses the soundfile library (libsndfile) to read each file and computes duration as frames ÷ sample rate: duration = sample_count / sample_rate.
After calculating durations, you can analyze the results:
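One possible way to summarize the extracted durations is sketched below using only the Python standard library. It assumes each entry stores its duration in seconds under a "duration" key and that unreadable files carry a negative duration such as -1.0. The function name summarize_durations and the sample entries are hypothetical.

```python
import statistics


def summarize_durations(entries, duration_key="duration"):
    """Summarize duration values (in seconds) across manifest entries.

    Negative durations are treated as unreadable files and are excluded
    from the statistics but counted separately.
    """
    durations = [e[duration_key] for e in entries if duration_key in e]
    valid = [d for d in durations if d >= 0]
    summary = {
        "files": len(durations),
        "unreadable": len(durations) - len(valid),
        "total_hours": sum(valid) / 3600.0,
    }
    if valid:
        summary.update(
            min_s=min(valid),
            max_s=max(valid),
            mean_s=statistics.mean(valid),
            median_s=statistics.median(valid),
        )
    return summary


# Hypothetical entries, as they might look after duration extraction.
sample_entries = [
    {"audio_filepath": "a.wav", "duration": 2.0},
    {"audio_filepath": "b.wav", "duration": 4.0},
    {"audio_filepath": "broken.wav", "duration": -1.0},
    {"audio_filepath": "c.wav", "duration": 6.0},
]
summary = summarize_durations(sample_entries)
print(summary)
```

Counting unreadable files separately keeps them from skewing the minimum and mean while still surfacing how much of the dataset failed to load.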
Refer to Duration Filtering for end-to-end examples.
NeMo Curator infers basic format validity during duration extraction using soundfile.read. If soundfile/libsndfile cannot read a file, GetAudioDurationStage sets duration = -1.0, which you can filter out. Refer to Format Validation for behavior and supported formats.
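The behavior described above can be approximated with the rough sketch below. It is not the GetAudioDurationStage implementation; it only mirrors the documented logic of reading with soundfile.read, dividing the frame count by the sample rate, and falling back to -1.0 on failure. The helper names duration_from_counts, get_duration, and keep_readable are hypothetical, and the soundfile package must be installed for get_duration to work.

```python
def duration_from_counts(sample_count, sample_rate):
    """Duration in seconds: number of frames divided by sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return sample_count / sample_rate


def get_duration(path):
    """Return the duration of an audio file, or -1.0 if it cannot be read."""
    import soundfile as sf  # third-party dependency (wraps libsndfile)

    try:
        # sf.read returns (data, samplerate); data has one row per frame.
        data, sample_rate = sf.read(path)
    except Exception:
        # Unreadable or unsupported file: signal failure with -1.0.
        return -1.0
    return duration_from_counts(data.shape[0], sample_rate)


def keep_readable(entries, duration_key="duration"):
    """Drop entries whose duration marks a failed read (negative value)."""
    return [e for e in entries if e.get(duration_key, -1.0) >= 0]


# Hypothetical entries after duration extraction.
processed = [
    {"audio_filepath": "ok.wav", "duration": 2.0},
    {"audio_filepath": "corrupt.wav", "duration": -1.0},
]
readable = keep_readable(processed)
print(readable)
```

Reading the whole file just to count frames is simple but loads all samples into memory. For large files, soundfile.info exposes the frame count and sample rate without decoding the audio, which may be a cheaper alternative in a custom stage.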
Here is a complete working pipeline for audio analysis: