Audio Analysis

Extract and analyze audio file characteristics for quality control, metadata generation, and dataset validation. Audio analysis provides essential information about audio files before and during processing.

How It Works

Audio analysis in NeMo Curator examines audio files to extract:

  1. Duration Information: Precise timing measurements using soundfile
  2. Format Characteristics: Sample rate, bit depth, channels, and format
  3. Quality Indicators: File integrity, format compliance, technical quality
  4. Metadata Extraction: Embedded metadata and file properties

NeMo Curator provides duration extraction as a built-in stage (GetAudioDurationStage). The format and metadata examples below show how to build custom stages and are not built-in.

Input Requirements

Each audio data entry must include the path to the file:

# Required key in each data item
{
    "audio_filepath": "/path/to/audio.wav"
}

Use audio_filepath_key to customize the key name when constructing GetAudioDurationStage.
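
A quick pre-flight check can catch entries that lack the required key before a pipeline runs. The sketch below is plain Python and not part of NeMo Curator; `find_invalid_entries` is a hypothetical helper name, and entries are assumed to be dictionaries.

```python
def find_invalid_entries(entries, filepath_key="audio_filepath"):
    """Return the indices of entries whose file path is missing or empty.

    Hypothetical helper for illustration; not a NeMo Curator API.
    """
    invalid = []
    for index, entry in enumerate(entries):
        path = entry.get(filepath_key)
        # A usable path must be a non-empty string.
        if not isinstance(path, str) or not path.strip():
            invalid.append(index)
    return invalid


entries = [
    {"audio_filepath": "/path/to/audio.wav"},
    {"text": "no path here"},
    {"audio_filepath": ""},
]
invalid_indices = find_invalid_entries(entries)
print(invalid_indices)
```

If the data uses a different key name, pass it as `filepath_key`, mirroring the audio_filepath_key argument of the stage.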

Duration Analysis

Precise Duration Calculation

from nemo_curator.stages.audio.common import GetAudioDurationStage

# Calculate audio duration for each file
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
)

The duration calculation:

  • Uses the soundfile library to read each file
  • Computes duration = sample_count / sample_rate (frames ÷ sample rate)
  • Handles formats supported by soundfile (libsndfile)
  • Returns -1.0 for corrupted or unreadable files
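
The frames ÷ sample rate formula and the -1.0 sentinel can be illustrated with a minimal sketch. It is not the GetAudioDurationStage implementation: to stay dependency-free it swaps in Python's standard-library `wave` module for soundfile, so it only handles WAV files. `wav_duration` and `write_silence` are hypothetical helper names.

```python
import os
import tempfile
import wave


def wav_duration(path):
    """Return duration in seconds, or -1.0 if the file cannot be read."""
    try:
        with wave.open(path, "rb") as wav_file:
            frames = wav_file.getnframes()
            sample_rate = wav_file.getframerate()
        return frames / sample_rate
    except (wave.Error, EOFError, OSError, ZeroDivisionError):
        # Same sentinel value the text describes for unreadable files.
        return -1.0


def write_silence(path, seconds, sample_rate=16000):
    """Write a mono 16-bit WAV file containing silence."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))


with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.wav")
    bad_path = os.path.join(tmp_dir, "bad.wav")
    write_silence(good_path, seconds=2.5)
    with open(bad_path, "wb") as bad_file:
        bad_file.write(b"not a real wav file")

    good_duration = wav_duration(good_path)
    bad_duration = wav_duration(bad_path)

print(good_duration, bad_duration)
```

Returning a sentinel instead of raising keeps one bad file from stopping a batch; the unreadable entries can then be dropped by a later filter.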

Duration-Based Quality Assessment

After calculating durations, you can use the values to filter samples:

Duration Filtering Example

from nemo_curator.stages.audio.common import PreserveByValueStage

# Keep samples between 1 and 15 seconds
min_duration_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge"
)
max_duration_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=15.0,
    operator="le"
)
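
To make the filter semantics concrete, the sketch below mimics the two comparisons in plain Python. It is an illustration, not the PreserveByValueStage implementation: `preserve_by_value` is a hypothetical function, and only the "ge" and "le" operator names come from the example above.

```python
import operator

# "ge" and "le" appear in the example above; any other operator names
# would be assumptions, so only these two are mapped here.
OPERATORS = {
    "ge": operator.ge,  # value >= target
    "le": operator.le,  # value <= target
}


def preserve_by_value(entries, input_value_key, target_value, op_name):
    """Keep entries whose value compares true against the target."""
    compare = OPERATORS[op_name]
    return [e for e in entries if compare(e[input_value_key], target_value)]


entries = [
    {"audio_filepath": "corrupt.wav", "duration": -1.0},
    {"audio_filepath": "too_short.wav", "duration": 0.4},
    {"audio_filepath": "exactly_min.wav", "duration": 1.0},
    {"audio_filepath": "typical.wav", "duration": 3.2},
    {"audio_filepath": "exactly_max.wav", "duration": 15.0},
    {"audio_filepath": "too_long.wav", "duration": 42.0},
]

# Chain the two filters: first the lower bound, then the upper bound.
kept = preserve_by_value(entries, "duration", 1.0, "ge")
kept = preserve_by_value(kept, "duration", 15.0, "le")
kept_names = [e["audio_filepath"] for e in kept]
print(kept_names)
```

Both bounds are inclusive, and the lower bound also discards entries carrying the -1.0 sentinel for unreadable files.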

Refer to Duration Filtering for end-to-end examples.
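
Beyond filtering, the duration values can be summarized to get a quick picture of a dataset. The following is a minimal sketch in plain Python, assuming the durations have already been collected into a list; `summarize_durations` is a hypothetical helper, not a NeMo Curator function.

```python
def summarize_durations(durations):
    """Summarize durations in seconds; negative values mark unreadable files."""
    readable = [d for d in durations if d >= 0]
    total_seconds = sum(readable)
    return {
        "files": len(durations),
        "unreadable": len(durations) - len(readable),
        "total_hours": total_seconds / 3600,
        "min_seconds": min(readable) if readable else None,
        "max_seconds": max(readable) if readable else None,
        "mean_seconds": total_seconds / len(readable) if readable else None,
    }


# -1.0 is the sentinel for a corrupted or unreadable file.
summary = summarize_durations([2.0, 4.0, -1.0, 12.0])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Excluding the sentinel values before aggregating keeps unreadable files from skewing the minimum, mean, and total.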

Format Validation

NeMo Curator infers basic format validity during duration extraction using soundfile.read. If soundfile/libsndfile cannot read a file, GetAudioDurationStage sets duration = -1.0, which you can filter out. Refer to Format Validation for behavior and supported formats.

Basic Format Check

import soundfile as sf

# Check if file is readable
try:
    info = sf.info("audio_file.wav")
    print(f"Duration: {info.duration}s, Sample rate: {info.samplerate}Hz")
except Exception as e:
    print(f"File validation failed: {e}")
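
A format check can go one step further and compare a file's characteristics against a target. The sketch below is an assumption-laden illustration: it uses the standard-library `wave` module in place of soundfile (so it covers WAV only), `describe_wav` and `meets_target` are hypothetical helpers, and the 16 kHz mono target is an arbitrary example. With soundfile, the object returned by `sf.info` carries comparable fields such as `samplerate` and `channels`.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic format fields for a WAV file, or None if unreadable."""
    try:
        with wave.open(path, "rb") as wav_file:
            return {
                "sample_rate": wav_file.getframerate(),
                "channels": wav_file.getnchannels(),
                "bit_depth": wav_file.getsampwidth() * 8,
                "frames": wav_file.getnframes(),
            }
    except (wave.Error, EOFError, OSError):
        return None


def meets_target(info, sample_rate=16000, channels=1):
    """Check a description against an example target format."""
    return (
        info is not None
        and info["sample_rate"] == sample_rate
        and info["channels"] == channels
    )


def write_silence(path, sample_rate, channels, frames):
    """Write a 16-bit WAV file of silence with the given layout."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * channels * frames)


with tempfile.TemporaryDirectory() as tmp_dir:
    mono_path = os.path.join(tmp_dir, "mono_16k.wav")
    stereo_path = os.path.join(tmp_dir, "stereo_44k.wav")
    write_silence(mono_path, sample_rate=16000, channels=1, frames=16000)
    write_silence(stereo_path, sample_rate=44100, channels=2, frames=44100)

    mono_info = describe_wav(mono_path)
    stereo_info = describe_wav(stereo_path)

print(mono_info, meets_target(mono_info))
print(stereo_info, meets_target(stereo_info))
```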

Complete Analysis Pipeline

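The following pipeline combines duration extraction, duration filtering, and ASR inference. It defines the stages only; reading the input data and executing the pipeline are not shown: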

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

# Create analysis pipeline
pipeline = Pipeline(name="audio_analysis")

# 1. Calculate duration (handles format validation automatically)
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
))

# 2. Filter by duration (removes corrupted files with duration = -1.0)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge"  # >= 1 second
))

pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=15.0,
    operator="le"  # <= 15 seconds
))

# 3. Continue with ASR inference on validated files
pipeline.add_stage(InferenceAsrNemoStage(
    model_name="nvidia/stt_en_fastconformer_hybrid_large_pc"
))