Duration Calculation

Calculate precise audio duration using the soundfile library for quality assessment and metadata generation in audio curation pipelines.

Overview

The GetAudioDurationStage extracts precise timing information from audio files using the soundfile library. This information is essential for quality filtering, dataset analysis, and ensuring consistent audio lengths for training.

Key Features

  • High Precision: Uses soundfile for frame-accurate duration calculation
  • Format Support: Works with all audio formats supported by soundfile (WAV, FLAC, OGG, and so on)
  • Error Handling: Sets the duration to -1.0 for corrupted or unreadable files
  • Pipeline Integration: Designed for use in NeMo Curator processing pipelines

How It Works

The duration calculation stage reads each file's audio samples and sample rate, then derives the duration from them:

from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Initialize duration calculator
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
)

# Process audio data
audio_data = {"audio_filepath": "/path/to/audio.wav", "text": "transcription"}
audio_batch = AudioBatch(data=[audio_data])
result_batch = duration_stage.process(audio_batch)

# Access duration information
duration = result_batch[0].data[0]["duration"]
print(f"Audio duration: {duration:.3f} seconds")

Duration Calculation Process

  1. File Reading: Uses soundfile to read audio samples and sample rate
  2. Frame Counting: Counts total audio frames from the loaded samples
  3. Duration Calculation: Computes duration as frames ÷ sample_rate
  4. Error Handling: Sets duration to -1.0 for corrupted files
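
The four steps above can be sketched in a few lines. This is a minimal illustration, not the GetAudioDurationStage source. It uses Python's standard-library wave module as a stand-in for soundfile so that it runs without extra dependencies, which means it handles WAV only and takes the frame count from the header instead of counting decoded samples. The helper name wav_duration_seconds is hypothetical.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration in seconds, or -1.0 if the file cannot be read."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()         # total number of audio frames
            sample_rate = wav.getframerate()  # frames per second
        return frames / sample_rate           # duration = frames / sample_rate
    except (OSError, EOFError, wave.Error, ZeroDivisionError):
        return -1.0                           # sentinel for unreadable files


# Write one second of 16-bit mono silence at 8 kHz, then measure it.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "silence.wav")
with wave.open(good_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)

print(wav_duration_seconds(good_path))                             # 1.0
print(wav_duration_seconds(os.path.join(tmp_dir, "missing.wav")))  # -1.0
```

Returning a sentinel instead of raising keeps one bad file from stopping a batch, at the cost of requiring a later filter on the duration field.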

Configuration

Basic Configuration

from nemo_curator.stages.audio.common import GetAudioDurationStage

# Configure duration calculation
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",  # Field containing audio file paths
    duration_key="duration"               # Output field for duration values
)

Custom Field Names

# Use custom field names for your data format
duration_stage = GetAudioDurationStage(
    audio_filepath_key="wav_file_path",     # Custom input field
    duration_key="audio_length_seconds"     # Custom output field
)

Usage Examples

Basic Duration Calculation

from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Sample audio data
audio_samples = [
    {"audio_filepath": "/path/to/sample1.wav", "text": "Hello world"},
    {"audio_filepath": "/path/to/sample2.wav", "text": "How are you"},
    {"audio_filepath": "/path/to/sample3.wav", "text": "Good morning"}
]

# Create duration calculation stage
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
)

# Process each sample
for sample in audio_samples:
    audio_batch = AudioBatch(data=[sample])
    result_batch = duration_stage.process(audio_batch)

    processed_sample = result_batch[0].data[0]
    print(f"File: {processed_sample['audio_filepath']}")
    print(f"Duration: {processed_sample['duration']:.3f} seconds")

Pipeline Integration

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Create audio processing pipeline
pipeline = Pipeline(name="audio_duration_pipeline")

# Add duration calculation stage
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
))

# Add duration-based filtering (1-30 seconds)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge"  # greater than or equal
))

pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=30.0,
    operator="le"  # less than or equal
))
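
To make the effect of the two chained filters concrete, here is a minimal sketch of the same logic in plain Python. It is not the PreserveByValueStage implementation. The preserve_by_value helper and the COMPARATORS table are hypothetical, and the operator strings are limited to the ones used on this page.

```python
import operator

# Map the operator strings used in the pipeline examples to Python comparisons.
COMPARATORS = {"ge": operator.ge, "le": operator.le, "gt": operator.gt}


def preserve_by_value(records, key, target, op):
    """Keep only the records whose value at `key` compares true against `target`."""
    compare = COMPARATORS[op]
    return [record for record in records if compare(record[key], target)]


records = [
    {"audio_filepath": "a.wav", "duration": 0.4},
    {"audio_filepath": "b.wav", "duration": 12.5},
    {"audio_filepath": "c.wav", "duration": 45.0},
    {"audio_filepath": "d.wav", "duration": -1.0},  # unreadable file
]

# Same bounds as the pipeline: keep 1.0 <= duration <= 30.0
kept = preserve_by_value(records, "duration", 1.0, "ge")
kept = preserve_by_value(kept, "duration", 30.0, "le")
print([record["audio_filepath"] for record in kept])  # ['b.wav']
```

Note that the lower bound also removes the -1.0 error entries, since -1.0 is less than 1.0.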

Batch Processing

from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Process multiple samples in a batch
audio_data_list = [
    {"audio_filepath": "/path/to/file1.wav", "text": "Sample 1"},
    {"audio_filepath": "/path/to/file2.wav", "text": "Sample 2"},
    {"audio_filepath": "/path/to/file3.wav", "text": "Sample 3"}
]

# Create batch
audio_batch = AudioBatch(data=audio_data_list)

# Process entire batch
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
)

# process() returns a list of AudioBatch objects
result_batches = duration_stage.process(audio_batch)

# Extract processed data
for batch in result_batches:
    for sample in batch.data:
        print(f"File: {sample['audio_filepath']}")
        print(f"Duration: {sample['duration']:.3f} seconds")

Output Format

The stage adds duration information to each audio sample’s metadata:

{
  "audio_filepath": "/path/to/audio.wav",
  "text": "Sample transcription text",
  "duration": 12.345
}

For corrupted or unreadable files:

{
  "audio_filepath": "/path/to/corrupted.wav",
  "text": "Sample transcription text",
  "duration": -1.0
}
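
Because failed reads share a field with valid durations, dataset statistics should exclude the -1.0 sentinel. The following is a minimal sketch in plain Python, assuming records shaped like the output above. summarize_durations is a hypothetical helper, not part of NeMo Curator.

```python
def summarize_durations(records, duration_key="duration", path_key="audio_filepath"):
    """Summarize valid durations and list the files that failed to load."""
    valid = [r[duration_key] for r in records if r[duration_key] >= 0]
    failed = [r[path_key] for r in records if r[duration_key] < 0]
    return {
        "num_valid": len(valid),
        "num_failed": len(failed),
        "total_hours": sum(valid) / 3600,
        "min_seconds": min(valid) if valid else None,
        "max_seconds": max(valid) if valid else None,
        "failed_files": failed,
    }


records = [
    {"audio_filepath": "/path/to/long.wav", "duration": 5400.0},
    {"audio_filepath": "/path/to/short.wav", "duration": 1800.0},
    {"audio_filepath": "/path/to/corrupted.wav", "duration": -1.0},
]

summary = summarize_durations(records)
print(summary["total_hours"])   # 2.0
print(summary["failed_files"])  # ['/path/to/corrupted.wav']
```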

Error Handling

The stage does not stop on unreadable input. It writes -1.0 to the duration field and continues with the remaining files:

File Not Found

# Assumes AudioBatch and duration_stage from the examples above
# Non-existent files result in duration = -1.0
sample = {"audio_filepath": "/nonexistent/file.wav", "text": "test"}
audio_batch = AudioBatch(data=[sample])
result = duration_stage.process(audio_batch)
# result[0].data[0]["duration"] == -1.0

Corrupted Audio Files

# Corrupted files are logged and marked with duration = -1.0
# Check logs for specific error messages
import logging
logging.basicConfig(level=logging.WARNING)

# Processing continues with the remaining files
# (audio_batch and duration_stage as defined in the examples above)
result = duration_stage.process(audio_batch)

Filtering Error Files

from nemo_curator.stages.audio.common import PreserveByValueStage

# Filter out files with calculation errors
error_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt"  # greater than (excludes -1.0 error values)
)

Integration with Quality Assessment

Duration calculation is typically the first step in quality assessment workflows:

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Create comprehensive quality pipeline
pipeline = Pipeline(name="audio_quality_assessment")

# Step 1: Calculate durations
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
))

# Step 2: Filter by duration range (optimal for ASR training)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,  # Minimum 1 second
    operator="ge"
))

pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=15.0,  # Maximum 15 seconds
    operator="le"
))

# Step 3: Remove error files
# (the >= 1.0 filter in Step 2 already drops -1.0 values; kept as an explicit safeguard)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,  # Exclude -1.0 error values
    operator="gt"
))

Performance Considerations

Memory Usage

  • The stage reads audio samples to compute frames
  • Memory usage scales with file duration, channels, and data type
  • Reduce the batch size when processing long files
  • If you write a custom alternative, soundfile.info provides frames and samplerate without loading the samples
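
The memory points above can be made concrete with a short sketch. The first helper estimates the size of the decoded sample array; its 8-byte default matches float64, the default dtype of soundfile.read. The second computes duration from soundfile.info metadata only. Both function names are hypothetical, and the info_fn parameter exists only so the example can run without an audio file. Catching RuntimeError assumes that soundfile's file-open errors derive from it, which is the case in recent releases but worth checking against your installed version.

```python
from types import SimpleNamespace


def estimated_read_bytes(duration_s, sample_rate, channels, bytes_per_sample=8):
    """Approximate size of the decoded array: frames * channels * bytes per sample."""
    return int(duration_s * sample_rate) * channels * bytes_per_sample


def header_duration(path, info_fn=None):
    """Duration from file metadata only; -1.0 if the file cannot be opened."""
    if info_fn is None:
        import soundfile as sf  # imported lazily; only needed for real files
        info_fn = sf.info
    try:
        info = info_fn(path)
        return info.frames / info.samplerate
    except (RuntimeError, ZeroDivisionError):
        return -1.0


# One hour of 48 kHz stereo decoded as float64:
print(estimated_read_bytes(3600, 48_000, 2))  # 2764800000 bytes, about 2.76 GB

# Exercise header_duration with a stand-in for the object soundfile.info returns.
fake_info = lambda path: SimpleNamespace(frames=96_000, samplerate=48_000)
print(header_duration("example.wav", info_fn=fake_info))  # 2.0
```

With real files, call header_duration(path) and leave info_fn unset so that soundfile.info is used.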

Processing Speed

  • Duration calculation is I/O bound and scales with file size
  • Network-mounted files can be slower than local storage
  • Consider parallel processing for large datasets using Ray
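
Because the work is I/O bound, overlapping file reads helps even on a single machine. The sketch below illustrates the idea with a standard-library thread pool instead of Ray, and with the wave module standing in for soundfile (WAV only). Both substitutions are assumptions made to keep the example dependency-free, and the function names are hypothetical.

```python
import os
import tempfile
import wave
from concurrent.futures import ThreadPoolExecutor


def read_wav_duration(path):
    """Duration in seconds from the WAV header, or -1.0 on failure."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / wav.getframerate()
    except (OSError, EOFError, wave.Error, ZeroDivisionError):
        return -1.0


def durations_in_parallel(paths, max_workers=8):
    """Read several files concurrently; threads suit I/O-bound work."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths
        return dict(zip(paths, pool.map(read_wav_duration, paths)))


# Build three small mono 16-bit files of 1, 2, and 3 seconds at 8 kHz.
tmp_dir = tempfile.mkdtemp()
paths = []
for seconds in (1, 2, 3):
    path = os.path.join(tmp_dir, f"clip_{seconds}s.wav")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x00\x00" * 8000 * seconds)
    paths.append(path)

results = durations_in_parallel(paths)
print([results[p] for p in paths])  # [1.0, 2.0, 3.0]
```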

File System Optimization

For better performance with large datasets:

  • Use local storage when possible
  • Ensure sufficient I/O bandwidth
  • Consider file system caching

Troubleshooting

Common Issues

Unsupported Audio Formats

# Check supported formats
import soundfile as sf
print("Supported formats:", sf.available_formats())

# Common supported formats: WAV, FLAC, OGG, AIFF
# MP3 support depends on your system's libsndfile build