Duration Calculation
Calculate precise audio duration with the `soundfile` library for quality assessment and metadata generation in audio curation pipelines.
Overview
The `GetAudioDurationStage` extracts precise timing information from audio files using the `soundfile` library. This information is essential for quality filtering, dataset analysis, and enforcing consistent audio lengths for training.
Key Features
- High Precision: Uses `soundfile` for frame-accurate duration calculation.
- Format Support: Works with all audio formats supported by `soundfile` (WAV, FLAC, OGG, and so on).
- Error Handling: Sets the duration to -1.0 for corrupted or unreadable files instead of stopping the pipeline.
- Pipeline Integration: Designed for use in NeMo Curator processing pipelines.
How It Works
The duration calculation stage reads the audio samples and sample rate of each file to compute its exact duration:
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Initialize the duration calculator
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)

# Process audio data
audio_data = {"audio_filepath": "/path/to/audio.wav", "text": "transcription"}
audio_batch = AudioBatch(data=[audio_data])
result_batches = duration_stage.process(audio_batch)  # returns a list of AudioBatch objects

# Access duration information
duration = result_batches[0].data[0]["duration"]
print(f"Audio duration: {duration:.3f} seconds")
```
Duration Calculation Process
1. File Reading: Uses `soundfile` to read the audio samples and sample rate.
2. Frame Counting: Counts the total audio frames (samples per channel) in the loaded data.
3. Duration Calculation: Computes duration as `frames ÷ sample_rate`; for example, 441,000 frames at 44,100 Hz gives 10.0 seconds.
4. Error Handling: Sets the duration to -1.0 for corrupted or unreadable files.
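The four steps above can be sketched in plain Python. This is a minimal illustration, not the stage's source code: it swaps in the standard-library `wave` module for `soundfile` so it runs without libsndfile, which limits it to uncompressed PCM WAV files, and it reads the frame count from the WAV header instead of loading the samples. The helper name `compute_duration` is hypothetical.

```python
import wave


def compute_duration(entry: dict, audio_filepath_key: str = "audio_filepath",
                     duration_key: str = "duration") -> dict:
    """Add a duration in seconds to a manifest entry, or -1.0 if the file can't be read.

    Hypothetical helper (not a NeMo Curator API); the stdlib `wave` module stands
    in for soundfile here, so only PCM WAV input is supported.
    """
    path = entry[audio_filepath_key]
    try:
        with wave.open(path, "rb") as wav:
            num_frames = wav.getnframes()     # frames = samples per channel
            sample_rate = wav.getframerate()  # frames per second
        entry[duration_key] = num_frames / sample_rate
    except (OSError, EOFError, wave.Error):
        # Missing, truncated, or non-WAV files get the -1.0 sentinel
        entry[duration_key] = -1.0
    return entry
```

For example, `compute_duration({"audio_filepath": "/path/to/audio.wav"})` returns the same dictionary with a `duration` key added.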
Configuration
Basic Configuration
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage

# Configure duration calculation
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",  # Field containing audio file paths
    duration_key="duration",  # Output field for duration values
)
```
Custom Field Names
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage

# Use custom field names for your data format
duration_stage = GetAudioDurationStage(
    audio_filepath_key="wav_file_path",  # Custom input field
    duration_key="audio_length_seconds",  # Custom output field
)
```
Usage Examples
Basic Duration Calculation
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Sample audio data
audio_samples = [
    {"audio_filepath": "/path/to/sample1.wav", "text": "Hello world"},
    {"audio_filepath": "/path/to/sample2.wav", "text": "How are you"},
    {"audio_filepath": "/path/to/sample3.wav", "text": "Good morning"},
]

# Create duration calculation stage
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)

# Process each sample in its own batch
for sample in audio_samples:
    audio_batch = AudioBatch(data=[sample])
    result_batches = duration_stage.process(audio_batch)  # list of AudioBatch objects
    processed_sample = result_batches[0].data[0]
    print(f"File: {processed_sample['audio_filepath']}")
    print(f"Duration: {processed_sample['duration']:.3f} seconds")
```
Pipeline Integration
```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Create audio processing pipeline
pipeline = Pipeline(name="audio_duration_pipeline")

# Add duration calculation stage
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
))

# Add duration-based filtering (keep 1-30 seconds)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge",  # greater than or equal
))
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=30.0,
    operator="le",  # less than or equal
))
```
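To check which records the two `PreserveByValueStage` filters above keep, the same comparisons can be mirrored in plain Python on a handful of example records. This sketch only illustrates the filter semantics (the operator names map onto functions from Python's `operator` module); `keep_in_range` is a hypothetical helper, not part of NeMo Curator.

```python
import operator

# Map the operator names used by PreserveByValueStage to Python comparisons
OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
       "ne": operator.ne, "ge": operator.ge, "gt": operator.gt}


def keep_in_range(records, key="duration", min_s=1.0, max_s=30.0):
    """Hypothetical helper: apply a 'ge' filter, then an 'le' filter, like the two stages."""
    kept = [r for r in records if OPS["ge"](r[key], min_s)]
    return [r for r in kept if OPS["le"](r[key], max_s)]


records = [
    {"audio_filepath": "a.wav", "duration": 0.4},   # too short
    {"audio_filepath": "b.wav", "duration": 12.0},  # kept
    {"audio_filepath": "c.wav", "duration": 30.0},  # kept: the bound is inclusive
    {"audio_filepath": "d.wav", "duration": 45.2},  # too long
    {"audio_filepath": "e.wav", "duration": -1.0},  # unreadable file, dropped by 'ge'
]
print([r["audio_filepath"] for r in keep_in_range(records)])  # ['b.wav', 'c.wav']
```

Because the lower bound is positive, the `ge` filter also removes the -1.0 error sentinel, so this particular pipeline needs no separate error filter.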
Batch Processing
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

# Process multiple samples in a batch
audio_data_list = [
    {"audio_filepath": "/path/to/file1.wav", "text": "Sample 1"},
    {"audio_filepath": "/path/to/file2.wav", "text": "Sample 2"},
    {"audio_filepath": "/path/to/file3.wav", "text": "Sample 3"},
]

# Create batch
audio_batch = AudioBatch(data=audio_data_list)

# Process entire batch
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)

# process() returns a list of AudioBatch objects
result_batches = duration_stage.process(audio_batch)

# Extract processed data
for batch in result_batches:
    for sample in batch.data:
        print(f"File: {sample['audio_filepath']}")
        print(f"Duration: {sample['duration']:.3f} seconds")
```
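Once durations are attached to the records, basic dataset statistics follow directly. A minimal sketch, assuming the processed records have been collected into a plain list of dictionaries; `summarize_durations` is a hypothetical helper, and it separates the -1.0 sentinel from valid durations before aggregating so failed files do not skew the totals.

```python
def summarize_durations(records, key="duration"):
    """Hypothetical helper: summarize durations in seconds, excluding the -1.0 sentinel."""
    valid = [r[key] for r in records if r[key] >= 0]
    total = sum(valid)
    return {
        "num_valid": len(valid),
        "num_failed": len(records) - len(valid),
        "total_hours": total / 3600,
        "min_s": min(valid) if valid else None,
        "max_s": max(valid) if valid else None,
        "mean_s": total / len(valid) if valid else None,
    }


records = [
    {"audio_filepath": "a.wav", "duration": 1800.0},
    {"audio_filepath": "b.wav", "duration": 3600.0},
    {"audio_filepath": "c.wav", "duration": -1.0},  # could not be read
]
stats = summarize_durations(records)
print(stats["total_hours"], stats["num_failed"])  # 1.5 1
```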
Output Format
The stage adds duration information to each audio sample's metadata:
```json
{
  "audio_filepath": "/path/to/audio.wav",
  "text": "Sample transcription text",
  "duration": 12.345
}
```
For corrupted or unreadable files:
```json
{
  "audio_filepath": "/path/to/corrupted.wav",
  "text": "Sample transcription text",
  "duration": -1.0
}
```
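Error Handling
The stage handles the following error conditions:
File Not Found
```python
from nemo_curator.stages.audio.common import GetAudioDurationStage
from nemo_curator.tasks import AudioBatch

duration_stage = GetAudioDurationStage(audio_filepath_key="audio_filepath", duration_key="duration")

# Non-existent files result in duration = -1.0
sample = {"audio_filepath": "/nonexistent/file.wav", "text": "test"}
audio_batch = AudioBatch(data=[sample])
result_batches = duration_stage.process(audio_batch)
# result_batches[0].data[0]["duration"] == -1.0
```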
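Because the stage reports a missing file the same way as a corrupted one (duration -1.0), it can help to check paths before processing so the two cases can be told apart. A minimal standard-library sketch; `split_missing` is a hypothetical helper, and `os.path.isfile` only applies to local or mounted paths, not remote object-store URLs.

```python
import os


def split_missing(samples, audio_filepath_key="audio_filepath"):
    """Hypothetical helper: partition samples into (present, missing) by file existence."""
    present, missing = [], []
    for sample in samples:
        if os.path.isfile(sample[audio_filepath_key]):
            present.append(sample)
        else:
            missing.append(sample)
    return present, missing


present, missing = split_missing([{"audio_filepath": "/nonexistent/file.wav", "text": "test"}])
print(len(present), len(missing))  # 0 1
```

Any -1.0 durations among the `present` samples then point to unreadable or corrupted audio rather than bad paths.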
Corrupted Audio Files
```python
import logging

# Corrupted files are logged and marked with duration = -1.0;
# check the logs for the specific error message
logging.basicConfig(level=logging.WARNING)

# Processing continues with the remaining files
result_batches = duration_stage.process(audio_batch)
```
Filtering Error Files
```python
from nemo_curator.stages.audio.common import PreserveByValueStage

# Filter out files with calculation errors
error_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",  # greater than (excludes -1.0 error values)
)
```
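`PreserveByValueStage` discards the entries that fail the test. If you want to keep a record of them for later inspection, you can partition the records yourself using the same `> 0.0` comparison. A minimal sketch, assuming the processed records are available as a list of dictionaries; `partition_by_duration` and `write_jsonl` are hypothetical helpers. Note that `gt 0.0` also drops zero-length files (duration exactly 0.0), which are not read errors but are usually unwanted.

```python
import json


def partition_by_duration(records, key="duration"):
    """Hypothetical helper: split records into (kept, dropped) like operator='gt', target_value=0.0."""
    kept = [r for r in records if r[key] > 0.0]
    dropped = [r for r in records if not r[key] > 0.0]  # exact complement of `kept`
    return kept, dropped


def write_jsonl(records, path):
    """Hypothetical helper: write one JSON object per line for later re-inspection."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")


records = [
    {"audio_filepath": "a.wav", "duration": 4.2},
    {"audio_filepath": "b.wav", "duration": -1.0},  # unreadable
    {"audio_filepath": "c.wav", "duration": 0.0},   # empty file
]
kept, dropped = partition_by_duration(records)
print([r["audio_filepath"] for r in dropped])  # ['b.wav', 'c.wav']
```

The dropped entries can then be saved with, for example, `write_jsonl(dropped, "dropped_audio.jsonl")`.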
Integration with Quality Assessment
Duration calculation is typically the first step in quality assessment workflows:
```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Create a quality assessment pipeline
pipeline = Pipeline(name="audio_quality_assessment")

# Step 1: Calculate durations
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
))

# Step 2: Remove files whose duration could not be computed (-1.0)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",
))

# Step 3: Keep 1-15 second clips (a common range for ASR training; adjust for your data).
# The 1-second minimum alone would also drop -1.0 values; Step 2 keeps error
# removal explicit if you later lower or remove the minimum.
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,  # Minimum 1 second
    operator="ge",
))
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=15.0,  # Maximum 15 seconds
    operator="le",
))
```
Performance Considerations
Memory Usage
- The stage reads the full audio samples to count frames.
- Memory usage scales with file duration, channel count, and sample data type.
- Reduce the batch size when processing long files or large batches of files.
- For a custom alternative that avoids loading samples, use `soundfile.info` to get `frames` and `samplerate` from the file header.
Processing Speed
- Duration calculation is I/O bound and scales with file size.
- Network-mounted files may be slower to read than local storage.
- For large datasets, consider parallel processing with Ray.
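Because the work is I/O bound, even a simple thread pool can overlap file reads on a single machine. The sketch below uses the standard-library `ThreadPoolExecutor` instead of Ray, purely to illustrate the idea; it is not how NeMo Curator schedules stages. It substitutes a stdlib `wave` header read for `soundfile`, so it only handles PCM WAV files, and `wav_duration` and `durations_parallel` are hypothetical helpers.

```python
import wave
from concurrent.futures import ThreadPoolExecutor


def wav_duration(path: str) -> float:
    """Hypothetical helper: duration from a PCM WAV header, or -1.0 if unreadable."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / wav.getframerate()
    except (OSError, EOFError, wave.Error):
        return -1.0


def durations_parallel(paths, max_workers=8):
    """Hypothetical helper: compute durations concurrently; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(wav_duration, paths))
```

For example, `durations_parallel(["/path/to/file1.wav", "/path/to/file2.wav"])` returns a list of durations in input order. Threads suit this case because the workers spend most of their time waiting on storage; CPU-heavy stages would benefit more from process- or cluster-level parallelism.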
File System Optimization
For better performance with large datasets:
1. Use local storage when possible.
2. Ensure sufficient I/O bandwidth.
3. Consider file system caching.
Troubleshooting
Common Issues
Unsupported Audio Formats
```python
import soundfile as sf

# Check supported formats
print("Supported formats:", sf.available_formats())
# Common supported formats: WAV, FLAC, OGG, AIFF
# MP3 support depends on your system's libsndfile build
```
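Before running a large job, a quick pre-flight scan can reveal which file extensions in a dataset do not correspond to any format that `soundfile` reports. This is a heuristic sketch: `unsupported_extensions` is a hypothetical helper, and it assumes an extension such as `.flac` matches a key of `sf.available_formats()` such as `FLAC`, which holds for common formats but not for every container (for example, `.aif` versus `AIFF`). The format table in the example is hand-written for illustration.

```python
import os
from collections import Counter


def unsupported_extensions(paths, available=None):
    """Hypothetical helper: count extensions that match no format reported by soundfile.

    Heuristic: assumes the extension (e.g. '.flac') equals a key of
    soundfile.available_formats() (e.g. 'FLAC').
    """
    if available is None:
        import soundfile as sf  # only needed when no format table is passed in
        available = sf.available_formats()
    supported = {name.upper() for name in available}
    counts = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lstrip(".").upper()
        if ext not in supported:
            counts[ext] += 1
    return counts


# Hand-written format table for illustration; omit `available` to query soundfile
table = {"WAV": "WAV (Microsoft)", "FLAC": "FLAC (Free Lossless Audio Codec)"}
print(unsupported_extensions(["a.wav", "b.FLAC", "c.mp3", "d.mp3"], available=table))
# Counter({'MP3': 2})
```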