Audio Format Support
NeMo Curator audio processing stages use the `soundfile` library to read audio files. During duration calculation, built-in error handling flags unreadable or unsupported files instead of raising an error.
Supported Formats
Audio stages support the formats that the `soundfile` library can read through its libsndfile backend:

WAV: Uncompressed audio (recommended for high quality)
FLAC: Lossless compression with metadata support
OGG: Open-source compressed format
MP3: Compressed format (availability depends on the libsndfile version and build on your system)
AIFF: Apple uncompressed format
Note: AAC/M4A is not supported by `soundfile`/libsndfile. Prefer WAV or FLAC for consistent cross-platform behavior.
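Because MP3 availability depends on the local libsndfile build, it can help to pre-screen file extensions against what `soundfile` reports before running a pipeline. The sketch below is a hypothetical helper, not part of NeMo Curator: `EXTENSION_TO_FORMAT` and `is_extension_supported` are illustrative names, and the extension-to-format mapping is an assumption. By default it queries `soundfile.available_formats()`; it also accepts a precomputed format table so it can run without audio libraries.

```python
from pathlib import Path

# Hypothetical mapping from file extensions to soundfile/libsndfile format names.
# AAC/M4A is intentionally absent because libsndfile cannot read it.
EXTENSION_TO_FORMAT = {
    ".wav": "WAV",
    ".flac": "FLAC",
    ".ogg": "OGG",
    ".mp3": "MP3",
    ".aiff": "AIFF",
    ".aif": "AIFF",
}


def is_extension_supported(path, available_formats=None):
    """Return True if the file's extension maps to a format this libsndfile build reports."""
    if available_formats is None:
        import soundfile as sf  # lazy import so the helper can be tested without soundfile
        available_formats = sf.available_formats()
    fmt = EXTENSION_TO_FORMAT.get(Path(path).suffix.lower())
    return fmt is not None and fmt in available_formats


# Example with a format table that lacks MP3, as an older libsndfile build might
formats = {"WAV": "WAV (Microsoft)", "FLAC": "FLAC (Free Lossless Audio Codec)"}
print(is_extension_supported("clip.WAV", formats))  # True
print(is_extension_supported("clip.mp3", formats))  # False
print(is_extension_supported("clip.m4a", formats))  # False
```

An extension check is only a heuristic, since a file can carry a misleading extension; the actual read during duration calculation remains the real test.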
Built-in Error Handling
Duration Calculation with Error Handling
`GetAudioDurationStage` computes the duration of each audio file and handles corrupted or unreadable files automatically:
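```python
from nemo_curator.stages.audio.common import GetAudioDurationStage

# Calculate duration with built-in error handling:
# unreadable files get duration = -1.0 instead of raising an error
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",  # entry field holding the audio file path
    duration_key="duration",              # entry field that receives the duration in seconds
)
```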
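Conceptually, the stage reads each file and records its duration in seconds, falling back to -1.0 when the read fails. The function below is a minimal, hypothetical re-creation of that documented behavior for a single entry, not the stage's actual source; `add_duration`, its `info_fn` parameter, and `fake_info` are illustrative names. By default it uses `soundfile.info`, which parses only the file header and exposes `frames` and `samplerate`; the real stage may read files differently.

```python
from types import SimpleNamespace


def add_duration(entry, audio_filepath_key="audio_filepath",
                 duration_key="duration", info_fn=None):
    """Return a copy of `entry` with its duration in seconds, or -1.0 if the file can't be read.

    Hypothetical helper illustrating the documented behavior; not NeMo Curator's code.
    """
    if info_fn is None:
        import soundfile as sf  # lazy import so the helper can be tested without soundfile
        info_fn = sf.info  # reads only the file header
    result = dict(entry)
    try:
        info = info_fn(entry[audio_filepath_key])
        result[duration_key] = info.frames / info.samplerate
    except Exception:
        # Any failure (corrupt file, unsupported format, missing path key) maps to the sentinel
        result[duration_key] = -1.0
    return result


# Stand-in for soundfile.info so the example runs without real audio files
def fake_info(path):
    if path.endswith(".m4a"):
        raise RuntimeError("Format not recognised.")
    return SimpleNamespace(frames=48000, samplerate=16000)


print(add_duration({"audio_filepath": "ok.wav"}, info_fn=fake_info))
# {'audio_filepath': 'ok.wav', 'duration': 3.0}
print(add_duration({"audio_filepath": "bad.m4a"}, info_fn=fake_info))
# {'audio_filepath': 'bad.m4a', 'duration': -1.0}
```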
Error Handling Behavior
When `soundfile`/libsndfile cannot read an audio file:
Duration Calculation: `GetAudioDurationStage` sets the duration to -1.0 for corrupted or unreadable files.
ASR Inference: Fails with an error message when given a file in an unsupported format.
File Validation: Treat duration = -1.0 as a marker of an unreadable file.
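Before dropping flagged entries, it can be useful to record which files failed so they can be inspected or re-encoded (for example, to WAV or FLAC). The helper below is hypothetical and works on plain dictionaries with a `duration` field; `split_by_duration` is an illustrative name, not a NeMo Curator API.

```python
def split_by_duration(entries, duration_key="duration", sentinel=-1.0):
    """Partition entries into (readable, flagged) using the -1.0 duration sentinel."""
    readable, flagged = [], []
    for entry in entries:
        if entry.get(duration_key) == sentinel:
            flagged.append(entry)
        else:
            readable.append(entry)
    return readable, flagged


entries = [
    {"audio_filepath": "a.wav", "duration": 2.5},
    {"audio_filepath": "b.m4a", "duration": -1.0},
    {"audio_filepath": "c.flac", "duration": 0.0},
]
readable, flagged = split_by_duration(entries)
print("Unreadable files:", [e["audio_filepath"] for e in flagged])
# Unreadable files: ['b.m4a']
```

This partition looks only for the sentinel, so a readable but empty file (duration 0.0) stays in `readable`.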
```python
from nemo_curator.stages.audio.common import PreserveByValueStage

# Keep only entries with duration > 0.0; this drops the -1.0 sentinel
# (unreadable files) as well as zero-length files
valid_files_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",  # keep entries where duration > target_value
)
```
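Assuming the filter keeps an entry when `entry[input_value_key]` compared against `target_value` with the chosen operator is true, the snippet below models that check with Python's `operator` module. The operator names in `OPERATORS` are an assumption modeled on the `operator` module's functions, and `keep` is an illustrative name, not NeMo Curator API. It shows that `"gt"` with `target_value=0.0` rejects both the -1.0 sentinel and zero-length files.

```python
import operator

# Assumed mapping from operator strings to comparison functions
OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def keep(entry, input_value_key="duration", target_value=0.0, op="gt"):
    """Return True if the entry would pass the value filter."""
    return OPERATORS[op](entry[input_value_key], target_value)


for duration in (-1.0, 0.0, 2.5):
    print(duration, keep({"duration": duration}))
# -1.0 False
# 0.0 False
# 2.5 True
```

Switching to `"ge"` would keep zero-length files while still rejecting the -1.0 sentinel.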
Working Example
The following pipeline relies on the built-in error handling for format validation. It assumes that earlier stages (not shown) load entries containing an `audio_filepath` field:
```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

# Create a pipeline that relies on built-in error handling
pipeline = Pipeline(name="audio_validation")

# 1. Calculate duration (unreadable or unsupported files get duration = -1.0)
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
))

# 2. Drop entries flagged with -1.0 (and zero-length files) by keeping duration > 0
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",
))

# 3. Run ASR inference on the remaining, readable files only
pipeline.add_stage(InferenceAsrNemoStage(
    model_name="nvidia/stt_en_fastconformer_hybrid_large_pc",
))
```
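The pipeline stages expect entries that already contain an `audio_filepath` field. As one hypothetical way to prepare such input, the sketch below writes a JSONL manifest (one JSON object per line, the convention NeMo speech manifests follow) from a directory of audio files. `write_manifest`, the directory layout, and the default extension list are assumptions; how the manifest is loaded into the pipeline is not covered here.

```python
import json
from pathlib import Path


def write_manifest(audio_dir, manifest_path, extensions=(".wav", ".flac")):
    """Write one JSON line per matching audio file and return the number written."""
    paths = sorted(
        p for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    with open(manifest_path, "w", encoding="utf-8") as f:
        for p in paths:
            # Absolute paths keep the manifest valid regardless of the working directory
            f.write(json.dumps({"audio_filepath": str(p.resolve())}) + "\n")
    return len(paths)
```

For example, `write_manifest("/data/audio", "/data/manifest.jsonl")` would list every WAV and FLAC file under `/data/audio`; restricting the extensions to WAV and FLAC follows the recommendation to prefer those formats.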
Format Support Check
To check which formats the `soundfile` installation on your system supports:
```python
import soundfile as sf

# List the formats supported by this soundfile/libsndfile build
print("Supported formats:")
for format_name, format_info in sf.available_formats().items():
    print(f"  {format_name}: {format_info}")

# Check a specific file by reading its header
try:
    info = sf.info("your_audio_file.wav")
    print(f"File info: {info}")
except Exception as e:  # e.g. the file is missing, corrupted, or in an unsupported format
    print(f"File validation failed: {e}")
```
This approach relies on the built-in error handling in NeMo Curator's audio stages instead of adding separate format-validation steps.