Audio Format Support

NeMo Curator audio processing stages use the soundfile library for audio file handling. Built-in error handling surfaces unreadable or unsupported files during duration calculation.

Supported Formats

Audio stages support formats compatible with the soundfile library (backed by libsndfile):

  • WAV: Uncompressed audio (recommended for high quality)
  • FLAC: Lossless compression with metadata support
  • OGG: Open-source compressed format
  • MP3: Compressed format (requires libsndfile 1.1.0 or later, so availability depends on your system’s libsndfile build)
  • AIFF: Apple uncompressed format

Note: soundfile/libsndfile does not support AAC/M4A by default. Prefer WAV or FLAC for consistent cross-platform behavior.

Built-in Error Handling

Duration Calculation with Error Handling

The GetAudioDurationStage automatically handles corrupted or unreadable files:

from nemo_curator.stages.audio.common import GetAudioDurationStage

# Calculate duration with built-in error handling
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)
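The idea of reporting a sentinel duration instead of raising can be sketched in plain Python. This is not NeMo Curator's implementation: it uses the standard-library `wave` module as a stand-in for soundfile (so it only reads PCM WAV files), and the function name `safe_wav_duration` is hypothetical. It only illustrates how a read failure can be turned into a duration of `-1.0`.

```python
import os
import tempfile
import wave


def safe_wav_duration(path: str) -> float:
    """Return the duration in seconds, or -1.0 if the file cannot be read.

    Hypothetical helper: the stdlib ``wave`` reader stands in for soundfile.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getnframes() / wav_file.getframerate()
    except (wave.Error, EOFError, OSError):
        # Unreadable, truncated, or missing file: report the sentinel value.
        return -1.0


# Build one valid and one corrupted file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
bad_path = os.path.join(tmp_dir, "bad.wav")

with wave.open(good_path, "wb") as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 16-bit samples
    wav_file.setframerate(16000)  # 16 kHz
    wav_file.writeframes(b"\x00\x00" * 16000)  # one second of silence

with open(bad_path, "wb") as broken_file:
    broken_file.write(b"not audio data")

good_duration = safe_wav_duration(good_path)
bad_duration = safe_wav_duration(bad_path)
print(good_duration, bad_duration)  # 1.0 -1.0
```

Returning a sentinel keeps one bad file from aborting a whole batch; the bad entries can then be removed in a later filtering step.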

Error Handling Behavior

When soundfile/libsndfile cannot read audio files:

  • Duration Calculation: Returns -1.0 for corrupted/unreadable files
  • ASR Inference: Will fail with clear error messages for unsupported formats
  • File Validation: Use duration = -1.0 as an indicator of file issues

from nemo_curator.stages.audio.common import PreserveByValueStage

# Filter out corrupted files (duration = -1.0)
valid_files_filter = PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",  # keep entries with duration greater than 0
)
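To see which entries such a filter keeps, the comparison can be reproduced on plain dictionaries. This is a sketch of the filter semantics only, not the stage's code: the `preserve_by_value` helper and the sample entries are made up for illustration, and only the `"gt"` comparison against `0.0` is taken from the snippet above.

```python
import operator

# Map operator names to comparison functions. The names mirror the stdlib
# ``operator`` module; whether the stage accepts all of them is an assumption.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(entries, input_value_key, target_value, op="eq"):
    """Hypothetical helper: keep entries whose value passes the comparison."""
    compare = COMPARATORS[op]
    return [e for e in entries if compare(e[input_value_key], target_value)]


# Made-up manifest entries; -1.0 marks a file that could not be read.
entries = [
    {"audio_filepath": "a.wav", "duration": 3.2},
    {"audio_filepath": "b.wav", "duration": -1.0},
    {"audio_filepath": "c.flac", "duration": 0.0},
    {"audio_filepath": "d.flac", "duration": 12.5},
]

valid = preserve_by_value(entries, "duration", 0.0, op="gt")
print([e["audio_filepath"] for e in valid])  # ['a.wav', 'd.flac']
```

Note that a strict greater-than comparison against 0.0 drops zero-length files as well as entries carrying the -1.0 sentinel.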

Working Example

Here is a complete pipeline that handles format validation through built-in error handling:

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

# Create pipeline with built-in error handling
pipeline = Pipeline(name="audio_validation")

# 1. Calculate duration (automatically handles format validation)
pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
))

# 2. Filter out corrupted files (duration = -1.0 indicates issues)
pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=0.0,
    operator="gt",
))

# 3. Proceed with ASR inference on valid files only
pipeline.add_stage(InferenceAsrNemoStage(
    model_name="nvidia/stt_en_fastconformer_hybrid_large_pc",
))

Format Support Check

To check supported formats on your system:

import soundfile as sf

# Check available formats
print("Supported formats:")
for format_name, format_info in sf.available_formats().items():
    print(f"  {format_name}: {format_info}")

# Check specific file
try:
    info = sf.info("your_audio_file.wav")
    print(f"File info: {info}")
except Exception as e:
    print(f"File validation failed: {e}")
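When a dataset depends on particular formats, the mapping returned by `sf.available_formats()` can be compared against a required set before a run. A minimal sketch follows; `missing_formats` is a hypothetical helper, and the `sample_formats` dictionary is hand-written to resemble a libsndfile build without MP3 support so that the example runs without soundfile installed.

```python
def missing_formats(required, available):
    """Return the required format names that are absent from ``available``.

    ``available`` is a mapping shaped like the result of
    ``soundfile.available_formats()`` (format name -> description).
    """
    available_names = {name.upper() for name in available}
    return sorted(
        name.upper() for name in required if name.upper() not in available_names
    )


# Hand-written sample, not real output: a build without MP3 support.
sample_formats = {
    "WAV": "WAV (Microsoft)",
    "FLAC": "FLAC (Free Lossless Audio Codec)",
    "OGG": "OGG (OGG Container format)",
    "AIFF": "AIFF (Apple/SGI)",
}

missing = missing_formats(["wav", "flac", "mp3"], sample_formats)
print(missing)  # ['MP3']
```

In practice, pass `sf.available_formats()` as the second argument in place of the sample dictionary.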

This approach leverages the built-in error handling of NeMo Curator’s audio stages rather than requiring extra format validation steps.