*** description: >- Audio format support and error handling in NeMo Curator audio processing stages categories: * processors tags: * audio-validation * format-support * error-handling * soundfile personas: * data-scientist-focused * mle-focused difficulty: beginner content\_type: reference modality: audio-only *** # Audio Format Support NeMo Curator audio processing stages use the `soundfile` library for audio file handling. Built-in error handling surfaces unreadable or unsupported files during duration calculation. ## Supported Formats Audio stages support formats compatible with the `soundfile` library (backed by `libsndfile`): * **WAV**: Uncompressed audio (recommended for high quality) * **FLAC**: Lossless compression with metadata support * **OGG**: Open-source compressed format * **MP3**: Compressed format (availability depends on your system's `libsndfile` build) * **AIFF**: Apple uncompressed format Note: AAC/M4A is not supported by default by `soundfile`/`libsndfile`. Prefer WAV or FLAC for consistent cross-platform behavior. ## Built-in Error Handling ### Duration Calculation with Error Handling The `GetAudioDurationStage` automatically handles corrupted or unreadable files: ```python from nemo_curator.stages.audio.common import GetAudioDurationStage # Calculate duration with built-in error handling duration_stage = GetAudioDurationStage( audio_filepath_key="audio_filepath", duration_key="duration" ) ``` ### Error Handling Behavior When `soundfile`/`libsndfile` cannot read audio files: * **Duration Calculation**: Returns -1.0 for corrupted/unreadable files * **ASR Inference**: Will fail with clear error messages for unsupported formats * **File Validation**: Use duration = -1.0 as an indicator of file issues ```python from nemo_curator.stages.audio.common import PreserveByValueStage # Filter out corrupted files (duration = -1.0) valid_files_filter = PreserveByValueStage( input_value_key="duration", target_value=0.0, operator="gt" # greater than 0 ) ``` ## Working Example Here is a complete pipeline that handles format validation through built-in error handling: ```python from nemo_curator.pipeline import Pipeline from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage # Create pipeline with built-in error handling pipeline = Pipeline(name="audio_validation") # 1. Calculate duration (automatically handles format validation) pipeline.add_stage(GetAudioDurationStage( audio_filepath_key="audio_filepath", duration_key="duration" )) # 2. Filter out corrupted files (duration = -1.0 indicates issues) pipeline.add_stage(PreserveByValueStage( input_value_key="duration", target_value=0.0, operator="gt" )) # 3. Proceed with ASR inference on valid files only pipeline.add_stage(InferenceAsrNemoStage( model_name="nvidia/stt_en_fastconformer_hybrid_large_pc" )) ``` ## Format Support Check To check supported formats on your system: ```python import soundfile as sf # Check available formats print("Supported formats:") for format_name, format_info in sf.available_formats().items(): print(f" {format_name}: {format_info}") # Check specific file try: info = sf.info("your_audio_file.wav") print(f"File info: {info}") except Exception as e: print(f"File validation failed: {e}") ``` This approach leverages the built-in error handling of NeMo Curator's audio stages rather than requiring extra format validation steps.