AudioBatch Data Structure
This guide covers the AudioBatch data structure, which serves as the core container for audio data throughout NeMo Curator’s audio processing pipeline.
Overview
AudioBatch is a specialized data structure that extends NeMo Curator’s base Task class to handle audio-specific processing requirements:
- File Path Management: Automatically validates audio file existence and accessibility
- Batch Processing: Groups multiple audio samples for efficient parallel processing
- Metadata Handling: Preserves audio characteristics and processing results throughout pipeline stages
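To make the three responsibilities above concrete, here is a minimal stand-in container. It is a sketch under assumptions, not the NeMo Curator class: the name `AudioBatchSketch`, the `audio_filepath` key, and the `task_id`, `dataset_name`, and `num_items` members are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AudioBatchSketch:
    """Hypothetical stand-in for an audio batch; not the library's class."""

    # One dict per audio sample: a file path plus arbitrary metadata.
    data: list[dict[str, Any]] = field(default_factory=list)
    # Assumed name of the key that holds each sample's file path.
    filepath_key: str = "audio_filepath"
    task_id: str = ""
    dataset_name: str = ""

    @property
    def num_items(self) -> int:
        """Number of audio samples grouped in this batch."""
        return len(self.data)


batch = AudioBatchSketch(
    data=[
        {"audio_filepath": "/data/a.wav", "text": "hello world"},
        {"audio_filepath": "/data/b.wav", "text": "good morning"},
    ],
    task_id="batch_0",
    dataset_name="demo",
)
print(batch.num_items)
```

Keeping each sample as a plain dictionary lets later stages attach new fields without changing the container itself.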
Structure and Components
Basic Structure
Key Attributes
Data Validation
Automatic Validation
AudioBatch provides built-in validation for audio data integrity: it checks that the audio files referenced in the batch exist and are accessible.
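An existence and accessibility check of this kind could look roughly like the following. This is an illustrative sketch built on the standard library only; the helper `validate_entries` and the `audio_filepath` key are assumptions, and the library's own validation may behave differently.

```python
import os
import tempfile


def validate_entries(entries, filepath_key="audio_filepath"):
    """Split entries into (valid, invalid).

    An entry counts as valid when its path is present, points to a
    regular file, and that file is readable by the current process.
    """
    valid, invalid = [], []
    for entry in entries:
        path = entry.get(filepath_key)
        if path and os.path.isfile(path) and os.access(path, os.R_OK):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


# Create one real (empty) file so the example has a path that exists.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    existing_path = tmp.name

missing_path = os.path.join(
    tempfile.gettempdir(), "audiobatch_sketch_missing_7f3a9c.wav"
)

entries = [
    {"audio_filepath": existing_path},
    {"audio_filepath": missing_path},
    {"text": "entry without a file path"},
]
valid, invalid = validate_entries(entries)
os.remove(existing_path)  # clean up the temporary file
```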
Metadata Management
Standard Metadata Fields
Common fields stored in AudioBatch data:
Character error rate (CER) is available as a utility function, but it is not computed automatically: computing it and storing it in the batch typically requires a custom stage.
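For reference, CER is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below implements that definition from scratch and stores the result under a hypothetical `cer` key; it is not the library's utility function, which may differ in normalization or scaling (for example, returning a percentage).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER as a fraction: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Field names below ("text", "pred_text", "cer") are assumed for illustration.
entry = {"text": "hello", "pred_text": "hallo"}
entry["cer"] = character_error_rate(entry["text"], entry["pred_text"])
```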
Error Handling
Graceful Failure Modes
AudioBatch handles various error conditions:
Error Recovery Strategies
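One possible recovery strategy, sketched here under assumptions, is to isolate failures per sample so that a single bad entry does not abort the whole batch. The helper `process_with_recovery` and the `error` field are hypothetical and not part of the library.

```python
def process_with_recovery(entries, process_fn):
    """Apply process_fn to each entry, collecting failures separately."""
    processed, failed = [], []
    for entry in entries:
        try:
            processed.append(process_fn(entry))
        except (OSError, KeyError, ValueError) as exc:
            # Keep the original entry and record why it failed.
            failed.append({**entry, "error": f"{type(exc).__name__}: {exc}"})
    return processed, failed


def add_word_count(entry):
    """Example step: raises KeyError when the entry has no 'text' field."""
    return {**entry, "word_count": len(entry["text"].split())}


entries = [
    {"audio_filepath": "a.wav", "text": "hello world"},
    {"audio_filepath": "b.wav"},  # missing transcription
]
processed, failed = process_with_recovery(entries, add_word_count)
```

Failed entries keep their original fields, so they can be logged, retried, or written to a separate manifest.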
Performance Characteristics
Memory Usage
AudioBatch memory footprint depends on these factors:
- Number of samples: Memory usage scales linearly with batch size
- Metadata complexity: Additional metadata fields increase memory consumption
- File path lengths: Longer file paths consume more memory
- Audio file loading: Audio files are loaded on demand and are not cached in the batch, so the batch holds only paths and metadata
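The factors above can be explored with a rough size estimate over plain Python dictionaries. This is only an approximation based on `sys.getsizeof`; the helpers `approx_size_bytes` and `make_entries`, and the field names they use, are illustrative assumptions.

```python
import sys


def approx_size_bytes(obj, seen=None):
    """Rough recursive size of nested lists and dicts, in bytes."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # avoid double counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            approx_size_bytes(k, seen) + approx_size_bytes(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set)):
        size += sum(approx_size_bytes(item, seen) for item in obj)
    return size


def make_entries(n, extra_fields=0):
    """Build n synthetic entries, optionally with extra metadata fields."""
    return [
        {
            "audio_filepath": f"/data/audio/sample_{i:06d}.wav",
            "duration": 1.5 + i,
            **{f"meta_{k}": f"value_{i}_{k}" for k in range(extra_fields)},
        }
        for i in range(n)
    ]


small = approx_size_bytes(make_entries(100))
large = approx_size_bytes(make_entries(1000))
rich = approx_size_bytes(make_entries(100, extra_fields=5))
print(small, large, rich)
```

The audio samples themselves never appear in this estimate, which mirrors the point that only paths and metadata live in the batch.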
Processing Efficiency
Batch Size Impact:
Small batches:
- Lower memory usage
- Higher overhead per sample
- Better for memory-constrained environments
Medium batches:
- Balanced memory and performance
- Good for most use cases
- Optimal for CPU processing
Large batches:
- Higher memory usage
- Better GPU utilization
- Optimal for GPU processing with sufficient VRAM
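The trade-offs above come down to how a manifest is split into batches. A minimal sketch of that split, using a hypothetical `chunk` helper rather than any library function:

```python
def chunk(entries, batch_size):
    """Yield consecutive slices of entries with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]


entries = [{"audio_filepath": f"sample_{i}.wav"} for i in range(10)]
batch_sizes = [len(batch) for batch in chunk(entries, 4)]
print(batch_sizes)  # [4, 4, 2]
```

A smaller `batch_size` produces more batches and more per-batch overhead, while a larger one keeps more entries in memory at once.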
Integration with Processing Stages
Stage Input/Output
AudioBatch serves as input and output for audio processing stages:
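The input/output contract can be illustrated with a stand-in stage that takes a batch and returns a new batch. The classes `BatchSketch` and `DurationFilterStage` and the `duration` field are hypothetical; the library's stage base class and method signatures are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class BatchSketch:
    """Hypothetical stand-in for an audio batch."""

    data: list = field(default_factory=list)


class DurationFilterStage:
    """Illustrative stage: keeps entries at least min_duration seconds long."""

    def __init__(self, min_duration):
        self.min_duration = min_duration

    def process(self, batch):
        # Consume one batch and return a new one; the input is not mutated.
        kept = [
            entry for entry in batch.data
            if entry.get("duration", 0.0) >= self.min_duration
        ]
        return BatchSketch(data=kept)


batch_in = BatchSketch(data=[
    {"audio_filepath": "a.wav", "duration": 0.4},
    {"audio_filepath": "b.wav", "duration": 3.2},
])
batch_out = DurationFilterStage(min_duration=1.0).process(batch_in)
```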
Chaining Stages
AudioBatch flows through multiple processing stages, with each stage adding new metadata fields:
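A minimal sketch of this accumulation, assuming each stage is a plain function over a list of entry dictionaries. The stage names and the `word_count` and `char_count` fields are made up for illustration.

```python
def add_word_count(entries):
    """Stage 1: add the number of words in each transcription."""
    return [{**e, "word_count": len(e["text"].split())} for e in entries]


def add_char_count(entries):
    """Stage 2: add the number of characters in each transcription."""
    return [{**e, "char_count": len(e["text"])} for e in entries]


def run_stages(entries, stages):
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        entries = stage(entries)
    return entries


result = run_stages(
    [{"audio_filepath": "a.wav", "text": "hello world"}],
    [add_word_count, add_char_count],
)
print(result[0])
```

Each stage copies the incoming entry and adds its own key, so fields written by earlier stages remain available to later ones.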