---
description: >-
  Understanding the AudioBatch data structure for efficient audio file
  management and validation in NeMo Curator
categories:
  - concepts-architecture
tags:
  - data-structures
  - audiobatch
  - audio-validation
  - batch-processing
  - file-management
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: concept
modality: audio-only
---
# AudioBatch Data Structure
This guide covers the `AudioBatch` data structure, which serves as the core container for audio data throughout NeMo Curator's audio processing pipeline.
## Overview
`AudioBatch` is a specialized data structure that extends NeMo Curator's base `Task` class to handle audio-specific processing requirements:
* **File Path Management**: Automatically validates audio file existence and accessibility
* **Batch Processing**: Groups multiple audio samples for efficient parallel processing
* **Metadata Handling**: Preserves audio characteristics and processing results throughout pipeline stages
## Structure and Components
### Basic Structure
```python
from nemo_curator.tasks import AudioBatch

# Create AudioBatch with single audio file
audio_batch = AudioBatch(
    data={
        "audio_filepath": "/path/to/audio.wav",
        "text": "ground truth transcription",
        "duration": 3.2,
        "language": "en",
    },
    filepath_key="audio_filepath",
    task_id="audio_task_001",
    dataset_name="my_speech_dataset",
)

# Create AudioBatch with multiple audio files
audio_batch = AudioBatch(
    data=[
        {
            "audio_filepath": "/path/to/audio1.wav",
            "text": "first transcription",
            "duration": 2.1,
        },
        {
            "audio_filepath": "/path/to/audio2.wav",
            "text": "second transcription",
            "duration": 3.5,
        },
    ],
    filepath_key="audio_filepath",
    task_id="audio_task_001",
    dataset_name="my_speech_dataset",
)
```
### Key Attributes
| Attribute | Type | Description |
| -------------- | ---------------------------- | ----------------------------------------------------- |
| `data` | `dict \| list[dict] \| None` | Audio sample data (stored internally as `list[dict]`) |
| `filepath_key` | `str \| None` | Key name for audio file paths in data (optional) |
| `task_id` | `str` | Unique identifier for the batch |
| `dataset_name` | `str` | Name of the source dataset |
| `num_items` | `int` | Number of audio samples in batch (read-only property) |
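The attribute table describes two behaviors that are easy to miss: `data` accepts several input shapes but is stored as `list[dict]`, and `num_items` is derived rather than set. The sketch below illustrates both with a small stand-in class. `MiniAudioBatch` is a hypothetical name used only for illustration; it is not the library class and makes no claim about how `AudioBatch` is implemented internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MiniAudioBatch:
    """Hypothetical stand-in that mimics the documented attribute behavior."""

    data: dict | list[dict] | None = None
    filepath_key: str | None = None
    task_id: str = ""
    dataset_name: str = ""

    def __post_init__(self) -> None:
        # Normalize every accepted input shape to list[dict].
        if self.data is None:
            self.data = []
        elif isinstance(self.data, dict):
            self.data = [self.data]

    @property
    def num_items(self) -> int:
        # Derived from the stored list, so it cannot be assigned directly.
        return len(self.data)


single = MiniAudioBatch(data={"audio_filepath": "/path/to/audio.wav"})
empty = MiniAudioBatch()
print(single.data, single.num_items)
print(empty.data, empty.num_items)
```

Because the stored form is always a list, downstream code can iterate over `data` without checking whether a single sample or a batch was passed in.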
## Data Validation
### Automatic Validation
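`AudioBatch` provides built-in validation of audio data integrity: it checks that the audio files referenced through `filepath_key` exist and are accessible. A failed check produces a warning rather than stopping the pipeline.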
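As a rough illustration of file-existence validation, the following standalone sketch checks each sample's path and logs a warning for every missing file. The helper name `validate_items` and the choice of the standard `logging` module are assumptions made for this example; they are not taken from the NeMo Curator API.

```python
from __future__ import annotations

import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def validate_items(data: list[dict], filepath_key: str | None) -> bool:
    """Return True only if every referenced file exists (hypothetical helper)."""
    if filepath_key is None:
        return True  # No path key configured, so there is nothing to check.
    all_ok = True
    for item in data:
        path = item.get(filepath_key)
        if path is None or not os.path.exists(path):
            # Warn and keep going so one bad sample does not stop the check.
            logger.warning("File %s does not exist", path)
            all_ok = False
    return all_ok


# Use a temporary file as a stand-in for a real audio file.
with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
    present = [{"audio_filepath": tmp.name}]
    missing = [{"audio_filepath": tmp.name + ".missing"}]
    result_present = validate_items(present, "audio_filepath")
    result_missing = validate_items(missing, "audio_filepath")

print(result_present, result_missing)
```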
## Metadata Management
### Standard Metadata Fields
Common fields stored in AudioBatch data:
```python
audio_sample = {
    # Core fields (user-provided)
    "audio_filepath": "/path/to/audio.wav",
    "text": "transcription text",
    # Fields added by processing stages
    "pred_text": "asr prediction",  # Added by ASR inference stages
    "wer": 12.5,                    # Added by GetPairwiseWerStage
    "duration": 3.2,                # Added by GetAudioDurationStage
    # Optional user-provided metadata
    "language": "en_us",
    "speaker_id": "speaker_001",
    # Custom fields (examples)
    "domain": "conversational",
    "noise_level": "low",
}
```
Character error rate (CER) is available as a utility function and typically requires a custom stage to compute and store it.
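To show what such a custom step might compute, here is a minimal self-contained sketch that derives CER from a character-level edit distance and stores it under a `cer` key. The function names and the `cer` field name are hypothetical, and the percentage scale is an assumption chosen to match the `wer` example value above. In practice you would likely call the library's own utility instead of reimplementing it.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """CER as a percentage of reference characters (assumed scale)."""
    if not ref:
        return 0.0 if not hyp else 100.0
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def add_cer(item: dict) -> dict:
    """Return a copy of the sample with a hypothetical "cer" field added."""
    return {**item, "cer": char_error_rate(item["text"], item["pred_text"])}


sample = {"text": "hello world", "pred_text": "hallo world"}
scored = add_cer(sample)
print(round(scored["cer"], 2))  # one substitution over 11 characters -> 9.09
```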
## Error Handling
### Graceful Failure Modes
AudioBatch handles various error conditions:
```python
# Missing files
audio_batch = AudioBatch(data=[
    {"audio_filepath": "/missing/file.wav", "text": "sample"}
])
# Validation fails, but processing continues with warnings

# Corrupted audio files
corrupted_sample = {
    "audio_filepath": "/corrupted/audio.wav",
    "text": "sample text",
}
# Duration calculation returns -1.0 for corrupted files

# Invalid metadata
invalid_sample = {
    "audio_filepath": "/valid/audio.wav",
    # Missing "text" field - needed for WER calculation but not enforced by AudioBatch
}
# AudioBatch does not enforce metadata field requirements. Add a validation stage if required.
```
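Since a corrupted file surfaces as a `duration` of `-1.0` rather than as an exception, downstream code needs to look for that sentinel explicitly. The sketch below separates usable samples from failed ones. The helper name `split_by_duration` is hypothetical, and treating samples without a `duration` field as "not yet measured" is an assumption of this example.

```python
from __future__ import annotations

FAILED_DURATION = -1.0  # Sentinel described above for unreadable audio files.


def split_by_duration(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split samples into (usable, failed) based on the duration sentinel."""
    usable, failed = [], []
    for item in items:
        # Samples without a duration are kept: they have not been measured yet.
        if item.get("duration") == FAILED_DURATION:
            failed.append(item)
        else:
            usable.append(item)
    return usable, failed


items = [
    {"audio_filepath": "/valid/audio.wav", "duration": 3.2},
    {"audio_filepath": "/corrupted/audio.wav", "duration": -1.0},
    {"audio_filepath": "/unmeasured/audio.wav"},
]
usable, failed = split_by_duration(items)
print(len(usable), len(failed))  # 2 1
```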
### Error Recovery Strategies
```python
import logging
import os

from nemo_curator.tasks import AudioBatch

# Standard-library logger used here; swap in your project's logger if needed.
logger = logging.getLogger(__name__)


def robust_audiobatch_creation(raw_data: list[dict]) -> AudioBatch:
    """Create AudioBatch with error recovery."""
    valid_data = []
    error_count = 0
    for item in raw_data:
        try:
            # Validate required fields
            if "audio_filepath" not in item or "text" not in item:
                error_count += 1
                continue
            # Validate file existence
            if not os.path.exists(item["audio_filepath"]):
                error_count += 1
                continue
            valid_data.append(item)
        except Exception as e:  # e.g. an item that is not a dict
            logger.warning(f"Error processing item: {e}")
            error_count += 1
    logger.info(f"Created AudioBatch with {len(valid_data)} valid items, {error_count} errors")
    return AudioBatch(
        data=valid_data,
        filepath_key="audio_filepath",
    )
```
## Performance Characteristics
### Memory Usage
AudioBatch memory footprint depends on these factors:
* **Number of samples**: Memory usage scales linearly with batch size
* **Metadata complexity**: Additional metadata fields increase memory consumption
* **File path lengths**: Longer file paths consume more memory
* **Audio file loading**: Audio files are loaded on-demand and not cached in the batch
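The linear scaling described above can be checked with a rough, standalone estimate of the metadata footprint. This is a back-of-the-envelope sketch, not a profiler: `estimate_bytes` and `make_items` are hypothetical helpers, and `sys.getsizeof` counts shared objects (such as repeated key strings) once per sample, so the result is an upper bound.

```python
import sys


def estimate_bytes(items: list) -> int:
    """Rough upper bound on the memory held by a list of sample dicts."""
    total = sys.getsizeof(items)
    for item in items:
        total += sys.getsizeof(item)
        for key, value in item.items():
            # Shared key strings are counted once per sample (overestimate).
            total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


def make_items(n: int) -> list:
    """Build n synthetic samples with fixed-length paths and metadata."""
    return [
        {
            "audio_filepath": f"/path/to/audio_{i:06d}.wav",
            "text": "sample",
            "duration": 3.2,
        }
        for i in range(n)
    ]


small = estimate_bytes(make_items(100))
large = estimate_bytes(make_items(1000))
print(f"100 samples: {small} bytes, 1000 samples: {large} bytes")
```

Ten times as many samples yields roughly ten times the estimate. Only paths and metadata are counted, because the audio itself is not held in the batch.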
### Processing Efficiency
**Batch Size Impact**:
**Small batches**:
* Lower memory usage
* Higher overhead per sample
* Better for memory-constrained environments
**Medium batches**:
* Balanced memory and performance
* Good for most use cases
* Optimal for CPU processing
**Large batches**:
* Higher memory usage
* Better GPU utilization
* Optimal for GPU processing with sufficient VRAM
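One simple way to experiment with these trade-offs is to split a sample list into fixed-size chunks and build one batch per chunk. The sketch below shows only the chunking step. `chunk_samples` is a hypothetical helper, and the batch size of 4 is an arbitrary illustration rather than a recommendation.

```python
from __future__ import annotations

from collections.abc import Iterator


def chunk_samples(samples: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = [{"audio_filepath": f"/path/to/audio{i}.wav"} for i in range(10)]
batch_lengths = [len(chunk) for chunk in chunk_samples(samples, batch_size=4)]
print(batch_lengths)  # [4, 4, 2]
```

Each yielded chunk is a `list[dict]`, so it can be passed as the `data` argument when constructing an `AudioBatch`.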
## Integration with Processing Stages
### Stage Input/Output
AudioBatch serves as input and output for audio processing stages:
```python
# Stage processing signature
def process(self, task: AudioBatch) -> AudioBatch:
    # Process audio data
    processed_data = []
    for item in task.data:
        # Apply processing logic
        processed_item = self.process_audio_item(item)
        processed_data.append(processed_item)
    # Return new AudioBatch with processed data
    return AudioBatch(
        data=processed_data,
        filepath_key=task.filepath_key,
        task_id=f"processed_{task.task_id}",
        dataset_name=task.dataset_name,
    )
```
### Chaining Stages
AudioBatch flows through multiple processing stages, with each stage adding new metadata fields:
```mermaid
flowchart TD
A["AudioBatch (raw)
• audio_filepath
• text"] --> B[ASR Inference Stage]
B --> C["AudioBatch (with predictions)
• audio_filepath
• text
• pred_text"]
C --> D[Quality Assessment Stage]
D --> E["AudioBatch (with metrics)
• audio_filepath
• text
• pred_text
• wer
• duration"]
E --> F[Filter Stage]
F --> G["AudioBatch (filtered)
• audio_filepath
• text
• pred_text
• wer
• duration"]
G --> H[Export Stage]
H --> I[Output Files]
style A fill:#e1f5fe
style C fill:#f3e5f5
style E fill:#e8f5e8
style G fill:#fff3e0
style I fill:#fce4ec
```
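The flow in the diagram can be mimicked with plain functions over `list[dict]` to see how fields accumulate from stage to stage. This is a toy sketch, not the NeMo Curator stage API. All function names are hypothetical, the "ASR" step is a lookup table standing in for a real model, WER is computed as a percentage by assumption, and the duration step is omitted for brevity.

```python
from typing import Callable

Stage = Callable[[list], list]


def word_error_rate(ref: str, hyp: str) -> float:
    """Word-level edit distance as a percentage of reference words."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        curr = [i]
        for j, hw in enumerate(h, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (rw != hw)))
        prev = curr
    return 100.0 * prev[-1] / max(len(r), 1)


def make_asr_stage(predictions: dict) -> Stage:
    """Placeholder ASR stage: looks up canned predictions by file path."""
    return lambda items: [
        {**it, "pred_text": predictions[it["audio_filepath"]]} for it in items
    ]


def wer_stage(items: list) -> list:
    """Add a "wer" field computed from "text" and "pred_text"."""
    return [{**it, "wer": word_error_rate(it["text"], it["pred_text"])} for it in items]


def make_filter_stage(max_wer: float) -> Stage:
    """Keep only samples whose WER does not exceed max_wer."""
    return lambda items: [it for it in items if it["wer"] <= max_wer]


def run_pipeline(items: list, stages: list) -> list:
    """Apply each stage in order, passing the output of one to the next."""
    for stage in stages:
        items = stage(items)
    return items


raw = [
    {"audio_filepath": "/path/to/audio1.wav", "text": "the cat sat"},
    {"audio_filepath": "/path/to/audio2.wav", "text": "a quick brown fox"},
]
canned = {
    "/path/to/audio1.wav": "the cat sat",
    "/path/to/audio2.wav": "a slow brown dog",
}
result = run_pipeline(raw, [make_asr_stage(canned), wer_stage, make_filter_stage(25.0)])
print(result)
```

The second sample has two substituted words out of four (WER 50.0) and is dropped by the filter, while the surviving sample carries every field added along the way.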