---
title: AudioBatch
description: API reference for AudioBatch - the task type for audio processing
---

`AudioBatch` is the task type for audio processing in NeMo Curator.

## Import

```python
from nemo_curator.tasks import AudioBatch
```

## Class Definition

```python
from dataclasses import dataclass


@dataclass
class AudioBatch(Task[dict | list[dict]]):
    """Task containing audio data for processing.

    Attributes:
        task_id: Unique identifier for this batch.
        dataset_name: Name of the source dataset.
        data: Audio manifest data (dict or list of dicts).
    """

    task_id: str
    dataset_name: str
    data: dict | list[dict]
```

## Audio Manifest Format

Audio data follows the NeMo manifest format:

```json
{
  "audio_filepath": "/path/to/audio.wav",
  "duration": 5.2,
  "text": "Transcription text...",
  "speaker": "speaker_001",
  "metadata": {
    "sample_rate": 16000,
    "channels": 1
  }
}
```

## Properties

### `num_items`

Get the number of audio samples in the batch.
```python
@property
def num_items(self) -> int:
    """Returns the number of audio samples."""
```

## Creating AudioBatch

```python
from nemo_curator.tasks import AudioBatch

# Single manifest entry
manifest = {
    "audio_filepath": "/data/audio/sample.wav",
    "duration": 5.2,
    "text": "Hello world",
}

batch = AudioBatch(
    task_id="audio_001",
    dataset_name="speech_dataset",
    data=manifest,
)

# Multiple entries
manifests = [
    {"audio_filepath": "/data/audio/s1.wav", "duration": 3.1},
    {"audio_filepath": "/data/audio/s2.wav", "duration": 4.5},
]

batch = AudioBatch(
    task_id="audio_batch_001",
    dataset_name="speech_dataset",
    data=manifests,
)
```

## Usage in Stages

```python
from dataclasses import dataclass

from nemo_curator.stages.base import ProcessingStage
from nemo_curator.tasks import AudioBatch


@dataclass
class DurationFilterStage(ProcessingStage[AudioBatch, AudioBatch]):
    """Filter audio by duration."""

    name: str = "DurationFilter"
    min_duration: float = 1.0
    max_duration: float = 30.0

    def inputs(self) -> tuple[list[str], list[str]]:
        return ["data"], []

    def outputs(self) -> tuple[list[str], list[str]]:
        return ["data"], []

    def process(self, task: AudioBatch) -> AudioBatch | None:
        data = task.data
        # Handle both single dict and list
        if isinstance(data, dict):
            data = [data]

        filtered = [
            item
            for item in data
            if self.min_duration <= item.get("duration", 0) <= self.max_duration
        ]

        if not filtered:
            return None

        return AudioBatch(
            task_id=f"{task.task_id}_filtered",
            dataset_name=task.dataset_name,
            data=filtered if len(filtered) > 1 else filtered[0],
            _metadata=task._metadata,
            _stage_perf=task._stage_perf,
        )
```

## Common Operations

### ASR Transcription

```python
def process(self, task: AudioBatch) -> AudioBatch:
    data = task.data if isinstance(task.data, list) else [task.data]

    for item in data:
        audio_path = item["audio_filepath"]
        item["text"] = self.asr_model.transcribe(audio_path)

    return AudioBatch(
        task_id=f"{task.task_id}_{self.name}",
        dataset_name=task.dataset_name,
        data=data if len(data) > 1 else data[0],
        _metadata=task._metadata,
        _stage_perf=task._stage_perf,
    )
```

### Quality Scoring

```python
def process(self, task: AudioBatch) -> AudioBatch:
    data = task.data if isinstance(task.data, list) else [task.data]

    for item in data:
        if "text" in item and "reference_text" in item:
            item["wer"] = compute_wer(item["reference_text"], item["text"])

    return AudioBatch(
        task_id=f"{task.task_id}_{self.name}",
        dataset_name=task.dataset_name,
        data=data if len(data) > 1 else data[0],
        _metadata=task._metadata,
        _stage_perf=task._stage_perf,
    )
```

## Source Code

[View source on GitHub](https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/tasks/audio.py)
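## Example: Loading Manifest Entries from Disk

NeMo-style manifests are typically stored as JSON Lines files, one manifest entry per line. The sketch below shows one way to read such a file and split the entries into fixed-size lists suitable for the `data` field of `AudioBatch`. The helper names (`read_manifest`, `chunk`) and the batch size are illustrative, not part of the `AudioBatch` API; in a real pipeline each chunk would be wrapped as `AudioBatch(task_id=..., dataset_name=..., data=chunk_entries)`.

```python
import json


def read_manifest(path: str) -> list[dict]:
    """Load one manifest entry per non-blank line from a JSON Lines file."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries


def chunk(entries: list[dict], size: int) -> list[list[dict]]:
    """Split entries into batches of at most `size` items each."""
    return [entries[i : i + size] for i in range(0, len(entries), size)]
```

With five entries and `size=2`, `chunk` yields three lists of sizes 2, 2, and 1; each list can then become the `data` of one `AudioBatch`.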