nemo_curator.stages.audio.preprocessing.concatenation


Audio segment concatenation stage.

Concatenates VAD segments stored in task.data["segments"] (nested mode) into one combined waveform per source file. Segments are sorted by segment_num; gaps in the numbering left by filtered-out segments are fine, because only the relative order matters. The sorted segments are then concatenated with a configurable amount of silence between them.

Stores segment-to-original mappings in task._metadata so downstream stages (TimestampMapperStage) can resolve final positions back to the original file.

Works only with the canonical waveform + sample_rate format (no pydub).
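The position bookkeeping described above can be illustrated with a small sketch. It is not the library's implementation: the helper name `build_mappings`, the input keys `start_ms` and `end_ms`, the use of the running index as `segment_index`, and the choice to insert silence only between consecutive segments are all assumptions made for illustration. Only the output field names follow the SegmentMapping fields documented on this page.

```python
def build_mappings(original_file, segments, silence_duration_sec=0.5):
    """Lay segments end to end with a silence gap and record both timelines.

    Hypothetical helper: `segments` is a list of dicts with assumed keys
    `segment_num`, `start_ms`, and `end_ms` (positions in the original file).
    """
    silence_ms = int(round(silence_duration_sec * 1000))
    # Gaps in segment_num are harmless; only the relative order is used.
    ordered = sorted(segments, key=lambda s: s["segment_num"])

    mappings = []
    cursor_ms = 0  # current write position in the concatenated timeline
    for index, seg in enumerate(ordered):
        if index > 0:
            cursor_ms += silence_ms  # silence only between consecutive segments
        duration_ms = seg["end_ms"] - seg["start_ms"]
        mappings.append(
            {
                "original_file": original_file,
                "original_start_ms": seg["start_ms"],
                "original_end_ms": seg["end_ms"],
                "concat_start_ms": cursor_ms,
                "concat_end_ms": cursor_ms + duration_ms,
                "segment_index": index,  # assumed meaning: position after sorting
            }
        )
        cursor_ms += duration_ms
    return mappings


example = build_mappings(
    "a.wav",
    [
        {"segment_num": 3, "start_ms": 9000, "end_ms": 10000},
        {"segment_num": 0, "start_ms": 0, "end_ms": 2000},
    ],
)
```

With the default 0.5 s gap, the first segment occupies 0 to 2000 ms of the combined timeline and the second occupies 2500 to 3500 ms, while each entry still remembers where it came from in the original file.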

Module Contents

Classes

Name | Description
SegmentConcatenationStage | Concatenate nested VAD segments into a single combined waveform.
SegmentMapping | Mapping from concatenated position to original file position.

API

class nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage(
silence_duration_sec: float = 0.5,
name: str = 'SegmentConcatenation',
batch_size: int = 1,
resources: nemo_curator.stages.resources.Resources = (lambda: Resources(cpus=1.0...
)
Dataclass

Bases: ProcessingStage[AudioTask, AudioTask]

Concatenate nested VAD segments into a single combined waveform.

Expects each incoming AudioTask to carry a task.data["segments"] list (one file = one task, produced by VADSegmentationStage(nested=True)). Segments are sorted by segment_num, concatenated with silence gaps, and the result is a single AudioTask with the combined waveform and segment-to-original mappings in task._metadata["segment_mappings"].
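As a rough illustration of "concatenated with silence gaps", the sketch below joins sample sequences and inserts zero-valued samples between them. It is a dependency-free stand-in, not the stage's code: the stage works on torch.Tensor waveforms (per the `_validate_segment` signature), whereas this example uses plain Python lists, and the function name `concatenate_with_silence` and the truncating conversion from seconds to samples are assumptions.

```python
def concatenate_with_silence(waveforms, sample_rate, silence_duration_sec=0.5):
    """Join 1-D sample lists, inserting zero-valued silence between them.

    Hypothetical helper; plain lists stand in for tensor waveforms.
    """
    # Number of silent samples per gap (assumed: truncate toward zero).
    silence = [0.0] * int(silence_duration_sec * sample_rate)

    combined = []
    for index, samples in enumerate(waveforms):
        if index > 0:
            combined.extend(silence)  # gap only between consecutive segments
        combined.extend(samples)
    return combined


# Tiny sample rate so the result is easy to read: 0.5 s at 4 Hz is 2 samples.
combined = concatenate_with_silence(
    [[0.1, 0.2], [0.3]], sample_rate=4, silence_duration_sec=0.5
)
```

Here `combined` holds the two samples of the first segment, two zero samples of silence, and then the single sample of the second segment.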

Parameters:

silence_duration_sec
float, defaults to 0.5

Duration of silence inserted between consecutive segments (seconds).

batch_size
int = 1
name
str = 'SegmentConcatenation'
resources
Resources
silence_duration_sec
float = 0.5
nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage.__post_init__()
nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage._concatenate(
original_file: str,
segments: list[dict[str, typing.Any]],
task_id: str,
dataset_name: str
) -> nemo_curator.tasks.AudioTask | None

Concatenate a list of segment dicts from the same source file.

nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage._seg_sort_key(
seg: dict[str, typing.Any]
) -> tuple[int, int, int]
staticmethod

Sort key for segment dicts: (segment_num, start_ms, 0).
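A sort key of this shape might look like the following sketch. The dictionary keys `segment_num` and `start_ms` are inferred from the tuple description above, and the fallback to 0 for missing values is an assumption; the actual method may handle missing or malformed fields differently.

```python
def seg_sort_key(seg):
    """Return (segment_num, start_ms, 0) for a segment dict.

    Illustrative stand-in: missing values fall back to 0 (assumed behavior).
    """
    return (int(seg.get("segment_num", 0)), int(seg.get("start_ms", 0)), 0)


# Segment numbers 0 and 3 were filtered out upstream; order is still preserved.
segments = [
    {"segment_num": 4, "start_ms": 7000},
    {"segment_num": 1, "start_ms": 1500},
    {"segment_num": 2, "start_ms": 3000},
]
ordered = sorted(segments, key=seg_sort_key)
order = [s["segment_num"] for s in ordered]
```

Because the key is a tuple, segments that share a segment_num are ordered by start_ms as a tie-breaker.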

nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage._validate_segment(
seg: dict[str, typing.Any]
) -> tuple[torch.Tensor, int] | None
staticmethod

Validate and return (waveform, sample_rate) or None if invalid.

nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.preprocessing.concatenation.SegmentConcatenationStage.process(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask | list[nemo_curator.tasks.AudioTask]

Concatenate segments from task.data["segments"].

class nemo_curator.stages.audio.preprocessing.concatenation.SegmentMapping(
original_file: str,
original_start_ms: int,
original_end_ms: int,
concat_start_ms: int,
concat_end_ms: int,
segment_index: int
)
Dataclass

Mapping from concatenated position to original file position.

concat_end_ms
int
concat_start_ms
int
original_end_ms
int
original_file
str
original_start_ms
int
segment_index
int
nemo_curator.stages.audio.preprocessing.concatenation.SegmentMapping.to_dict() -> dict[str, typing.Any]
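To show how a downstream consumer could use these fields, the sketch below resolves a position in the concatenated waveform back to the original file. `MappingSketch` is a hypothetical mirror of the documented fields, not the real class; implementing `to_dict` with `dataclasses.asdict`, treating each range as half-open, and returning None inside a silence gap are assumptions. `resolve_to_original` is likewise an illustrative helper and not the TimestampMapperStage API.

```python
from dataclasses import asdict, dataclass


@dataclass
class MappingSketch:
    """Hypothetical stand-in with the same fields as SegmentMapping."""

    original_file: str
    original_start_ms: int
    original_end_ms: int
    concat_start_ms: int
    concat_end_ms: int
    segment_index: int

    def to_dict(self):
        # Assumed behavior: a plain dict of the dataclass fields.
        return asdict(self)


def resolve_to_original(mappings, concat_ms):
    """Return (original_file, original_ms), or None inside a silence gap."""
    for m in mappings:
        # Half-open range [concat_start_ms, concat_end_ms) is an assumption.
        if m.concat_start_ms <= concat_ms < m.concat_end_ms:
            offset = concat_ms - m.concat_start_ms
            return (m.original_file, m.original_start_ms + offset)
    return None


mappings = [
    MappingSketch("a.wav", 0, 2000, 0, 2000, 0),
    MappingSketch("a.wav", 9000, 10000, 2500, 3500, 1),
]
hit = resolve_to_original(mappings, 2700)  # 200 ms into the second segment
gap = resolve_to_original(mappings, 2200)  # falls in the inserted silence
as_dict = mappings[1].to_dict()
```

In this example, 2700 ms in the combined waveform corresponds to 9200 ms in a.wav, while 2200 ms lies in the inserted silence and has no original position.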