nemo_curator.stages.audio.preprocessing.concatenation
Audio segment concatenation stage.
Concatenates VAD segments stored in task.data["segments"] (nested mode)
into one combined waveform per source file. Segments are sorted by
segment_num (gaps in numbering left by filtered-out segments are fine,
because relative order is preserved) and concatenated with a
configurable silence between them.
Stores segment-to-original mappings in task._metadata so downstream
stages (TimestampMapperStage) can resolve final positions back to
the original file.
Uses canonical waveform + sample_rate format only (no pydub).
Module Contents
Classes
API
Bases: ProcessingStage[AudioTask, AudioTask]
Concatenate nested VAD segments into a single combined waveform.
Expects each incoming AudioTask to carry a
task.data["segments"] list (one file = one task, produced by
VADSegmentationStage(nested=True)). Segments are sorted by
segment_num, concatenated with silence gaps, and the result
is a single AudioTask with the combined waveform and
segment-to-original mappings in task._metadata["segment_mappings"].
Parameters:
Duration of silence inserted between consecutive segments (seconds).
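The behaviour described for this class can be illustrated with a minimal sketch. It is not the library's implementation: the function name, the `silence_duration_sec` parameter name, the segment keys `waveform`, `sample_rate`, `start_ms`, `end_ms`, and every mapping field name are assumptions made for illustration. Only `segment_num` and the idea of a silence gap in seconds come from the text above.

```python
import numpy as np


def concatenate_segments(segments, silence_duration_sec=0.5):
    """Illustrative sketch only; names and dict keys are hypothetical."""
    if not segments:
        raise ValueError("no segments to concatenate")

    # Sort by segment_num; gaps in the numbering do not matter.
    ordered = sorted(segments, key=lambda s: s["segment_num"])

    # Assumption: all segments of one source file share a sample rate.
    sample_rate = ordered[0]["sample_rate"]
    silence = np.zeros(int(round(silence_duration_sec * sample_rate)),
                       dtype=np.float32)

    pieces, mappings, cursor = [], [], 0  # cursor counts samples written
    for i, seg in enumerate(ordered):
        if i > 0:
            # Silence goes between segments, not before the first one.
            pieces.append(silence)
            cursor += len(silence)
        wav = np.asarray(seg["waveform"], dtype=np.float32)
        pieces.append(wav)
        mappings.append({
            "segment_num": seg["segment_num"],
            "concat_start_ms": cursor * 1000.0 / sample_rate,
            "concat_end_ms": (cursor + len(wav)) * 1000.0 / sample_rate,
            "original_start_ms": seg["start_ms"],
            "original_end_ms": seg["end_ms"],
        })
        cursor += len(wav)

    return np.concatenate(pieces), sample_rate, mappings
```

Recording both the concatenated and the original span for each segment is what lets a later stage translate positions back to the source file without re-running VAD.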
Concatenate a list of segment dicts from the same source file.
Sort key for segment dicts: (segment_num, start_ms, 0).
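As a rough illustration of how such a tuple key orders segments, the sketch below sorts plain dicts. The function name and the `.get(..., 0)` defaults are assumptions; the constant third element is copied from the description above, and its purpose is not stated there.

```python
def segment_sort_key(segment):
    """Hypothetical sort key mirroring the documented tuple shape."""
    # Primary: segment_num. Secondary: start_ms breaks ties.
    # The trailing 0 is reproduced from the docstring as-is.
    return (segment.get("segment_num", 0), segment.get("start_ms", 0), 0)


segments = [
    {"segment_num": 5, "start_ms": 0},
    {"segment_num": 2, "start_ms": 300},
    {"segment_num": 2, "start_ms": 100},
]
ordered = sorted(segments, key=segment_sort_key)
```

Because tuples compare element by element, missing segment numbers (here 3 and 4) have no effect on the resulting order.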
Validate and return (waveform, sample_rate) or None if invalid.
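A validator with this "tuple or None" contract might look like the sketch below. The function name, the dict keys, and the specific checks (non-empty waveform, positive integer sample rate) are assumptions; the page does not say which conditions the real method treats as invalid.

```python
import numpy as np


def validated_audio(item):
    """Hypothetical validator: return (waveform, sample_rate) or None."""
    waveform = item.get("waveform")
    sample_rate = item.get("sample_rate")
    if waveform is None or sample_rate is None:
        return None
    # Assumed rule: the sample rate must be a positive integer.
    if not isinstance(sample_rate, (int, np.integer)) or sample_rate <= 0:
        return None
    arr = np.asarray(waveform)
    # Assumed rule: an empty waveform is invalid.
    if arr.size == 0:
        return None
    return arr, int(sample_rate)
```

Returning None instead of raising lets the caller skip a bad segment and continue with the rest of the file.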
Concatenate segments from task.data["segments"].
Mapping from concatenated position to original file position.
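To show how a downstream consumer could use such a mapping, the sketch below resolves a position in the concatenated audio back to the original file. The field names (`concat_start_ms`, `concat_end_ms`, `original_start_ms`) and the function are hypothetical; the actual structure stored in task._metadata["segment_mappings"] is not shown on this page.

```python
def resolve_to_original(position_ms, mappings):
    """Hypothetical lookup: concatenated position -> original position."""
    for m in mappings:
        if m["concat_start_ms"] <= position_ms <= m["concat_end_ms"]:
            # Offset inside the segment is the same in both timelines.
            offset = position_ms - m["concat_start_ms"]
            return m["original_start_ms"] + offset
    # The position falls in inserted silence or outside the audio.
    return None


mappings = [
    {"concat_start_ms": 0.0, "concat_end_ms": 100.0, "original_start_ms": 1000.0},
    {"concat_start_ms": 150.0, "concat_end_ms": 350.0, "original_start_ms": 5000.0},
]
```

With these example mappings, a position inside the second segment resolves to an offset from 5000 ms, while a position inside the 100 to 150 ms silence gap has no original counterpart.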