nemo_curator.stages.audio.alm.alm_data_builder

ALM Data Builder Stage - Native NeMo Curator Implementation.

Creates training windows from audio segments. Follows the exact pattern from NeMo Curator: https://github.com/NVIDIA-NeMo/Curator/blob/main/nemo_curator/stages/audio/common.py

Produces output identical to the SDP implementation.

Module Contents

Classes

| Name | Description |
| --- | --- |
| ALMDataBuilderStage | Build ALM training windows from audio segments. |
| BuilderStats | Tracks segment loss reasons and counts during window building. |

Functions

| Name | Description |
| --- | --- |
| _compute_speaker_durations | - |
| _get_bandwidth | - |
| _record_window_loss | Record statistics for a rejected window. |
| _truncate_segment | Truncate a segment’s words at the given end time, returning a modified copy. |

Data

MIN_SEGMENTS_PER_WINDOW

API

class nemo_curator.stages.audio.alm.alm_data_builder.ALMDataBuilderStage(
name: str = 'alm_data_builder',
target_window_duration: float = 120.0,
tolerance: float = 0.1,
min_bandwidth: int = 8000,
min_sample_rate: int = 16000,
min_speakers: int = 2,
max_speakers: int = 5,
truncation: bool = True,
drop_fields: str = 'words',
drop_fields_top_level: str = 'words,segments'
)
Dataclass

Bases: ProcessingStage[AudioTask, AudioTask]

Build ALM training windows from audio segments.

Filters segments by sample rate, bandwidth, speaker count, and duration to create valid training windows. Mutates the entry dict in-place, adding windows, stats, and truncation_events keys.
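The filtering described above can be illustrated with a minimal sketch. It is not the stage's actual code: the `bandwidth` and `speaker` keys, the helper names, and the inclusive comparisons are all assumptions made for illustration; only the threshold values come from the documented defaults.

```python
from typing import Any

# Thresholds mirror the documented defaults of ALMDataBuilderStage.
MIN_SAMPLE_RATE = 16000
MIN_BANDWIDTH = 8000
MIN_SPEAKERS = 2
MAX_SPEAKERS = 5


def segment_passes(seg: dict[str, Any], sample_rate: int) -> bool:
    """Segment-level check. The 'bandwidth' key is a hypothetical field name."""
    return sample_rate >= MIN_SAMPLE_RATE and seg.get("bandwidth", 0) >= MIN_BANDWIDTH


def speaker_count_ok(window: list[dict[str, Any]]) -> bool:
    """Window-level check. The 'speaker' key is a hypothetical field name."""
    speakers = {seg["speaker"] for seg in window}
    return MIN_SPEAKERS <= len(speakers) <= MAX_SPEAKERS


entry = {
    "audio_sample_rate": 16000,
    "segments": [
        {"speaker": "A", "bandwidth": 8000},
        {"speaker": "B", "bandwidth": 4000},  # rejected: bandwidth too low
        {"speaker": "B", "bandwidth": 8000},
    ],
}
kept = [s for s in entry["segments"] if segment_passes(s, entry["audio_sample_rate"])]
# In-place mutation: the result is attached to the same dict that was passed in.
entry["windows"] = [kept] if speaker_count_ok(kept) else []
```

The duration check is omitted here to keep the sketch short; the real stage also applies `target_window_duration` and `tolerance`.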

drop_fields
str = 'words'
drop_fields_top_level
str = 'words,segments'
max_speakers
int = 5
min_bandwidth
int = 8000
min_sample_rate
int = 16000
min_speakers
int = 2
name
str = 'alm_data_builder'
target_window_duration
float = 120.0
tolerance
float = 0.1
truncation
bool = True

nemo_curator.stages.audio.alm.alm_data_builder.ALMDataBuilderStage.__post_init__() -> None

Compute derived parameters, matching the SDP implementation exactly.

nemo_curator.stages.audio.alm.alm_data_builder.ALMDataBuilderStage._process_single_entry(
entry_data: dict[str, typing.Any]
) -> dict[str, typing.Any]

Process a single entry and extract valid training windows.
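How windows are extracted is not documented on this page. As a rough sketch only, a forward scan could try each segment as a window start and extend it until the upper duration bound would be exceeded. The `start` and `end` keys, the `build_windows` name, and the scan strategy itself are assumptions; only `MIN_SEGMENTS_PER_WINDOW = 2` is taken from the module.

```python
from typing import Any

MIN_SEGMENTS_PER_WINDOW = 2  # documented module constant


def build_windows(
    segments: list[dict[str, Any]],
    min_duration: float,
    max_duration: float,
) -> list[list[dict[str, Any]]]:
    """Try each segment as a window start and grow the window forward."""
    windows = []
    for start_idx, first in enumerate(segments):
        window = []
        for seg in segments[start_idx:]:
            if seg["end"] - first["start"] > max_duration:
                break  # adding this segment would overshoot the upper bound
            window.append(seg)
        if len(window) < MIN_SEGMENTS_PER_WINDOW:
            continue  # too few segments to form a window
        if window[-1]["end"] - first["start"] >= min_duration:
            windows.append(window)
    return windows


segments = [
    {"start": 0.0, "end": 50.0},
    {"start": 50.0, "end": 100.0},
    {"start": 100.0, "end": 115.0},
    {"start": 115.0, "end": 200.0},
]
windows = build_windows(segments, min_duration=108.0, max_duration=132.0)
```

This sketch skips truncation and loss tracking, both of which the real stage performs.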

nemo_curator.stages.audio.alm.alm_data_builder.ALMDataBuilderStage.inputs() -> tuple[list[str], list[str]]

nemo_curator.stages.audio.alm.alm_data_builder.ALMDataBuilderStage.process(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask

class nemo_curator.stages.audio.alm.alm_data_builder.BuilderStats(
total_segments: int = 0,
total_dur: float = 0.0,
swift_path: str = '',
audio_sample_rate: int = 0,
lost_bw: int = 0,
dur_lost_bw: float = 0.0,
lost_sr: int = 0,
dur_lost_sr: float = 0.0,
lost_spk: int = 0,
dur_lost_spk: float = 0.0,
lost_win: int = 0,
dur_lost_win: float = 0.0,
lost_no_spkr: int = 0,
dur_lost_no_spkr: float = 0.0,
lost_next_seg_bm: int = 0,
dur_lost_next_seg_bm: float = 0.0,
lost_win_full_data: list = list()
)
Dataclass

Tracks segment loss reasons and counts during window building.
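Most fields come in pairs: a `lost_*` count and a matching `dur_lost_*` duration. The sketch below shows that pattern on a trimmed-down, hypothetical `LossStats` class; `record_bandwidth_loss` is an invented helper, and implementing `to_dict` with `dataclasses.asdict` is an assumption about how the real method might work.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class LossStats:
    """Hypothetical, trimmed-down analogue of BuilderStats."""

    total_segments: int = 0
    lost_bw: int = 0
    dur_lost_bw: float = 0.0
    # A mutable default must use default_factory so instances do not share one list.
    lost_win_full_data: list = field(default_factory=list)

    def record_bandwidth_loss(self, duration: float) -> None:
        # Each loss reason keeps a count and the total duration lost.
        self.lost_bw += 1
        self.dur_lost_bw += duration

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


stats = LossStats(total_segments=3)
stats.record_bandwidth_loss(2.5)
summary = stats.to_dict()
```

The `field(default_factory=list)` default matches what the attribute listing shows for `lost_win_full_data`.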

audio_sample_rate
int = 0
dur_lost_bw
float = 0.0
dur_lost_next_seg_bm
float = 0.0
dur_lost_no_spkr
float = 0.0
dur_lost_spk
float = 0.0
dur_lost_sr
float = 0.0
dur_lost_win
float = 0.0
lost_bw
int = 0
lost_next_seg_bm
int = 0
lost_no_spkr
int = 0
lost_spk
int = 0
lost_sr
int = 0
lost_win
int = 0
lost_win_full_data
list = field(default_factory=list)
swift_path
str = ''
total_dur
float = 0.0
total_segments
int = 0

nemo_curator.stages.audio.alm.alm_data_builder.BuilderStats.to_dict() -> dict[str, typing.Any]

nemo_curator.stages.audio.alm.alm_data_builder._compute_speaker_durations(
segments: list[dict[str, typing.Any]]
) -> dict[str, float]
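`_compute_speaker_durations` has no docstring. Judging from its name and signature, it likely sums segment durations per speaker; a minimal sketch under that assumption follows. The `speaker`, `start`, and `end` keys are hypothetical field names.

```python
from collections import defaultdict
from typing import Any


def compute_speaker_durations(segments: list[dict[str, Any]]) -> dict[str, float]:
    """Sum segment durations per speaker ('speaker', 'start', 'end' are assumed keys)."""
    totals: defaultdict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


durations = compute_speaker_durations(
    [
        {"speaker": "A", "start": 0.0, "end": 2.0},
        {"speaker": "B", "start": 2.0, "end": 3.5},
        {"speaker": "A", "start": 3.5, "end": 4.0},
    ]
)
```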
nemo_curator.stages.audio.alm.alm_data_builder._get_bandwidth(
seg: dict[str, typing.Any]
) -> int

nemo_curator.stages.audio.alm.alm_data_builder._record_window_loss(
stat: nemo_curator.stages.audio.alm.alm_data_builder.BuilderStats,
seg: dict[str, typing.Any],
segments: list[dict[str, typing.Any]],
start_idx: int,
curr_idx: int,
window_segs: list[dict[str, typing.Any]],
drop_fields: set[str],
min_bandwidth: int
) -> None

Record statistics for a rejected window.

nemo_curator.stages.audio.alm.alm_data_builder._truncate_segment(
seg: dict[str, typing.Any],
truncated_end: float
) -> dict[str, typing.Any]

Truncate a segment’s words at the given end time, returning a modified copy.
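A minimal sketch of copy-then-truncate behavior, assuming each segment carries a `words` list whose items have an `end` time. Whether a word straddling the cut is dropped (as here) or clipped is not stated in the source, and the `truncate_segment` name and key names are hypothetical.

```python
import copy
from typing import Any


def truncate_segment(seg: dict[str, Any], truncated_end: float) -> dict[str, Any]:
    """Return a copy of seg whose words all end at or before truncated_end."""
    out = copy.deepcopy(seg)  # leave the caller's segment untouched
    out["words"] = [w for w in out.get("words", []) if w["end"] <= truncated_end]
    out["end"] = min(out["end"], truncated_end)
    return out


original = {
    "start": 0.0,
    "end": 3.0,
    "words": [
        {"word": "a", "end": 1.0},
        {"word": "b", "end": 2.0},
        {"word": "c", "end": 3.0},
    ],
}
shortened = truncate_segment(original, truncated_end=2.0)
```

Using a deep copy keeps the input segment intact, which matters if the same segment is considered for more than one window.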

nemo_curator.stages.audio.alm.alm_data_builder.MIN_SEGMENTS_PER_WINDOW = 2