ALM Data Builder
ALMDataBuilderStage constructs candidate training windows from consecutive diarized audio segments. Each window must meet configurable constraints for duration, sample rate, bandwidth, and speaker count before it is included in the output.
How it Works
For each input manifest entry, the stage:
- Checks the entry-level sample rate against min_sample_rate (skips the entire entry if it is below the threshold)
- Iterates over segments as potential window starting points
- For each starting segment, appends consecutive segments until the accumulated duration reaches the target
- Optionally truncates the final segment if the window exceeds the maximum duration
- Validates that the window contains the required number of distinct speakers
- Records loss statistics for segments that do not meet constraints
The stage processes one AudioTask at a time and mutates the task data in place.
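The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the stage's actual implementation. The following are all hypothetical:

- the function name build_windows
- the parameter names other than min_sample_rate
- the entry keys (sample_rate, segments, start, end, speaker, metrics.bandwidth)
- the lost_sr and lost_long counter names

The sketch also assumes that a segment below the bandwidth threshold invalidates the window being built. Unlike the stage, it returns new objects rather than mutating its input.

```python
from collections import Counter
from typing import Any


def build_windows(
    entry: dict[str, Any],
    *,
    target_duration: float,
    max_duration: float,
    min_sample_rate: int,
    min_bandwidth: float,
    min_speakers: int,
    max_speakers: int,
    truncate: bool,
) -> tuple[list[dict[str, Any]], Counter]:
    """Hypothetical sketch of the window-building loop (not the stage's real code)."""
    stats: Counter = Counter()
    windows: list[dict[str, Any]] = []
    segments = entry.get("segments", [])

    # Entry-level check: skip the whole entry if its sample rate is too low.
    if entry.get("sample_rate", 0) < min_sample_rate:
        stats["lost_sr"] += len(segments)  # hypothetical counter name
        return windows, stats

    # Every segment is tried as a potential window starting point.
    for start in range(len(segments)):
        window: list[dict[str, Any]] = []
        total = 0.0
        for seg in segments[start:]:
            # Assumption: a low-bandwidth segment invalidates this window.
            if seg.get("metrics", {}).get("bandwidth", 0) < min_bandwidth:
                stats["lost_bw"] += 1
                window = []
                break
            window.append(dict(seg))  # copy so truncation never touches the input
            total += seg["end"] - seg["start"]
            if total >= target_duration:
                break
        if not window or total < target_duration:
            continue  # abandoned, or ran out of segments before reaching the target

        # Optionally trim the final segment so the window fits max_duration.
        if total > max_duration:
            if not truncate:
                stats["lost_long"] += 1  # hypothetical counter name
                continue
            window[-1]["end"] -= total - max_duration
            total = max_duration

        # Require the configured number of distinct speakers.
        speakers = {seg["speaker"] for seg in window}
        if not min_speakers <= len(speakers) <= max_speakers:
            stats["lost_spk"] += 1
            continue

        windows.append({"segments": window, "duration": total})
    return windows, stats
```

Because every segment is tried as a starting point, windows built from neighbouring starts can share segments. Truncation only ever shortens the last segment. As long as target_duration does not exceed max_duration, the amount trimmed is always smaller than that segment's length.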
Parameters
Basic Usage
Advanced Configuration
Short Windows for Fine-Tuning
Permissive Quality Thresholds
Preserving Segment Fields
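By default, the stage drops the words field from each segment, and the words and segments fields from the top level of each entry. To preserve all fields, configure the stage's field-dropping parameters so that no fields are removed.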
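The default field-dropping behavior can be approximated with a small helper. This is a sketch, not the stage's code. The following are all hypothetical:

- the function name strip_fields
- its segment_fields and top_level_fields arguments
- the assumption that output windows are stored under a windows key, each with its own segments list

Passing empty tuples for both arguments keeps every field.

```python
import copy
from typing import Any, Iterable


def strip_fields(
    entry: dict[str, Any],
    segment_fields: Iterable[str] = ("words",),
    top_level_fields: Iterable[str] = ("words", "segments"),
) -> dict[str, Any]:
    """Return a copy of entry with the given fields removed (hypothetical helper)."""
    seg_drop = set(segment_fields)
    top_drop = set(top_level_fields)
    # Deep-copy the surviving top-level fields so the input is never modified.
    out = {k: copy.deepcopy(v) for k, v in entry.items() if k not in top_drop}
    # Assumption: windows live under "windows", each holding its own segments.
    for window in out.get("windows", []):
        window["segments"] = [
            {k: v for k, v in seg.items() if k not in seg_drop}
            for seg in window.get("segments", [])
        ]
    return out
```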
Output Fields
The stage adds the following fields to each AudioTask:
Speaker Durations
Each window includes a speaker_durations array containing the total speaking time of the top five speakers, sorted by duration in descending order. The array is zero-padded to length five when a window has fewer than five speakers.
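The speaker_durations computation described above can be sketched as follows. The function name and the segment keys (speaker, start, end) are assumptions for illustration. The top-five cutoff, descending sort, and zero padding follow the description.

```python
from collections import defaultdict
from typing import Any


def speaker_durations(segments: list[dict[str, Any]], top_k: int = 5) -> list[float]:
    """Total speaking time of the top_k speakers, descending, zero-padded to top_k."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        # Sum the speaking time of every segment attributed to each speaker.
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    top = sorted(totals.values(), reverse=True)[:top_k]
    # Pad with zeros when the window has fewer than top_k speakers.
    return top + [0.0] * (top_k - len(top))


window = [
    {"speaker": "A", "start": 0.0, "end": 40.0},
    {"speaker": "B", "start": 40.0, "end": 90.0},
    {"speaker": "A", "start": 90.0, "end": 110.0},
]
print(speaker_durations(window))  # [60.0, 50.0, 0.0, 0.0, 0.0]
```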
Loss Statistics
The stats dictionary contains the following counters:
Best Practices
- Start with the default parameters and adjust them based on the stats output
- If lost_spk is high relative to total segments, widen the speaker count range
- If lost_bw is high, verify that the input data has bandwidth metadata in segments[].metrics.bandwidth
- Use drop_fields to reduce output file size when downstream stages do not need word-level or segment-level detail
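The lost_spk and lost_bw checks in the list above can be automated with a short helper that reads the stats counters. This is a hypothetical sketch:

- the function name suggest_adjustments is illustrative
- the 0.25 threshold is an arbitrary example, not a recommended value
- total_segments is passed in by the caller, because the counter that holds the segment total is not named here

```python
from typing import Mapping


def suggest_adjustments(
    stats: Mapping[str, int], total_segments: int, threshold: float = 0.25
) -> list[str]:
    """Map high loss ratios to the tuning hints above (hypothetical helper)."""
    hints: list[str] = []
    if total_segments <= 0:
        return hints  # nothing to compare against
    if stats.get("lost_spk", 0) / total_segments > threshold:
        hints.append("widen the speaker count range")
    if stats.get("lost_bw", 0) / total_segments > threshold:
        hints.append("check segments[].metrics.bandwidth in the input")
    return hints
```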
Related Topics
- ALM Overlap Filtering: Next stage in the ALM pipeline
- ALM Pipeline Concepts: Architectural overview
- ALM Tutorial: End-to-end walkthrough with sample data