ALM Overlap Filtering
ALMDataOverlapStage removes redundant training windows that share too much audio content. When two windows overlap beyond a configurable threshold, the stage keeps the window whose duration is closest to the target and discards the other.
How it Works
The stage processes each AudioTask independently:
- Extracts the windows list produced by ALMDataBuilderStage
- Sorts windows by start time
- For each window, compares it against every later window whose start falls before its end. This covers all pairs that overlap in time, not only adjacent ones. For each pair, it calculates the overlap ratio (overlap duration divided by the shorter window duration)
- When the overlap ratio meets the threshold, greedily removes the window whose duration is further from target_duration
- Writes filtered results back to the task
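The steps above can be sketched in plain Python. This is a minimal illustration, not the stage's actual implementation: the function name filter_overlapping_windows, the start/end dictionary keys, and the tie-breaking rule (keep the earlier window when both are equally close to the target) are assumptions made for the example. It also assumes the threshold is a percentage from 0 to 100 that the overlap ratio is compared against.

```python
def filter_overlapping_windows(windows, overlap_percentage, target_duration):
    """Greedy overlap filter (illustrative sketch, not the stage's real code).

    `windows` is assumed to be a list of dicts with "start" and "end" in seconds.
    """
    ordered = sorted(windows, key=lambda w: w["start"])
    removed = set()

    for i, a in enumerate(ordered):
        if i in removed:
            continue
        dur_a = a["end"] - a["start"]
        for j in range(i + 1, len(ordered)):
            if j in removed:
                continue
            b = ordered[j]
            if b["start"] >= a["end"]:
                break  # sorted by start, so no later window can overlap `a`
            dur_b = b["end"] - b["start"]
            shorter = min(dur_a, dur_b)
            if shorter <= 0:
                continue  # skip degenerate zero-length windows
            overlap = min(a["end"], b["end"]) - b["start"]
            ratio_pct = 100.0 * overlap / shorter
            if ratio_pct >= overlap_percentage:
                # Drop whichever window is further from the target duration.
                # Assumption: on a tie, the earlier window is kept.
                if abs(dur_a - target_duration) <= abs(dur_b - target_duration):
                    removed.add(j)
                else:
                    removed.add(i)
                    break  # `a` is gone, stop comparing against it

    return [w for k, w in enumerate(ordered) if k not in removed]


example = [
    {"start": 0, "end": 120},
    {"start": 30, "end": 100},   # fully inside the first window
    {"start": 200, "end": 320},  # no overlap with the others
]
kept = filter_overlapping_windows(example, overlap_percentage=50, target_duration=120)
```

In this example the second window lies entirely inside the first, so its overlap ratio is 100 percent, and it is removed because its 70-second duration is further from the 120-second target.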
Parameters
Overlap Percentage Behavior
Basic Usage
Advanced Configuration
Moderate Filtering
Short-Window Pipeline
When using shorter target windows, match the target_duration parameter:
Output Fields
The stage adds the following user-facing fields to each AudioTask:
The stage also writes several intermediate fields (total_dur_list_window, total_dur_list_window_timestamps, filtered, swift_filepath) that are primarily used for internal bookkeeping. The original windows list produced by ALMDataBuilderStage is preserved so downstream consumers can compare pre- and post-filter results.
Tuning the Overlap Threshold
The right threshold depends on your training requirements:
- For diverse training data, use a low overlap_percentage (0 to 30) to maximize the variety of audio content in the training set
- For maximum training volume, use a higher overlap_percentage (70 to 100) to retain more windows at the cost of some redundancy
- For balanced results, use overlap_percentage=50 as a starting point and adjust based on the ratio of filtered_windows to input windows
Monitor the yield by comparing filtered_dur to total_dur_window in the output.
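As a rough sketch of that comparison, the helper below computes two yield ratios from a single output record. It assumes the record is available as a plain dictionary, that windows and filtered_windows are lists, and that total_dur_window and filtered_dur are durations in the same unit. The helper name overlap_filter_yield is hypothetical and not part of the stage.

```python
def overlap_filter_yield(record):
    """Return duration and window-count yield for one output record.

    Assumes `record` is a dict carrying the field names used on this page.
    """
    total_dur = record.get("total_dur_window", 0.0)
    kept_dur = record.get("filtered_dur", 0.0)
    n_input = len(record.get("windows", []))
    n_kept = len(record.get("filtered_windows", []))
    return {
        # Fraction of windowed audio duration that survived filtering
        "duration_yield": kept_dur / total_dur if total_dur else 0.0,
        # Fraction of input windows that survived filtering
        "window_yield": n_kept / n_input if n_input else 0.0,
    }


sample_record = {
    "windows": [{"start": 0, "end": 100}] * 4,
    "filtered_windows": [{"start": 0, "end": 100}] * 3,
    "total_dur_window": 400.0,
    "filtered_dur": 300.0,
}
stats = overlap_filter_yield(sample_record)
```

A yield that drops sharply when overlap_percentage is lowered suggests the input windows are heavily redundant. A yield that stays near 1.0 suggests the threshold is removing very little.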
Related Topics
- ALM Data Builder: Previous stage in the ALM pipeline
- ALM Pipeline Concepts: Architectural overview
- ALM Tutorial: End-to-end walkthrough with sample data