Filter audio samples by duration ranges, speech rate metrics, and temporal characteristics to create optimal datasets for ASR training and speech processing applications.
Training Efficiency: Duration filtering can improve ASR training by removing samples that are too short or too long to train on effectively.
Processing Performance: Sample duration affects computational requirements such as memory use and processing time.
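As a minimal sketch of duration-range filtering, the snippet below works on plain Python dictionaries rather than on any NeMo Curator stage. The field names `audio_filepath` and `duration`, the helper name `filter_by_duration`, and the 1.0 and 20.0 second bounds are illustrative assumptions, not values taken from the library.

```python
def filter_by_duration(samples, min_duration=1.0, max_duration=20.0):
    """Keep samples whose duration in seconds lies within [min_duration, max_duration].

    Hypothetical helper: samples are plain dicts, and the "duration" key is an
    assumed field name. Samples without a duration are dropped.
    """
    kept = []
    for sample in samples:
        duration = sample.get("duration")
        if duration is None:
            continue  # cannot judge a sample with no recorded duration
        if min_duration <= duration <= max_duration:
            kept.append(sample)
    return kept


# Illustrative data only
samples = [
    {"audio_filepath": "a.wav", "duration": 0.4},   # too short
    {"audio_filepath": "b.wav", "duration": 5.2},   # within range
    {"audio_filepath": "c.wav", "duration": 31.0},  # too long
]

kept = filter_by_duration(samples)
kept_paths = [s["audio_filepath"] for s in kept]
print(kept_paths)  # ['b.wav']
```

Suitable bounds depend on the model and hardware, so treat the defaults here as placeholders to tune.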
Speech rate metrics (words per second, characters per second) help identify samples with speaking speeds appropriate for your use case.
The built-in speech rate calculation functions can be used within custom processing stages to analyze speaking speed and add metrics to your pipeline data.
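To make the two metrics concrete, here is a standalone sketch of how words per second and characters per second could be computed. The functions `word_rate` and `char_rate` are hypothetical stand-ins written for this illustration, not the library's own utilities, and the choice to count spaces as characters is an assumption.

```python
def word_rate(text: str, duration: float) -> float:
    """Words per second: whitespace-separated tokens divided by duration in seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(text.split()) / duration


def char_rate(text: str, duration: float) -> float:
    """Characters per second, counting spaces (an assumption made for this sketch)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(text.strip()) / duration


transcript = "the quick brown fox"
seconds = 2.0

wps = word_rate(transcript, seconds)  # 4 words / 2.0 s
cps = char_rate(transcript, seconds)  # 19 characters / 2.0 s
print(wps, cps)  # 2.0 9.5
```

In a custom processing stage, values like these would be stored alongside each sample so that later stages can filter on them.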
If you have pre-calculated speech rate metrics in your data, you can filter based on them:
This example assumes you have already calculated and stored speech rate metrics in your audio data. The built-in stages do not calculate speech rates automatically; you need to create a custom stage for that.
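A filter on stored metrics can be sketched in plain Python as follows. The key name `word_rate`, the helper `filter_by_word_rate`, and the 1.5 to 4.0 words-per-second bounds are assumptions chosen for illustration; this is not a built-in stage.

```python
def filter_by_word_rate(samples, min_rate=1.5, max_rate=4.0, key="word_rate"):
    """Keep samples whose stored rate lies within [min_rate, max_rate].

    Hypothetical helper: samples that lack the metric are dropped, because
    they cannot be checked against the bounds.
    """
    return [
        sample
        for sample in samples
        if key in sample and min_rate <= sample[key] <= max_rate
    ]


# Illustrative data only
samples = [
    {"audio_filepath": "slow.wav", "word_rate": 0.8},
    {"audio_filepath": "normal.wav", "word_rate": 2.7},
    {"audio_filepath": "fast.wav", "word_rate": 6.1},
    {"audio_filepath": "unscored.wav"},  # metric was never calculated
]

kept = filter_by_word_rate(samples)
kept_paths = [s["audio_filepath"] for s in kept]
print(kept_paths)  # ['normal.wav']
```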
After you calculate speech rate metrics, filter samples to keep those with appropriate speaking speeds:
These examples assume you have pre-calculated speech rate metrics in your audio data. Use the get_wordrate() and get_charrate() utility functions to calculate these values in a custom processing stage.
Typical speech rates for different contexts:
Over-Filtering: Removing too much data, which can leave too few samples for effective training.
Under-Filtering: Keeping problematic samples that may negatively impact training or processing efficiency.
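One way to watch for both problems is to measure how much data survives a filter before committing to its thresholds. The sketch below compares a dataset before and after filtering; the helper name `retention_report` and the `duration` field are assumptions for illustration.

```python
def retention_report(original, filtered, duration_key="duration"):
    """Return the fraction of samples and of total audio duration that survived filtering.

    Hypothetical helper operating on lists of plain dicts.
    """
    total_count = len(original)
    total_duration = sum(s[duration_key] for s in original)
    kept_duration = sum(s[duration_key] for s in filtered)
    return {
        "sample_fraction": len(filtered) / total_count if total_count else 0.0,
        "duration_fraction": kept_duration / total_duration if total_duration else 0.0,
    }


# Illustrative data only: durations in seconds
original = [{"duration": 2.0}, {"duration": 4.0}, {"duration": 6.0}, {"duration": 8.0}]
filtered = [s for s in original if 3.0 <= s["duration"] <= 7.0]

report = retention_report(original, filtered)
print(report)  # {'sample_fraction': 0.5, 'duration_fraction': 0.5}
```

A very low retained fraction suggests the thresholds may be too strict, while a fraction close to 1.0 suggests the filter is removing little and may be letting problematic samples through.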
Here’s a complete working example from the NeMo Curator tutorials showing actual duration filtering in practice:
This example comes directly from tutorials/audio/fleurs/pipeline.py and shows the correct parameter names and usage patterns for the built-in stages.