Band Filter
Classify each audio segment as full_band or narrow_band and drop anything that doesn’t match the configured target band. Use it when your training set requires a consistent acoustic bandwidth.
Understanding Audio Bandwidth
Full-Band vs Narrow-Band
Audio bandwidth describes the highest frequency the recording captures, which is set by the codec or transmission medium. Narrow-band audio (for example, telephone speech sampled at 8 kHz) preserves content only up to about 4 kHz, while full-band audio (sampled at 44.1 or 48 kHz) preserves content up to roughly 20 kHz.
BandFilterStage distinguishes specifically between full-band and narrow-band; it does not currently classify wide-band (16 kHz) audio as a separate category.
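The distinction can be illustrated with a toy spectral check. This is only an illustration, not the stage’s actual scikit-learn classifier: narrow-band audio carries almost no energy above ~4 kHz, which a simple FFT-based energy ratio exposes.

```python
import numpy as np

def high_band_energy_ratio(waveform: np.ndarray, sample_rate: int,
                           cutoff_hz: float = 4000.0) -> float:
    """Fraction of spectral energy above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(waveform)) ** 2
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return spectrum[freqs > cutoff_hz].sum() / total

rng = np.random.default_rng(0)
full_band = rng.standard_normal(48000)       # white noise: energy spread to 24 kHz
t = np.arange(48000) / 48000
narrow_band = np.sin(2 * np.pi * 300 * t)    # telephone-range tone only

print(high_band_energy_ratio(full_band, 48000))    # large: energy across the full spectrum
print(high_band_energy_ratio(narrow_band, 48000))  # near zero: no content above 4 kHz
```

A real classifier must be more robust than a single threshold (upsampled narrow-band audio, lossy codecs, and noise floors all blur the boundary), which is why the stage uses a trained model rather than a heuristic like this.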
When to Use the Band Filter
- Train TTS or voice cloning models: full-band only — narrow-band audio lacks the high-frequency content needed for natural reconstruction.
- Train ASR for call-center / customer-service: narrow-band only — match the deployment domain.
- Heterogeneous web crawls: choose one based on downstream use; log how much you drop to assess data composition.
If your dataset is known to be uniformly one band, you can skip this stage. The classifier is most useful for filtering mixed sources.
Basic Band Filtering
Step 1: Configure the Stage
The stage uses a scikit-learn classifier trained on spectral features. The default model is downloaded on first use; control where it is stored with cache_dir:
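A minimal configuration sketch. The band_value and cache_dir names come from this page, but the exact constructor signature is an assumption; check the API reference for your version:

```python
# Sketch only: parameter names beyond band_value / cache_dir are assumptions.
band_filter = BandFilterStage(
    band_value="full_band",            # keep only segments classified as full-band
    cache_dir="/models/band_filter",   # persist the downloaded classifier across runs
)
```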
Step 2: Choose Standalone vs In-Pipeline Mode
The stage supports two input modes:
In-pipeline mode is automatic when an upstream stage has populated waveform; otherwise the stage falls back to reading from audio_filepath.
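A sketch of the two modes, assuming a Pipeline container and stage names modeled on this page (not a verified API):

```python
# Standalone mode: no upstream stage sets waveform, so the band filter
# reads each segment from disk via its audio_filepath field.
standalone = Pipeline(stages=[band_filter])

# In-pipeline mode: VAD has already populated waveform in memory,
# so the band filter classifies those arrays without re-reading files.
in_pipeline = Pipeline(stages=[vad_stage, band_filter])
```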
Parameters
The default resource allocation is Resources(cpus=4.0) — the classifier is CPU-only.
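If the default allocation needs adjusting for a large run, an override along these lines may work; the with_ call is an assumption about the stage API, not a verified signature:

```python
# Assumption: stages accept a resources override; verify against the API docs.
band_filter = BandFilterStage(band_value="full_band")
band_filter = band_filter.with_(resources=Resources(cpus=8.0))  # raise CPU count
```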
Domain-Specific Tuning
TTS / Voice Cloning Training
Demand full-band only:
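A hedged configuration sketch (band_value naming taken from this page; constructor details assumed):

```python
# Keep only segments classified as full-band; narrow-band audio lacks the
# high-frequency content TTS models need to reconstruct natural speech.
tts_band_filter = BandFilterStage(band_value="full_band")
```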
Call-Center ASR
Train against the deployment domain:
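The mirror-image configuration, again with assumed constructor details:

```python
# Keep only narrow-band segments so training data matches telephony deployment.
asr_band_filter = BandFilterStage(band_value="narrow_band")
```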
Mixed Web Crawls
Keep both bands but log the split for analysis. Run the classifier in score-only mode by adding it to the pipeline upstream of any other filter, then export the manifest before applying band_value filtering:
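Once the manifest is exported, the split can be tallied with a short script. The "band" field name here is an assumption about the manifest schema; adjust it to whatever key your export actually uses:

```python
import json
from collections import Counter

def band_split(manifest_lines):
    """Return the fraction of entries per predicted band from JSONL lines."""
    counts = Counter(json.loads(line)["band"]
                     for line in manifest_lines if line.strip())
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}

manifest = [
    '{"audio_filepath": "a.wav", "band": "full_band"}',
    '{"audio_filepath": "b.wav", "band": "narrow_band"}',
    '{"audio_filepath": "c.wav", "band": "full_band"}',
]
print(band_split(manifest))  # fractions per band, e.g. two-thirds full_band here
```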
If the distribution is severely skewed, you may want to filter; if balanced, training on both can improve robustness.
Complete Band-Filter Pipeline Example
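The following sketch assembles the pieces discussed above. Class names and parameters are assumptions modeled on this page, not a verified API; consult the library reference for exact signatures:

```python
# Band filter placed early: cheap CPU classification before GPU scoring.
pipeline = Pipeline(
    stages=[
        vad_segmentation,                        # upstream: segments + waveform
        BandFilterStage(
            band_value="full_band",              # drop narrow-band segments
            cache_dir="/models/band_filter",     # avoid re-downloading the model
        ),
        utmos_filter,                            # expensive GPU quality scoring
    ],
)
pipeline.run()
```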
Best Practices
- Verify your assumption first: don’t band-filter without first confirming your dataset actually contains a mix. If everything is full-band, you’ll just add latency for no benefit.
- Cache the model: set cache_dir to avoid re-downloading the classifier on every run, especially in containerized or ephemeral environments.
- Place band filter early: it’s cheap (CPU-only). Run it before expensive GPU stages (UTMOS, SIGMOS, speaker separation) so you don’t pay for scoring audio you’d reject anyway.
- Don’t mix band_value with MonoConversionStage resampling: if upstream resampling has changed the spectrum, the classifier may misclassify. Place the band filter immediately after VAD on the original-rate audio when possible.
Related Topics
- UTMOS Filter — quality scoring; commonly run after band filtering.
- VAD Segmentation — typical upstream stage producing the segments classified here.
- AudioDataFilterStageComposite — bundles the band filter into the standard pipeline.