Band Filter

Classify each audio segment as full_band or narrow_band and drop anything that doesn’t match the configured target band. Use it when your training set requires a consistent acoustic bandwidth.

Understanding Audio Bandwidth

Full-Band vs Narrow-Band

Audio bandwidth describes the highest frequency the recording captures, set by the codec or transmission medium:

| Band | Frequency Range | Typical Sources |
|---|---|---|
| Full-band | 0–20 kHz (or 0–24 kHz) | Studio recordings, modern smartphones, professional broadcast, music production |
| Wide-band | 0–8 kHz | Modern voice-over-IP, some podcasts |
| Narrow-band | 0–4 kHz | Traditional telephony (PSTN), older codecs (G.711, GSM) |
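The distinction shows up directly in the spectrum: narrow-band audio carries essentially no energy above 4 kHz. As a rough intuition only — this is a toy heuristic, not the stage's actual trained classifier — you can measure the fraction of spectral energy above 4 kHz:

```python
import numpy as np

def high_band_energy_ratio(waveform: np.ndarray, sample_rate: int, cutoff_hz: float = 4000.0) -> float:
    """Fraction of spectral energy above cutoff_hz (illustrative heuristic only)."""
    power = np.abs(np.fft.rfft(waveform)) ** 2
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sample_rate)
    total = power.sum()
    return float(power[freqs > cutoff_hz].sum() / total) if total > 0 else 0.0

sr = 48_000
t = np.arange(sr) / sr                                # one second of audio
narrow = np.sin(2 * np.pi * 1_000 * t)                # all energy at 1 kHz
full = narrow + 0.5 * np.sin(2 * np.pi * 10_000 * t)  # extra energy at 10 kHz

print(high_band_energy_ratio(narrow, sr))  # near 0.0: no content above 4 kHz
print(high_band_energy_ratio(full, sr))    # about 0.2: clear high-band content
```

A single energy threshold like this is fragile in practice (noise, music, and upsampling artifacts all leak energy upward), which is why the stage uses a trained classifier over spectral features instead.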

BandFilterStage distinguishes specifically between full-band and narrow-band — it does not currently classify wide-band as a separate category.

When to Use the Band Filter

  • Train TTS or voice cloning models: full-band only — narrow-band audio lacks the high-frequency content needed for natural reconstruction.
  • Train ASR for call-center / customer-service: narrow-band only — match the deployment domain.
  • Heterogeneous web crawls: choose one based on downstream use; log how much you drop to assess data composition.

If your dataset is known to be uniformly one band, you can skip this stage. The classifier is most useful for filtering mixed sources.
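One quick sanity check is to inspect the sample rates recorded in your manifest: an 8 kHz file is narrow-band by construction, while a 48 kHz file can still contain upsampled narrow-band speech (which is exactly what the classifier catches). The manifest rows below are hypothetical:

```python
import pandas as pd

# Hypothetical manifest rows; a real manifest would be read from JSONL.
rows = [
    {"audio_filepath": "call_001.wav", "sample_rate": 8000},
    {"audio_filepath": "studio_001.wav", "sample_rate": 48000},
    {"audio_filepath": "studio_002.wav", "sample_rate": 48000},
]
df = pd.DataFrame(rows)

# 8 kHz containers are narrow-band by construction; higher rates are inconclusive.
print(df["sample_rate"].value_counts())
definitely_narrow = int((df["sample_rate"] <= 8000).sum())
print(f"{definitely_narrow} of {len(df)} files are at most 8 kHz")
```

If this check already shows a mix of container rates, the dataset is heterogeneous and the classifier is worth running; if everything is high-rate, you still need the classifier to catch upsampled narrow-band content.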

Basic Band Filtering

Step 1: Configure the Stage

```python
from nemo_curator.stages.audio.filtering.band import BandFilterStage

# Keep only full-band audio
band = BandFilterStage(band_value="full_band")
pipeline.add_stage(band)

# Or keep only narrow-band audio
band = BandFilterStage(band_value="narrow_band")
pipeline.add_stage(band)
```

The stage uses a scikit-learn classifier trained on spectral features. The default model is downloaded on first use; set cache_dir to control where it is cached:

```python
band = BandFilterStage(
    band_value="full_band",
    cache_dir="./.cache/band_filter",
)
```

Step 2: Choose Standalone vs In-Pipeline Mode

The stage supports two input modes:

| Mode | Input | When to Use |
|---|---|---|
| In-pipeline | waveform from upstream (e.g., from MonoConversionStage or VADSegmentationStage) | Default — reuses the existing waveform; no extra disk I/O. |
| Standalone | audio_filepath only | Useful when running the filter as a one-off classification step before any other stages. |

In-pipeline mode is automatic when an upstream stage has populated waveform; otherwise the stage falls back to reading from audio_filepath.
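That fallback can be pictured as a small resolution step. This is only a sketch — the function and field names mirror the docs, and load_audio is a stand-in for whatever decoder the stage actually uses:

```python
import numpy as np

def load_audio(path: str) -> np.ndarray:
    """Stand-in loader for this sketch; the real stage uses its own decoder."""
    return np.zeros(16_000, dtype=np.float32)

def resolve_waveform(entry: dict) -> np.ndarray:
    """Sketch of the two input modes: reuse an upstream waveform when present,
    otherwise fall back to reading audio_filepath from disk."""
    if entry.get("waveform") is not None:
        return entry["waveform"]                # in-pipeline mode
    return load_audio(entry["audio_filepath"])  # standalone mode

wf = np.ones(8, dtype=np.float32)
print(resolve_waveform({"waveform": wf, "audio_filepath": "x.wav"}).shape)  # (8,)
print(resolve_waveform({"audio_filepath": "x.wav"}).shape)                  # (16000,)
```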

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str \| None | None | Local path to the band-classifier .joblib model. When None, the stage downloads the default model (nvidia/nemocurator-speech-bandwidth-filter) into cache_dir. |
| cache_dir | str \| None | None | Directory for caching the downloaded model. |
| band_value | "full_band" \| "narrow_band" | "full_band" | Band class to keep; segments classified differently are filtered out. |

The default resource allocation is Resources(cpus=4.0) — the classifier is CPU-only.

Domain-Specific Tuning

TTS / Voice Cloning Training

Demand full-band only:

```python
BandFilterStage(band_value="full_band")
```

Call-Center ASR

Train against the deployment domain:

```python
BandFilterStage(band_value="narrow_band")
```

Mixed Web Crawls

Keep both bands but log the split for analysis. Run the classifier in score-only mode by adding it to the pipeline upstream of any other filter, then export the manifest before applying band_value filtering:

```python
# Score and inspect; do not filter yet
import pandas as pd

df = pd.read_json("./scored.jsonl", lines=True)
print(df["band_classification"].value_counts())
```

If the distribution is severely skewed, you may want to filter; if balanced, training on both can improve robustness.
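To act on that distribution, one option is to split the scored manifest into per-band manifests so each subset can be inspected (or trained on) separately. The toy DataFrame here stands in for the scored manifest, and the band_classification field name mirrors the snippet above:

```python
import pandas as pd

# Toy stand-in for the scored manifest; real code would read ./scored.jsonl.
df = pd.DataFrame({
    "audio_filepath": ["a.wav", "b.wav", "c.wav", "d.wav"],
    "band_classification": ["full_band", "narrow_band", "full_band", "full_band"],
})

share = df["band_classification"].value_counts(normalize=True)
print(share)  # full_band 0.75, narrow_band 0.25

# Partition into per-band manifests for separate inspection
for band, group in df.groupby("band_classification"):
    group.to_json(f"./{band}.jsonl", orient="records", lines=True)
```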

Complete Band-Filter Pipeline Example

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.backends.xenna import XennaExecutor
from nemo_curator.stages.audio.preprocessing.mono_conversion import MonoConversionStage
from nemo_curator.stages.audio.segmentation.vad_segmentation import VADSegmentationStage
from nemo_curator.stages.audio.filtering.band import BandFilterStage
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.stages.text.io.writer import JsonlWriter

pipeline = Pipeline(name="band_filtering")

# 1. Normalize input
pipeline.add_stage(MonoConversionStage(output_sample_rate=48000))

# 2. Segment
pipeline.add_stage(VADSegmentationStage(min_duration_sec=2.0))

# 3. Keep only full-band segments
pipeline.add_stage(
    BandFilterStage(
        band_value="full_band",
        cache_dir="./.cache/band_filter",
    )
)

# 4. Export
pipeline.add_stage(AudioToDocumentStage())
pipeline.add_stage(JsonlWriter(path="./full_band_audio"))

executor = XennaExecutor()
pipeline.run(executor)
```

Best Practices

  • Verify your assumption first: don’t band-filter without first confirming your dataset actually contains a mix. If everything is full-band, you’ll just add latency for no benefit.
  • Cache the model: set cache_dir to avoid re-downloading the classifier on every run, especially in containerized or ephemeral environments.
  • Place band filter early: it’s cheap (CPU-only). Run it before expensive GPU stages (UTMOS, SIGMOS, speaker separation) so you don’t pay for scoring audio you’d reject anyway.
  • Beware upstream resampling: if MonoConversionStage (or any other resampler) has changed the sample rate, the spectrum the classifier sees changes and it may misclassify. Where possible, run the band filter immediately after VAD on the original-rate audio.