Band Filter

Classify each audio segment as full_band or narrow_band and drop anything that doesn’t match the configured target band. Use it when your training set requires a consistent acoustic bandwidth.

Understanding Audio Bandwidth

Full-Band vs Narrow-Band

Audio bandwidth describes the highest frequency the recording captures, set by the codec or transmission medium:

| Band | Frequency Range | Typical Sources |
|---|---|---|
| Full-band | 0–20 kHz (or 0–24 kHz) | Studio recordings, modern smartphones, professional broadcast, music production |
| Wide-band | 0–8 kHz | Modern voice-over-IP, some podcasts |
| Narrow-band | 0–4 kHz | Traditional telephony (PSTN), older codecs (G.711, GSM) |
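The distinction shows up directly in the spectrum: narrow-band audio carries essentially no energy above 4 kHz. As a rough intuition only — this is a toy heuristic, not the stage's actual trained classifier — you can measure the fraction of spectral energy above 4 kHz:

```python
import numpy as np

def high_band_energy_ratio(waveform: np.ndarray, sample_rate: int, cutoff_hz: float = 4000.0) -> float:
    """Fraction of spectral energy above cutoff_hz (illustrative heuristic only)."""
    power = np.abs(np.fft.rfft(waveform)) ** 2
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sample_rate)
    total = power.sum()
    return float(power[freqs > cutoff_hz].sum() / total) if total > 0 else 0.0

sr = 48_000
t = np.arange(sr) / sr                                # one second of audio
narrow = np.sin(2 * np.pi * 1_000 * t)                # all energy at 1 kHz
full = narrow + 0.5 * np.sin(2 * np.pi * 10_000 * t)  # extra energy at 10 kHz

print(high_band_energy_ratio(narrow, sr))  # near 0.0: no content above 4 kHz
print(high_band_energy_ratio(full, sr))    # about 0.2: clear high-band content
```

A single energy threshold like this is fragile in practice (noise, music, and upsampling artifacts all leak energy upward), which is why the stage uses a trained classifier over spectral features instead.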

BandFilterStage distinguishes specifically between full-band and narrow-band — it does not currently classify wide-band as a separate category.

When to Use the Band Filter

  • Train TTS or voice cloning models: full-band only — narrow-band audio lacks the high-frequency content needed for natural reconstruction.
  • Train ASR for call-center / customer-service: narrow-band only — match the deployment domain.
  • Heterogeneous web crawls: choose one based on downstream use; log how much you drop to assess data composition.

If your dataset is known to be uniformly one band, you can skip this stage. The classifier is most useful for filtering mixed sources.
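One quick sanity check is to inspect the sample rates recorded in your manifest: an 8 kHz file is narrow-band by construction, while a 48 kHz file can still contain upsampled narrow-band speech (which is exactly what the classifier catches). The manifest rows below are hypothetical:

```python
import pandas as pd

# Hypothetical manifest rows; a real manifest would be read from JSONL.
rows = [
    {"audio_filepath": "call_001.wav", "sample_rate": 8000},
    {"audio_filepath": "studio_001.wav", "sample_rate": 48000},
    {"audio_filepath": "studio_002.wav", "sample_rate": 48000},
]
df = pd.DataFrame(rows)

# 8 kHz containers are narrow-band by construction; higher rates are inconclusive.
print(df["sample_rate"].value_counts())
definitely_narrow = int((df["sample_rate"] <= 8000).sum())
print(f"{definitely_narrow} of {len(df)} files are at most 8 kHz")
```

If this check already shows a mix of container rates, the dataset is heterogeneous and the classifier is worth running; if everything is high-rate, you still need the classifier to catch upsampled narrow-band content.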

Basic Band Filtering

Step 1: Configure the Stage

```python
from nemo_curator.stages.audio.filtering.band import BandFilterStage

# Keep only full-band audio
band = BandFilterStage(band_value="full_band")
pipeline.add_stage(band)

# Or keep only narrow-band audio
band = BandFilterStage(band_value="narrow_band")
pipeline.add_stage(band)
```

The stage uses a scikit-learn classifier trained on spectral features. The default model is downloaded on first use; set cache_dir to control where it is cached:

```python
band = BandFilterStage(
    band_value="full_band",
    cache_dir="./.cache/band_filter",
)
```

Step 2: Choose Standalone vs In-Pipeline Mode

The stage supports two input modes:

| Mode | Input | When to Use |
|---|---|---|
| In-pipeline | waveform from upstream (e.g., from MonoConversionStage or VADSegmentationStage) | Default — reuses the existing waveform; no extra disk I/O. |
| Standalone | audio_filepath only | Useful when running the filter as a one-off classification step before any other stages. |

In-pipeline mode is automatic when an upstream stage has populated waveform; otherwise the stage falls back to reading from audio_filepath.
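That fallback can be pictured as a small resolution step. This is only a sketch — the function and field names mirror the docs, and load_audio is a stand-in for whatever decoder the stage actually uses:

```python
import numpy as np

def load_audio(path: str) -> np.ndarray:
    """Stand-in loader for this sketch; the real stage uses its own decoder."""
    return np.zeros(16_000, dtype=np.float32)

def resolve_waveform(entry: dict) -> np.ndarray:
    """Sketch of the two input modes: reuse an upstream waveform when present,
    otherwise fall back to reading audio_filepath from disk."""
    if entry.get("waveform") is not None:
        return entry["waveform"]                # in-pipeline mode
    return load_audio(entry["audio_filepath"])  # standalone mode

wf = np.ones(8, dtype=np.float32)
print(resolve_waveform({"waveform": wf, "audio_filepath": "x.wav"}).shape)  # (8,)
print(resolve_waveform({"audio_filepath": "x.wav"}).shape)                  # (16000,)
```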

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str \| None | None | Local path to the band-classifier .joblib model. When None, the stage downloads the default model (nvidia/nemocurator-speech-bandwidth-filter) into cache_dir. |
| cache_dir | str \| None | None | Directory for caching the downloaded model. |
| band_value | "full_band" \| "narrow_band" | "full_band" | Band class to keep; segments classified differently are filtered out. |

The default resource allocation is Resources(cpus=4.0) — the classifier is CPU-only.

Domain-Specific Tuning

TTS / Voice Cloning Training

Demand full-band only:

```python
BandFilterStage(band_value="full_band")
```

Call-Center ASR

Train against the deployment domain:

```python
BandFilterStage(band_value="narrow_band")
```

Mixed Web Crawls

Keep both bands but log the split for analysis. Run the classifier in score-only mode by adding it to the pipeline upstream of any other filter, then export the manifest before applying band_value filtering:

```python
# Score and inspect; do not filter yet
import pandas as pd

df = pd.read_json("./scored.jsonl", lines=True)
print(df["band_classification"].value_counts())
```

If the distribution is severely skewed, you may want to filter; if balanced, training on both can improve robustness.
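To act on that distribution, one option is to split the scored manifest into per-band manifests so each subset can be inspected (or trained on) separately. The toy DataFrame here stands in for the scored manifest, and the band_classification field name mirrors the snippet above:

```python
import pandas as pd

# Toy stand-in for the scored manifest; real code would read ./scored.jsonl.
df = pd.DataFrame({
    "audio_filepath": ["a.wav", "b.wav", "c.wav", "d.wav"],
    "band_classification": ["full_band", "narrow_band", "full_band", "full_band"],
})

share = df["band_classification"].value_counts(normalize=True)
print(share)  # full_band 0.75, narrow_band 0.25

# Partition into per-band manifests for separate inspection
for band, group in df.groupby("band_classification"):
    group.to_json(f"./{band}.jsonl", orient="records", lines=True)
```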

Complete Band-Filter Pipeline Example

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.backends.xenna import XennaExecutor
from nemo_curator.stages.audio.preprocessing.mono_conversion import MonoConversionStage
from nemo_curator.stages.audio.segmentation.vad_segmentation import VADSegmentationStage
from nemo_curator.stages.audio.filtering.band import BandFilterStage
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.stages.text.io.writer import JsonlWriter

pipeline = Pipeline(name="band_filtering")

# 1. Normalize input
pipeline.add_stage(MonoConversionStage(output_sample_rate=48000))

# 2. Segment
pipeline.add_stage(VADSegmentationStage(min_duration_sec=2.0))

# 3. Keep only full-band segments
pipeline.add_stage(
    BandFilterStage(
        band_value="full_band",
        cache_dir="./.cache/band_filter",
    )
)

# 4. Export
pipeline.add_stage(AudioToDocumentStage())
pipeline.add_stage(JsonlWriter(path="./full_band_audio"))

executor = XennaExecutor()
pipeline.run(executor)
```

Best Practices

  • Verify your assumption first: don’t band-filter without first confirming your dataset actually contains a mix. If everything is full-band, you’ll just add latency for no benefit.
  • Cache the model: set cache_dir to avoid re-downloading the classifier on every run, especially in containerized or ephemeral environments.
  • Place band filter early: it’s cheap (CPU-only). Run it before expensive GPU stages (UTMOS, SIGMOS, speaker separation) so you don’t pay for scoring audio you’d reject anyway.
  • Beware upstream resampling: if MonoConversionStage (or any other resampler) has changed the sample rate, the spectrum the classifier sees changes and it may misclassify. Where possible, run the band filter immediately after VAD on the original-rate audio.