nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter

Audio Data Filter Stage — CompositeStage that decomposes into independent pipeline stages for extracting clean single-speaker segments.

Pipeline (when all filters + speaker separation enabled)::

  1. MonoConversion (1:1)
  2. VAD batch mode (1:1, items = N segments)
  3. BandFilter (1:1, filter items)
  4. UTMOS (1:1, filter items)
  5. SIGMOS (1:1, filter items)
  6. SegmentConcatenation (1:1, M items -> 1 item + timestamp mappings)
  7. SpeakerSeparation (1:N fan-out)
  8-11. Per-speaker: VAD + Band + UTMOS + SIGMOS
  12. TimestampMapper (1:1, resolve to original file positions)
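
The interplay between SegmentConcatenation (which emits timestamp mappings) and TimestampMapper (which resolves positions back to the original file) can be illustrated with a minimal sketch. This is not the library's implementation: the `Mapping` record and the `concatenate` and `resolve` helpers are hypothetical names, and the sketch assumes segments are simply laid end to end with no padding or silence between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record: where one kept segment sits in the concatenated
    audio, and where it started in the original file (all in seconds)."""
    concat_start: float
    concat_end: float
    orig_start: float


def concatenate(segments: list[tuple[float, float]]) -> list[Mapping]:
    """Lay (orig_start, orig_end) segments end to end and record one mapping each."""
    mappings: list[Mapping] = []
    cursor = 0.0
    for orig_start, orig_end in segments:
        duration = orig_end - orig_start
        mappings.append(Mapping(cursor, cursor + duration, orig_start))
        cursor += duration
    return mappings


def resolve(mappings: list[Mapping], start: float, end: float) -> list[tuple[float, float]]:
    """Translate an interval on the concatenated timeline into original-file intervals.

    An interval that spans a concatenation boundary is split into several pieces.
    """
    resolved: list[tuple[float, float]] = []
    for m in mappings:
        lo = max(start, m.concat_start)
        hi = min(end, m.concat_end)
        if lo < hi:  # the interval overlaps this segment
            offset = m.orig_start - m.concat_start
            resolved.append((lo + offset, hi + offset))
    return resolved


# Two speech segments kept from the original file: 1-3 s and 10-14 s.
mappings = concatenate([(1.0, 3.0), (10.0, 14.0)])
# A per-speaker segment found at 1-4 s of the concatenated audio.
print(resolve(mappings, 1.0, 4.0))  # [(2.0, 3.0), (10.0, 12.0)]
```

Keeping the mapping as explicit records, rather than recomputing offsets later, is what allows a final stage to resolve positions after any number of intermediate stages have run.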

Usage::

  # Using default config
  pipeline.add_stage(AudioDataFilterStage())

  # Using custom YAML config
  pipeline.add_stage(AudioDataFilterStage(config_path="/path/to/config.yaml"))

Module Contents

Classes

Name                  Description
AudioDataFilterStage  Complete audio data filtering and curation pipeline (CompositeStage).

API

class nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage(
config_path: str | pathlib.Path | None = None,
config: dict[str, typing.Any] | None = None,
name: str = 'AudioDataFilter'
)

Bases: CompositeStage

Complete audio data filtering and curation pipeline (CompositeStage).

Decomposes into independent stages that the executor can schedule with cross-file parallelism. Each stage owns its own default resource allocation. Use .with_() to override individual stage resources.

Supports four pipeline topologies based on which features are enabled:

  • Combo 1 (VAD=off, Speaker=off): MonoConversion → Filters → TimestampMapper
  • Combo 2 (VAD=on, Speaker=off): MonoConversion → VAD(fan-out) → Filters → TimestampMapper
  • Combo 3 (VAD=off, Speaker=on): MonoConversion → Filters → SpeakerSep → Filters → TimestampMapper
  • Combo 4 (VAD=on, Speaker=on): Full pipeline with SegmentConcat + TimestampMapper
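
The four topologies above can be summarized as a function of the two feature flags. The following is an illustrative sketch only: `select_topology` is a hypothetical helper, and the strings are the descriptive stage labels used on this page, not the actual stage class names or the logic inside `decompose()`.

```python
def select_topology(vad_enabled: bool, speaker_enabled: bool) -> list[str]:
    """Return the ordered stage labels for one of the four documented combos."""
    filters = ["BandFilter", "UTMOS", "SIGMOS"]

    if vad_enabled and speaker_enabled:
        # Combo 4: full pipeline, with a second VAD + filter pass per speaker.
        return [
            "MonoConversion", "VAD", *filters,
            "SegmentConcatenation", "SpeakerSeparation",
            "VAD", *filters,
            "TimestampMapper",
        ]
    if speaker_enabled:
        # Combo 3: filters run before and after speaker separation.
        return ["MonoConversion", *filters, "SpeakerSeparation", *filters, "TimestampMapper"]
    if vad_enabled:
        # Combo 2: VAD fans out into segments, then filters.
        return ["MonoConversion", "VAD", *filters, "TimestampMapper"]
    # Combo 1: filters only.
    return ["MonoConversion", *filters, "TimestampMapper"]


for vad in (False, True):
    for speaker in (False, True):
        print(f"VAD={vad}, Speaker={speaker}:", " -> ".join(select_topology(vad, speaker)))
```

With both flags on, the sketch yields twelve stages, matching the numbered pipeline listing in the module docstring.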

Parameters:

config_path
str | Path | None. Defaults to None.

Path to a YAML config file. When None the built-in default_config.yaml is used.

config
dict[str, Any] | None. Defaults to None.

Pre-loaded config dict (alternative to config_path). When both are given, config values override the YAML file.

name
str. Defaults to 'AudioDataFilter'.

Name for this composite stage instance.
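
The rule that `config` values override the YAML file can be pictured as a merge of two dictionaries. The sketch below is an assumption, not the library's loader: `merge_config` is a hypothetical helper, the keys (`vad`, `utmos`, `enable`, `threshold`) are invented for illustration, and whether the real merge is recursive or top-level only is not stated on this page.

```python
from __future__ import annotations

from typing import Any


def merge_config(base: dict[str, Any], override: dict[str, Any] | None) -> dict[str, Any]:
    """Return a new dict where values from `override` win over `base`.

    Nested dicts are merged key by key (an assumption; a real loader
    might replace whole sections instead).
    """
    merged = dict(base)  # shallow copy so `base` itself is not modified
    for key, value in (override or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for values loaded from a YAML file (hypothetical keys).
from_yaml = {"vad": {"enable": True, "threshold": 0.5}, "utmos": {"enable": True}}
# Stand-in for the `config` argument.
from_dict = {"vad": {"enable": False}}

print(merge_config(from_yaml, from_dict))
# {'vad': {'enable': False, 'threshold': 0.5}, 'utmos': {'enable': True}}
```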

_cfg = load_config(config_path)
nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._append_quality_filters(
stages: list[nemo_curator.stages.base.ProcessingStage],
cfg: dict,
suffix: str
) -> None
staticmethod

Append quality filter stages (Band, UTMOS, SIGMOS) to stages.
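
The signature (a `stages` list, a `cfg` dict, a `suffix`, and a `None` return) suggests a helper that extends the list in place. A minimal sketch of that pattern follows, under loud assumptions: `NamedStage` is a stand-in class, the config keys and the per-filter `enable` flag are hypothetical, and using the suffix to keep stage names unique across the two filter passes is a guess about its purpose.

```python
from dataclasses import dataclass


@dataclass
class NamedStage:
    """Stand-in for a processing stage; only the name matters in this sketch."""
    name: str


def append_quality_filters(stages: list[NamedStage], cfg: dict, suffix: str) -> None:
    """Append the enabled quality filters to `stages` in place."""
    for key in ("band_filter", "utmos", "sigmos"):  # hypothetical config keys
        if cfg.get(key, {}).get("enable", True):    # hypothetical enable flag
            stages.append(NamedStage(f"{key}{suffix}"))


stages = [NamedStage("mono")]
append_quality_filters(stages, {"utmos": {"enable": False}}, "_speaker")
print([s.name for s in stages])  # ['mono', 'band_filter_speaker', 'sigmos_speaker']
```

Mutating the caller's list lets each pipeline builder interleave the filter block with other stages without concatenating intermediate lists.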

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._build_filters_only_pipeline(
cfg: dict
) -> list[nemo_curator.stages.base.ProcessingStage]

Combo 1: VAD=off, Speaker=off. Filters only, TimestampMapper cleans up.

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._build_full_pipeline(
cfg: dict
) -> list[nemo_curator.stages.base.ProcessingStage]

Combo 4: VAD=on, Speaker=on. Identical to the original design.

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._build_speaker_only_pipeline(
cfg: dict
) -> list[nemo_curator.stages.base.ProcessingStage]

Combo 3: VAD=off, Speaker=on. SpeakerSep fans out with diar_segments.

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._build_vad_only_pipeline(
cfg: dict
) -> list[nemo_curator.stages.base.ProcessingStage]

Combo 2: VAD=on, Speaker=off. VAD fans out, OutputNormalizer cleans up.

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._make_mono(
cfg: dict
) -> nemo_curator.stages.audio.preprocessing.MonoConversionStage
staticmethod

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._make_speaker_sep(
cfg: dict
) -> nemo_curator.stages.audio.segmentation.SpeakerSeparationStage
staticmethod

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._make_timestamp_mapper(
cfg: dict
) -> nemo_curator.stages.audio.postprocessing.TimestampMapperStage
staticmethod

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage._make_vad(
cfg: dict,
suffix: str,
nested: bool
) -> nemo_curator.stages.audio.segmentation.VADSegmentationStage
staticmethod

nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter.AudioDataFilterStage.decompose() -> list[nemo_curator.stages.base.ProcessingStage]

Build a self-consistent pipeline topology based on enabled features.