nemo_curator.stages.audio.advanced_pipelines.audio_data_filter.audio_data_filter
Audio Data Filter Stage: a CompositeStage that decomposes into independent pipeline stages for extracting clean single-speaker segments.
Pipeline (when all filters and speaker separation are enabled):
1. MonoConversion (1:1)
2. VAD batch mode (1:1, items = N segments)
3. BandFilter (1:1, filter items)
4. UTMOS (1:1, filter items)
5. SIGMOS (1:1, filter items)
6. SegmentConcatenation (1:1, M items -> 1 item + timestamp mappings)
7. SpeakerSeparation (1:N fan-out)
8-11. Per-speaker: VAD + Band + UTMOS + SIGMOS
12. TimestampMapper (1:1, resolve to original file positions)
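SegmentConcatenation records timestamp mappings so that TimestampMapper can resolve positions found in the concatenated audio back to the original file. The sketch below illustrates that bookkeeping under stated assumptions: times are in seconds, kept segments are laid back to back in their original order, and `SegmentMapping`, `build_mappings`, and `to_original` are hypothetical names, not part of NeMo Curator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentMapping:
    """Hypothetical record: where one kept segment sits in both timelines (seconds)."""
    concat_start: float  # start offset inside the concatenated audio
    orig_start: float    # start position in the original file
    duration: float


def build_mappings(segments: list[tuple[float, float]]) -> list[SegmentMapping]:
    """Lay the kept (start, end) segments back to back, as a concatenation step would."""
    mappings, cursor = [], 0.0
    for start, end in segments:
        mappings.append(SegmentMapping(cursor, start, end - start))
        cursor += end - start
    return mappings


def to_original(start: float, end: float,
                mappings: list[SegmentMapping]) -> list[tuple[float, float]]:
    """Resolve an interval on the concatenated timeline to original-file intervals.

    An interval can straddle a concatenation boundary, so it may map to
    several disjoint pieces of the original file.
    """
    pieces = []
    for m in mappings:
        lo = max(start, m.concat_start)
        hi = min(end, m.concat_start + m.duration)
        if lo < hi:  # the interval overlaps this source segment
            offset = m.orig_start - m.concat_start
            pieces.append((lo + offset, hi + offset))
    return pieces


if __name__ == "__main__":
    # Two segments survive VAD + filtering: 1.0-3.0 s and 5.0-6.5 s.
    maps = build_mappings([(1.0, 3.0), (5.0, 6.5)])
    # A speaker turn found at 1.5-2.5 s in the concatenated audio.
    print(to_original(1.5, 2.5, maps))  # → [(2.5, 3.0), (5.0, 5.5)]
```

Because a speaker turn can cross the join between two source segments, a single interval on the concatenated timeline may resolve to several disjoint intervals in the original file.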
Usage:
    # Using the default config
    pipeline.add_stage(AudioDataFilterStage())
    # Using a custom YAML config
    pipeline.add_stage(AudioDataFilterStage(config_path="/path/to/config.yaml"))
Module Contents
Classes
API
Bases: CompositeStage
Complete audio data filtering and curation pipeline (CompositeStage).
Decomposes into independent stages that the executor can schedule with
cross-file parallelism. Each stage owns its own default resource
allocation. Use .with_() to override individual stage resources.
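A minimal sketch of the per-stage resource pattern, assuming each decomposed stage carries its own default resource request and that `.with_()` returns a copy with an override rather than mutating the default. `Resources`, `Stage`, `decompose`, and `override` are hypothetical stand-ins, not NeMo Curator's actual classes or signatures, and the GPU fractions are illustrative only.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Resources:
    """Hypothetical per-stage resource request."""
    cpus: float = 1.0
    gpus: float = 0.0


@dataclass(frozen=True)
class Stage:
    """Hypothetical leaf stage that owns its default resources."""
    name: str
    resources: Resources = field(default_factory=Resources)

    def with_(self, resources: Resources) -> "Stage":
        # Return a copy with overridden resources; the original stage is untouched.
        return replace(self, resources=resources)


def decompose() -> list[Stage]:
    """Each stage declares its own defaults (illustrative values only)."""
    return [
        Stage("MonoConversion"),
        Stage("VAD", Resources(cpus=1.0, gpus=0.25)),
        Stage("UTMOS", Resources(cpus=1.0, gpus=0.5)),
    ]


def override(stages: list[Stage], overrides: dict[str, Resources]) -> list[Stage]:
    """Apply per-stage overrides by name; other stages keep their defaults."""
    return [s.with_(overrides[s.name]) if s.name in overrides else s for s in stages]


if __name__ == "__main__":
    stages = override(decompose(), {"UTMOS": Resources(cpus=2.0, gpus=1.0)})
    print(stages[2].resources)  # → Resources(cpus=2.0, gpus=1.0)
```

Keeping defaults on each stage lets the executor schedule stages independently, while overrides stay local to the stages that need them.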
Supports four pipeline topologies based on which features are enabled:
- Combo 1 (VAD=off, Speaker=off): MonoConversion → Filters → TimestampMapper
- Combo 2 (VAD=on, Speaker=off): MonoConversion → VAD(fan-out) → Filters → TimestampMapper
- Combo 3 (VAD=off, Speaker=on): MonoConversion → Filters → SpeakerSep → Filters → TimestampMapper
- Combo 4 (VAD=on, Speaker=on): Full pipeline with SegmentConcat + TimestampMapper
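The four combos can be sketched as a pure function of the two feature flags. This assumes all quality filters are enabled and models stages as plain name strings; `build_topology` and `add_filters` are hypothetical helpers that mirror the descriptions on this page, not the stage's real methods.

```python
QUALITY_FILTERS = ["BandFilter", "UTMOS", "SIGMOS"]


def add_filters(stages: list[str]) -> None:
    """Append the quality filter stages (Band, UTMOS, SIGMOS) in place."""
    stages.extend(QUALITY_FILTERS)


def build_topology(vad: bool, speaker: bool) -> list[str]:
    """Return the ordered stage names for one of the four combos."""
    stages = ["MonoConversion"]
    if vad:
        stages.append("VAD")  # fan-out: one item per detected segment
    add_filters(stages)
    if speaker:
        if vad:
            # Combo 4 only: merge surviving segments before diarization.
            stages.append("SegmentConcatenation")
        stages.append("SpeakerSeparation")  # fan-out: one item per speaker
        if vad:
            stages.append("VAD")  # per-speaker VAD in combo 4
        add_filters(stages)  # per-speaker quality filters
    stages.append("TimestampMapper")
    return stages


if __name__ == "__main__":
    print(build_topology(vad=True, speaker=False))
    # → ['MonoConversion', 'VAD', 'BandFilter', 'UTMOS', 'SIGMOS', 'TimestampMapper']
```

With both flags on, the result matches the full pipeline listed at the top of this page.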
Parameters:
- Path to a YAML config file. When None, the built-in default_config.yaml is used.
- Pre-loaded config dict (alternative to config_path). When both are given, config values override the YAML file.
- Name for this composite stage instance.
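A minimal sketch of the documented precedence rule (config values override the YAML file), assuming the YAML has already been parsed into a dict and that nested sections are merged key by key. The recursive merge and the config keys in the example are assumptions for illustration; the stage's actual merge behavior and config schema are not shown here.

```python
import copy
from typing import Any, Dict, Optional


def merge_config(yaml_cfg: Dict[str, Any],
                 overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return yaml_cfg with overrides applied; override values win on conflicts.

    Nested dicts are merged key by key (an assumption; a real implementation
    might replace whole sections instead). Inputs are not mutated.
    """
    merged = copy.deepcopy(yaml_cfg)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys standing in for the parsed default_config.yaml.
    yaml_cfg = {"vad": {"enable": True, "threshold": 0.5},
                "speaker_separation": {"enable": False}}
    print(merge_config(yaml_cfg, {"vad": {"threshold": 0.7}}))
    # → {'vad': {'enable': True, 'threshold': 0.7}, 'speaker_separation': {'enable': False}}
```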
Append the quality filter stages (Band, UTMOS, SIGMOS) to the stages list.
Combo 1: VAD=off, Speaker=off. Filters only; TimestampMapper cleans up.
Combo 4: VAD=on, Speaker=on. Identical to the original design.
Combo 3: VAD=off, Speaker=on. SpeakerSep fans out with diar_segments.
Combo 2: VAD=on, Speaker=off. VAD fans out; OutputNormalizer cleans up.
Build a self-consistent pipeline topology based on enabled features.