nemo_curator.stages.audio.filtering.sigmos

SIGMOS (Signal-based Mean Opinion Score) filter stage.

Filters audio segments based on SIGMOS quality metrics including noise, overall quality, signal quality, coloration, discontinuity, loudness, and reverberation.

Each item supplies its audio in exactly one of two forms: in-memory (waveform plus sample_rate) or an audio_filepath pointing to a WAV file. The stage runs the SigMOS ONNX model directly and writes no temporary files.

The ONNX model is downloaded automatically from Microsoft’s SIG-Challenge repository on first use and cached at ~/.cache/nemo_curator/sigmos_model/. Users can also provide a pre-downloaded model via the model_path parameter.

Module Contents

Classes

| Name | Description |
| --- | --- |
| SIGMOSFilterStage | SIGMOS quality assessment filter stage. |

Functions

| Name | Description |
| --- | --- |
| _get_audio_numpy_sr | Get (audio mono float32 numpy, sample_rate) from item. |

Data

_DEFAULT_MODEL_DIR

_SIGMOS_MODEL_FILENAME

_SIGMOS_MODEL_URL

API

class nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage(
model_dir: str = _DEFAULT_MODEL_DIR,
model_path: str | None = None,
noise_threshold: float | None = 4.0,
ovrl_threshold: float | None = 3.5,
sig_threshold: float | None = None,
col_threshold: float | None = None,
disc_threshold: float | None = None,
loud_threshold: float | None = None,
reverb_threshold: float | None = None,
name: str = 'SIGMOSFilter',
batch_size: int = 1,
resources: nemo_curator.stages.resources.Resources = (lambda: Resources(cpus=1.0...
)
Dataclass

Bases: ProcessingStage[AudioTask, AudioTask]

SIGMOS quality assessment filter stage.

Filters audio segments based on SIGMOS quality metrics. Input: items with waveform + sample_rate (tensor/array) or audio_filepath (WAV). The ONNX model is loaded once in setup() and reused for all predictions.

The model is automatically downloaded from Microsoft’s SIG-Challenge GitHub repository on first use and cached at ~/.cache/nemo_curator/sigmos_model/. To skip downloading, place the ONNX file there manually or pass model_path pointing directly to the file.
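A minimal construction sketch, using only keyword arguments that appear in the signature above. The reverb_threshold value is illustrative rather than a recommendation, and the import is guarded so the snippet still runs where NeMo Curator is not installed. How the stage is wired into a pipeline is not described on this page and is left out.

```python
# Keyword arguments named exactly as in the documented constructor.
stage_kwargs = {
    "noise_threshold": 4.0,   # documented default
    "ovrl_threshold": 3.5,    # documented default
    "reverb_threshold": 3.0,  # illustrative value, not a recommendation
    "sig_threshold": None,    # None disables this check
}

try:
    from nemo_curator.stages.audio.filtering.sigmos import SIGMOSFilterStage
except ImportError:
    SIGMOSFilterStage = None  # NeMo Curator is not installed here

if SIGMOSFilterStage is not None:
    # Per the description above, the ONNX model is loaded later, in setup().
    stage = SIGMOSFilterStage(**stage_kwargs)
    print(stage.name)
else:
    print("nemo_curator not available; kwargs:", sorted(stage_kwargs))
```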

Parameters:

model_dir
str. Defaults to _DEFAULT_MODEL_DIR.

Directory to store the downloaded model weights (default: ~/.cache/nemo_curator/sigmos_model/).

model_path
str | None. Defaults to None.

Direct path to a local SIGMOS ONNX model file. Overrides model_dir when provided.

noise_threshold
float | None. Defaults to 4.0.

Minimum noise score (None to disable)

ovrl_threshold
float | None. Defaults to 3.5.

Minimum overall score (None to disable)

sig_threshold
float | None. Defaults to None.

Minimum signal score (None to disable)

col_threshold
float | None. Defaults to None.

Minimum coloration score (None to disable)

disc_threshold
float | None. Defaults to None.

Minimum discontinuity score (None to disable)

loud_threshold
float | None. Defaults to None.

Minimum loudness score (None to disable)

reverb_threshold
float | None. Defaults to None.

Minimum reverb score (None to disable)
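
The threshold parameters above share one rule: a score must reach the configured minimum, and None disables that check. The sketch below illustrates that rule only; it is not the stage's `_check_thresholds` implementation. The helper name `check_thresholds`, the score keys (`"noise"`, `"ovrl"`, and so on), the choice that a score equal to the threshold passes, and the treatment of a missing score as a failure are all assumptions.

```python
from typing import Optional


def check_thresholds(
    scores: dict[str, float],
    thresholds: dict[str, Optional[float]],
) -> tuple[bool, list[str]]:
    """Return (passed, failed_metric_names) for one set of scores."""
    failed = []
    for metric, minimum in thresholds.items():
        if minimum is None:
            continue  # None disables the check for this metric
        score = scores.get(metric)
        # Assumption: a missing score counts as a failure.
        if score is None or score < minimum:
            failed.append(metric)
    return (not failed, failed)


# Mirrors the documented defaults: only noise and overall are enforced.
defaults = {"noise": 4.0, "ovrl": 3.5, "sig": None, "reverb": None}
passed, failed = check_thresholds({"noise": 4.2, "ovrl": 3.1, "sig": 2.0}, defaults)
print(passed, failed)  # False ['ovrl']
```

With the default configuration, a low signal score alone does not reject an item, because sig_threshold is None.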

batch_size: int = 1
col_threshold: float | None = None
disc_threshold: float | None = None
loud_threshold: float | None = None
model_dir: str = _DEFAULT_MODEL_DIR
model_path: str | None = None
name: str = 'SIGMOSFilter'
noise_threshold: float | None = 4.0
ovrl_threshold: float | None = 3.5
resources: Resources
reverb_threshold: float | None = None
sig_threshold: float | None = None
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.__post_init__()
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._check_thresholds(
scores: dict[str, float]
) -> tuple[bool, list[str]]
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._download_model(
model_dir: str
) -> str
staticmethod

Download SIGMOS ONNX model from Microsoft’s SIG-Challenge repository.

Returns the path to the validated model file.

nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._initialize_model() -> None
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._process_single(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask | None

Run SIGMOS scoring on a single (non-nested) task.

nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._resolve_model_path() -> str

Resolve the ONNX model path: model_path override → model_dir download.
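
The resolution order described above (explicit model_path first, then the cached file in model_dir, then a download) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `resolve_model_path` and `fake_download` are hypothetical names, the downloader is a stand-in that writes a placeholder file instead of fetching anything, and raising FileNotFoundError for a missing model_path is an assumed behavior.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Filename copied from _SIGMOS_MODEL_FILENAME in this module.
MODEL_FILENAME = "model-sigmos_1697718653_41d092e8-epo-200.onnx"


def resolve_model_path(
    model_path: Optional[str],
    model_dir: str,
    download: Callable[[str], str],
) -> str:
    """Pick the ONNX file: explicit model_path first, then model_dir."""
    if model_path is not None:
        if not Path(model_path).is_file():
            raise FileNotFoundError(model_path)  # assumed behavior
        return model_path
    cached = Path(model_dir) / MODEL_FILENAME
    if cached.is_file():
        return str(cached)  # already cached, no download needed
    return download(model_dir)  # stand-in for _download_model


def fake_download(target_dir: str) -> str:
    """Stand-in downloader: writes a placeholder file, fetches nothing."""
    path = Path(target_dir) / MODEL_FILENAME
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"placeholder")
    return str(path)


with tempfile.TemporaryDirectory() as tmp:
    first = resolve_model_path(None, tmp, fake_download)   # triggers the stand-in download
    second = resolve_model_path(None, tmp, fake_download)  # served from the cache
    override = resolve_model_path(first, "/unused", fake_download)
    print(first == second == override)
```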

nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage._scores_from_prediction(
score_data: typing.Any
) -> dict[str, float]
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.process(
task: nemo_curator.tasks.AudioTask
) -> nemo_curator.tasks.AudioTask | list[nemo_curator.tasks.AudioTask]

Process a single AudioTask and filter by SIGMOS quality metrics.

When task.data contains a "segments" key (nested mode from VAD), each segment is evaluated individually and only survivors are kept.
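
The flat and nested modes described above can be sketched on plain dictionaries. This is a simplified illustration, not the stage's `process` method: `filter_task_data` and `passes` are hypothetical names, the precomputed `"ovrl"` field stands in for a real SIGMOS prediction, and what the actual stage returns when every segment is rejected is not stated here, so the sketch simply keeps an empty segment list.

```python
from typing import Any, Callable, Optional


def filter_task_data(
    data: dict[str, Any],
    keep: Callable[[dict[str, Any]], bool],
) -> Optional[dict[str, Any]]:
    """Apply `keep` to a flat item, or to each entry under "segments"."""
    if "segments" in data:
        # Nested mode: evaluate every segment and keep only the survivors.
        survivors = [seg for seg in data["segments"] if keep(seg)]
        return {**data, "segments": survivors}
    # Flat mode: the item itself either passes or is dropped.
    return data if keep(data) else None


def passes(item: dict[str, Any]) -> bool:
    # Hypothetical predicate: the item carries a precomputed overall score.
    return item["ovrl"] >= 3.5


nested = {"audio_filepath": "a.wav", "segments": [{"ovrl": 3.9}, {"ovrl": 2.8}]}
print(filter_task_data(nested, passes))
# {'audio_filepath': 'a.wav', 'segments': [{'ovrl': 3.9}]}
```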

nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.setup(
_: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.setup_on_node(
_node_info: nemo_curator.backends.base.NodeInfo | None = None,
_worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None
nemo_curator.stages.audio.filtering.sigmos.SIGMOSFilterStage.teardown() -> None
nemo_curator.stages.audio.filtering.sigmos._get_audio_numpy_sr(
item: dict[str, typing.Any],
task_id: str
) -> tuple[numpy.ndarray, int] | None

Get (audio mono float32 numpy, sample_rate) from item.

Returns None if unavailable or load fails.

nemo_curator.stages.audio.filtering.sigmos._DEFAULT_MODEL_DIR = str(Path.home() / '.cache' / 'nemo_curator' / 'sigmos_model')
nemo_curator.stages.audio.filtering.sigmos._SIGMOS_MODEL_FILENAME = 'model-sigmos_1697718653_41d092e8-epo-200.onnx'
nemo_curator.stages.audio.filtering.sigmos._SIGMOS_MODEL_URL = 'https://github.com/microsoft/SIG-Challenge/raw/main/ICASSP2024/sigmos/model-sig...