nemo_curator.stages.audio.filtering.utmos
UTMOS (UTokyo-SaruLab MOS Prediction) filter stage.
Filters audio segments by their UTMOS-predicted Mean Opinion Score (MOS). Uses the utmos22_strong model from tarepan/SpeechMOS, loaded via torch.hub.
Accepts either in-memory input (waveform plus sample_rate) or an audio_filepath. Audio is resampled to 16 kHz internally before UTMOS inference.
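The flow described above (load the model through torch.hub, bring audio to 16 kHz, predict a MOS) can be sketched as follows. This is a minimal sketch, not the stage's implementation: it assumes the hub model is called as model(wave, sr) on a (batch, samples) float tensor, as in the SpeechMOS README, and it uses a hypothetical linear-interpolation resampler (resample_linear) purely as a stand-in. The helper names load_utmos_scorer and score_waveform are also hypothetical.

```python
from typing import Callable

import numpy as np

TARGET_SR = 16000  # UTMOS inference rate stated in the docs


def load_utmos_scorer() -> Callable[[np.ndarray, int], float]:
    """Load utmos22_strong from tarepan/SpeechMOS and wrap it for NumPy input.

    Requires torch and network access. Assumption: the hub model is called as
    model(wave, sr) on a (batch, samples) float tensor, per the SpeechMOS README.
    """
    import torch  # lazy import so the helpers below work without torch

    model = torch.hub.load("tarepan/SpeechMOS", "utmos22_strong", trust_repo=True)
    model.eval()

    def predict(wave16: np.ndarray, sr: int) -> float:
        with torch.no_grad():
            mos = model(torch.from_numpy(wave16).unsqueeze(0), sr)
        return float(mos[0])

    return predict


def resample_linear(wave: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Crude linear-interpolation resampler (hypothetical stand-in, not band-limited)."""
    wave = np.asarray(wave, dtype=np.float32)
    if orig_sr == target_sr:
        return wave
    n_out = int(round(len(wave) * target_sr / orig_sr))
    t_in = np.arange(len(wave)) / orig_sr  # input sample times in seconds
    t_out = np.arange(n_out) / target_sr  # output sample times in seconds
    return np.interp(t_out, t_in, wave).astype(np.float32)


def score_waveform(wave: np.ndarray, sr: int, predict: Callable[[np.ndarray, int], float]) -> float:
    """Resample a 1-D mono waveform to 16 kHz, then ask the predictor for a MOS."""
    return float(predict(resample_linear(wave, sr), TARGET_SR))
```

With torch installed, score_waveform(wave, sr, load_utmos_scorer()) would return a single MOS for a mono clip. A real pipeline would more likely use a proper band-limited resampler (for example torchaudio.functional.resample) than linear interpolation; the predictor is injected here so the resampling and scoring steps can be exercised without downloading the model.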
Bases: ProcessingStage[AudioTask, AudioTask]
UTMOS quality assessment filter stage.
Filters audio segments by their UTMOS-predicted MOS. The utmos22_strong model is loaded via torch.hub from tarepan/SpeechMOS, and audio is resampled to 16 kHz for inference.
Parameters:
Minimum MOS required to pass; None disables filtering.
Target sample rate for UTMOS inference (default: 16000).
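The two parameters above can be read as a small configuration object. The sketch below is hypothetical: the real parameter names are not shown in this reference, so min_mos and sample_rate are illustrative names, the None default for the threshold is an assumption, and treating the threshold as inclusive (score >= min_mos) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtmosFilterConfig:
    # Hypothetical field names; the documented parameters' names are not shown.
    min_mos: Optional[float] = None  # minimum MOS to pass; None disables filtering (assumed default)
    sample_rate: int = 16000  # target sample rate for UTMOS inference (documented default)

    def passes(self, score: float) -> bool:
        if self.min_mos is None:
            return True  # filtering disabled: every scored segment passes
        return score >= self.min_mos  # assumption: the threshold is inclusive
```

Using None rather than a sentinel number to disable filtering keeps "no threshold" distinct from "threshold of 0", which matters because a MOS is always positive.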
Run UTMOS scoring on a single (non-nested) task.
Process a single AudioTask and filter it by UTMOS MOS.
When task.data contains a "segments" key (nested mode, as produced by VAD),
each segment is scored individually and only the segments that pass the threshold are kept.
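The flat versus nested behavior described above can be sketched on a plain dict standing in for task.data. This is an illustrative sketch, not the stage's code: the function name filter_by_mos, the "utmos_mos" output key, dropping items whose audio cannot be scored, and returning an empty segment list (rather than dropping the task) when no segment passes are all assumptions.

```python
from typing import Callable, Optional

Scorer = Callable[[dict], Optional[float]]  # item -> MOS, or None if audio is unavailable


def filter_by_mos(data: dict, score_item: Scorer, min_mos: Optional[float]) -> Optional[dict]:
    """Filter a dict standing in for task.data, in flat or nested ("segments") mode.

    Hypothetical: the "utmos_mos" key and the handling of unscorable items.
    """

    def keep(score: Optional[float]) -> bool:
        if score is None:
            return False  # assumption: items without usable audio are dropped
        return min_mos is None or score >= min_mos

    if "segments" in data:  # nested mode (e.g. VAD output): score each segment
        survivors = []
        for seg in data["segments"]:
            score = score_item(seg)
            if keep(score):
                survivors.append({**seg, "utmos_mos": score})
        return {**data, "segments": survivors}

    score = score_item(data)  # flat mode: one score for the whole task
    return {**data, "utmos_mos": score} if keep(score) else None
```

Copying each surviving segment with {**seg, ...} leaves the input untouched, so a caller can compare the original and filtered segment lists.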
Extract a mono waveform tensor of shape (1, N) and its sample_rate from an item.
Supports either a waveform (Tensor or ndarray) with sample_rate, or an audio_filepath. Returns None if no audio can be obtained.
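A sketch of this extraction logic follows. It is hypothetical in several ways: it returns a NumPy array of shape (1, N) instead of a torch Tensor, it assumes a 2-D waveform is laid out as (channels, samples), it converts a Tensor by duck typing (.detach()) rather than importing torch, and its file path branch reads only 16-bit PCM WAV through Python's standard wave module, whereas the real stage presumably uses a more general audio loader.

```python
import wave
from typing import Optional, Tuple

import numpy as np


def _read_pcm16_wav(path: str) -> Tuple[np.ndarray, int]:
    """Read a 16-bit PCM WAV into a (1, N) float32 mono array (sketch only)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise wave.Error("only 16-bit PCM is handled in this sketch")
        n_ch, sr = wf.getnchannels(), wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    data = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    # Interleaved frames -> (channels, samples), then average channels to mono.
    return data.reshape(-1, n_ch).T.mean(axis=0, keepdims=True), sr


def extract_mono(item: dict) -> Optional[Tuple[np.ndarray, int]]:
    """Return ((1, N) mono waveform, sample_rate), or None if no audio is available."""
    waveform, sample_rate = item.get("waveform"), item.get("sample_rate")
    if waveform is not None and sample_rate is not None:
        if hasattr(waveform, "detach"):  # torch.Tensor, converted without importing torch
            waveform = waveform.detach().cpu().numpy()
        arr = np.asarray(waveform, dtype=np.float32)
        if arr.ndim == 1:
            arr = arr[np.newaxis, :]  # (N,) -> (1, N)
        elif arr.ndim == 2:
            arr = arr.mean(axis=0, keepdims=True)  # assumes (channels, samples)
        else:
            return None
        return arr, int(sample_rate)
    path = item.get("audio_filepath")
    if path:
        try:
            return _read_pcm16_wav(path)
        except (OSError, wave.Error):
            return None  # missing or unsupported file
    return None
```

Returning None instead of raising lets a caller treat unreadable items like any other segment that fails the filter.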