Quality Assessment for Audio Data
Filter audio data by quality using transcription accuracy metrics, duration analysis, and custom quality measures to build high-quality speech datasets for ASR training.
How it Works
Audio quality assessment in NeMo Curator focuses on speech-specific metrics that correlate with training data quality:
- Transcription Accuracy: Word Error Rate (WER) and Character Error Rate (CER) between ground truth and ASR predictions
- Duration Analysis: Audio length validation and speech rate calculations
- Value-based Filtering: Configurable filtering using comparison operators
Quality Metrics
Word Error Rate (WER)
WER is the primary metric for assessing transcription quality. It measures the percentage of words that differ between the ground truth and predicted transcriptions:
- WER = 0%: Perfect transcription match
- WER = 25%: Good quality (1 in 4 words incorrect)
- WER = 50%: Moderate quality
- WER > 75%: Poor quality (consider filtering)
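To make the percentages above concrete, here is a minimal pure-Python sketch of how WER can be computed as a word-level edit distance divided by the reference length. The function name `word_error_rate` is hypothetical and is not the NeMo Curator utility; it only illustrates the metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four gives the "1 in 4 words incorrect" case.
wer = word_error_rate("the quick brown fox", "the quick brown dog")
print(f"WER = {wer:.1f}%")
```

Because insertions count as errors, WER computed this way can exceed 100% when the prediction is much longer than the reference.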
Character Error Rate (CER)
CER is a more granular accuracy measurement at the character level. The get_cer() function is a utility for calculating CER programmatically.
The WER and CER utilities depend on the editdistance package. These are utility functions typically used within custom stages rather than directly in pipelines.
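The following sketch shows how a character-level error rate can be computed. It is a standalone illustration, not the library's `get_cer()` implementation: the helper names are hypothetical, it uses a hand-written edit distance instead of the `editdistance` package, and it assumes spaces count as characters.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Return CER in percent: character edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


reference = "recognize speech"
hypothesis = "recognise speech"
cer = char_error_rate(reference, hypothesis)
print(f"CER = {cer:.2f}%")
```

In this example a single wrong character makes one of the two words incorrect at the word level, while the character-level rate stays low, which is why CER is useful for spotting near-miss transcriptions.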
Speech Rate Metrics
NeMo Curator provides utility functions for analyzing speaking speed and content density. These functions are designed for use in custom processing stages:
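As a rough illustration of what such metrics measure, the sketch below computes words per second and characters per second from a transcript and its duration. The function names `word_rate` and `char_rate` are hypothetical stand-ins, and counting spaces as characters is an assumption.

```python
def word_rate(text: str, duration_s: float) -> float:
    """Words per second for a transcript of the given duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(text.split()) / duration_s


def char_rate(text: str, duration_s: float) -> float:
    """Characters per second (spaces included) for a transcript."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(text) / duration_s


transcript = "hello world this is a test"
print(word_rate(transcript, 2.0), char_rate(transcript, 2.0))
```

Unusually high or low rates often indicate a transcript that does not match the audio, such as truncated text or long stretches of silence.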
For a complete example of using speech rate metrics in a pipeline, refer to the Duration Filtering guide.
Filtering Strategies
WER-based Filtering
Filter audio samples based on transcription accuracy:
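The idea can be sketched in plain Python as a threshold on a precomputed WER field. The field names (`audio_filepath`, `wer`) and the 30% cutoff are illustrative assumptions; in a real pipeline the equivalent step would be a NeMo Curator stage, not this helper.

```python
def filter_by_max_wer(samples, max_wer=30.0, wer_key="wer"):
    """Keep samples whose WER (in percent) is at or below max_wer."""
    return [s for s in samples if s[wer_key] <= max_wer]


samples = [
    {"audio_filepath": "clip_001.wav", "wer": 4.5},
    {"audio_filepath": "clip_002.wav", "wer": 30.0},
    {"audio_filepath": "clip_003.wav", "wer": 82.0},
]
kept = filter_by_max_wer(samples, max_wer=30.0)
print([s["audio_filepath"] for s in kept])
```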
Duration-based Filtering
Filter by audio length to remove samples that are too short or too long:
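A minimal sketch of a duration window, assuming each sample carries a `duration` field in seconds. The 1 to 20 second bounds are example values only, not recommended defaults.

```python
def filter_by_duration(samples, min_s=1.0, max_s=20.0, duration_key="duration"):
    """Keep samples whose duration in seconds lies within [min_s, max_s]."""
    return [s for s in samples if min_s <= s[duration_key] <= max_s]


samples = [
    {"audio_filepath": "too_short.wav", "duration": 0.4},
    {"audio_filepath": "ok.wav", "duration": 7.2},
    {"audio_filepath": "too_long.wav", "duration": 45.0},
]
kept = filter_by_duration(samples)
print([s["audio_filepath"] for s in kept])
```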
Combined Quality Filtering
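Combining criteria amounts to requiring that a sample pass every individual check. The sketch below expresses this as a single predicate in plain Python; the thresholds and field names are assumptions for illustration, and in a pipeline the same effect would typically come from chaining several filtering stages.

```python
def is_high_quality(sample, max_wer=30.0, min_s=1.0, max_s=20.0):
    """Return True only if the sample passes both the WER and duration checks."""
    checks = (
        sample["wer"] <= max_wer,
        min_s <= sample["duration"] <= max_s,
    )
    return all(checks)


samples = [
    {"audio_filepath": "good.wav", "wer": 5.0, "duration": 3.0},
    {"audio_filepath": "bad_wer.wav", "wer": 60.0, "duration": 3.0},
    {"audio_filepath": "too_short.wav", "wer": 5.0, "duration": 0.5},
]
kept = [s for s in samples if is_high_quality(s)]
print([s["audio_filepath"] for s in kept])
```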
Operator Options
The PreserveByValueStage supports several comparison operators:
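The sketch below illustrates how operator-driven filtering can work in general, mapping short operator names to functions from Python's standard `operator` module. The names used here (`lt`, `le`, `eq`, `ne`, `ge`, `gt`) are an assumption modeled on that module; check the stage's reference documentation for the exact set it accepts. The `preserve_by_value` helper is hypothetical and is not the stage itself.

```python
import operator

# Assumed operator names, mirroring Python's operator module.
OPERATORS = {
    "lt": operator.lt,  # value <  target
    "le": operator.le,  # value <= target
    "eq": operator.eq,  # value == target
    "ne": operator.ne,  # value != target
    "ge": operator.ge,  # value >= target
    "gt": operator.gt,  # value >  target
}


def preserve_by_value(samples, key, target, op="eq"):
    """Keep samples where `sample[key] <op> target` holds."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    compare = OPERATORS[op]
    return [s for s in samples if compare(s[key], target)]


samples = [{"wer": 10.0}, {"wer": 30.0}, {"wer": 55.0}]
print(preserve_by_value(samples, "wer", 30.0, op="le"))
```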
Complete Quality Assessment Pipeline
Here’s a complete working example that demonstrates quality assessment:
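The following is a self-contained stand-in that mirrors the flow of such a pipeline in plain Python: compute WER for each sample, then apply WER and duration filters in sequence. It does not use the NeMo Curator API. All function names, field names (`text`, `pred_text`, `duration`), and thresholds are assumptions chosen for illustration, and every sample is assumed to have a non-empty reference transcript.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def add_wer(samples):
    """Stage 1: attach a 'wer' field (percent) to every sample."""
    out = []
    for s in samples:
        ref, hyp = s["text"].split(), s["pred_text"].split()
        out.append({**s, "wer": 100.0 * word_edit_distance(ref, hyp) / len(ref)})
    return out


def keep_wer_at_most(max_wer):
    """Stage factory: keep samples with WER <= max_wer."""
    return lambda samples: [s for s in samples if s["wer"] <= max_wer]


def keep_duration_between(min_s, max_s):
    """Stage factory: keep samples with min_s <= duration <= max_s."""
    return lambda samples: [s for s in samples if min_s <= s["duration"] <= max_s]


def run_pipeline(samples, stages):
    """Apply each stage in order, feeding its output to the next."""
    for stage in stages:
        samples = stage(samples)
    return samples


manifest = [
    {"audio_filepath": "a.wav", "text": "turn on the lights",
     "pred_text": "turn on the lights", "duration": 2.1},
    {"audio_filepath": "b.wav", "text": "play some jazz music",
     "pred_text": "lay sum jazz", "duration": 2.4},
    {"audio_filepath": "c.wav", "text": "yes", "pred_text": "yes", "duration": 0.3},
]
kept = run_pipeline(
    manifest,
    [add_wer, keep_wer_at_most(30.0), keep_duration_between(1.0, 20.0)],
)
print([s["audio_filepath"] for s in kept])
```

Here `b.wav` is dropped for its high WER and `c.wav` for being too short, leaving only `a.wav`.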
Related Topics
- WER Filtering - Detailed guide to Word Error Rate filtering
- Duration Filtering - Audio length and speech rate filtering
- Audio Analysis - Audio file analysis and validation