Speech Data Explorer
Note
The tool can be found under NeMo/tools/speech_data_explorer.
Dash-based tool for interactive exploration of ASR/TTS datasets.
Features:
dataset statistics (alphabet, vocabulary, duration-based histograms)
navigation across the dataset (sorting, filtering)
inspection of individual utterances (waveform, spectrogram, audio player)
error analysis (Word Error Rate, Character Error Rate, Word Match Rate, Mean Word Accuracy, diff)
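As a rough illustration of the first two error metrics listed above, the sketch below computes Word Error Rate and Character Error Rate from a Levenshtein edit distance. The function names edit_distance, wer and cer are hypothetical and chosen for this example only. The tool's own implementation, text normalization and rounding may differ, and Word Match Rate and Mean Word Accuracy are not covered here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# "text" plays the role of the reference, "pred_text" the ASR output.
example_wer = wer("hello world", "hello word")
example_cer = cer("hello world", "hello word")
```

Here one of two reference words is wrong, so the word-level rate is much higher than the character-level rate, which is why the two metrics are usually inspected together.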
Make sure that the tool's requirements are installed, then run:
python data_explorer.py path_to_manifest.json
The JSON manifest file should contain the following fields:
audio_filepath (path to audio file)
duration (duration of the audio file in seconds)
text (reference transcript)
Error analysis requires a “pred_text” field (ASR transcript) for all utterances.
Any additional field will be parsed and displayed in the ‘Samples’ tab.
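The following is a minimal sketch of how such a manifest could be produced. It assumes the manifest is stored with one JSON object per line, which is not stated above, so check it against your own data. The helper names write_manifest and read_manifest, the file paths, the durations and the extra "speaker" field are all hypothetical.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (assumed manifest layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read the manifest back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical utterances: the three required fields, plus "pred_text"
# for error analysis and an extra "speaker" field as an additional column.
entries = [
    {
        "audio_filepath": "audio/utt_0001.wav",
        "duration": 3.42,
        "text": "hello world",
        "pred_text": "hello word",
        "speaker": "spk_01",
    },
    {
        "audio_filepath": "audio/utt_0002.wav",
        "duration": 1.87,
        "text": "good morning",
        "pred_text": "good morning",
        "speaker": "spk_02",
    },
]

# Written to a temporary directory so the example leaves no files behind.
manifest_path = os.path.join(tempfile.mkdtemp(), "manifest.json")
write_manifest(entries, manifest_path)
```

The resulting file path can then be passed to data_explorer.py as shown above.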