Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Dataset Creation Tool Based on CTC-Segmentation

This tool aligns long audio files with their corresponding transcripts and splits them into shorter fragments suitable for training an Automatic Speech Recognition (ASR) model.
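As a rough illustration of the "split into shorter fragments" step, the sketch below turns a list of already-aligned segments into JSON-lines manifest entries that point back into the long recording. It is not the tool's actual code. The function name `build_manifest_lines`, the segment dictionary layout, the per-segment `score` field, and the `min_score` threshold are all hypothetical, and the manifest keys (`audio_filepath`, `offset`, `duration`, `text`) are assumed here for illustration.

```python
import json


def build_manifest_lines(audio_path, segments, min_score=-2.0):
    """Turn aligned segments into JSON-lines manifest entries.

    `segments` is a list of dicts with hypothetical keys:
    "start" and "end" (seconds), "text", and "score" (alignment confidence,
    higher is better). Segments scoring below `min_score` are dropped.
    """
    lines = []
    for seg in segments:
        if seg["score"] < min_score:
            continue  # assumed filtering step: skip poorly aligned fragments
        entry = {
            "audio_filepath": audio_path,
            "offset": round(seg["start"], 3),
            "duration": round(seg["end"] - seg["start"], 3),
            "text": seg["text"],
        }
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


# Hypothetical alignment output for one long recording.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": "hello world", "score": -0.3},
    {"start": 2.5, "end": 4.0, "text": "noisy part", "score": -5.1},
]
manifest_lines = build_manifest_lines("long_audio.wav", example_segments)
```

Each surviving line describes one short fragment by its offset and duration inside the original file, so the audio itself does not have to be copied at this stage.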

More details can be found in NeMo/tutorials/tools/CTC_Segmentation_Tutorial.ipynb, which can be run in Google Colab.

The tool is based on the CTC-Segmentation package and on the paper CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition [TOOLS1].
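To give a feel for how CTC outputs can be used to place text on a time axis, here is a minimal sketch of a plain Viterbi forced alignment over per-frame CTC log-probabilities. It is a simplified stand-in and not the algorithm implemented in the CTC-Segmentation package or described in [TOOLS1]. The function name `ctc_forced_align` and the toy three-symbol vocabulary are assumptions made for this example.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi-align `tokens` to frames of CTC log-probabilities.

    log_probs: list of frames, each a list of log-probabilities per symbol.
    tokens: list of symbol ids (without blanks) in transcript order.
    Returns, per frame, the index into `tokens` being emitted, or -1 for blank.
    """
    # Extended sequence with blanks around every token: b, t1, b, t2, ..., b
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    n_frames, n_states = len(log_probs), len(ext)
    neg_inf = -math.inf
    dp = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]

    # A path may start in the leading blank or in the first token.
    dp[0][0] = log_probs[0][ext[0]]
    if n_states > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, n_frames):
        for s in range(n_states):
            # Allowed predecessors: stay, advance by one state, or skip the
            # blank between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: dp[t - 1][p])
            if dp[t - 1][best] == neg_inf:
                continue  # state not reachable at this frame
            dp[t][s] = dp[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the trailing blank or in the last token.
    s = n_states - 1
    if n_states > 1 and dp[-1][n_states - 2] > dp[-1][n_states - 1]:
        s = n_states - 2
    if dp[-1][s] == neg_inf:
        raise ValueError("Not enough frames to align the given tokens")

    # Backtrack to recover the best state at every frame.
    path = [0] * n_frames
    for t in range(n_frames - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    # Odd extended states are tokens, even ones are blanks.
    return [(p - 1) // 2 if p % 2 == 1 else -1 for p in path]


# Toy example: vocabulary {0: blank, 1: "a", 2: "b"}, transcript "ab".
hi, lo = math.log(0.8), math.log(0.1)
toy_log_probs = [
    [hi, lo, lo],  # blank
    [lo, hi, lo],  # "a"
    [lo, hi, lo],  # "a"
    [lo, lo, hi],  # "b"
    [hi, lo, lo],  # blank
]
alignment = ctc_forced_align(toy_log_probs, tokens=[1, 2])
print(alignment)  # [-1, 0, 0, 1, -1]
```

Multiplying the frame indices by the acoustic model's frame duration gives approximate start and end times for each token, which is the information needed to cut a long recording at utterance boundaries.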

References

[TOOLS1]

Ludwig Kürzinger, Dominik Winkelbauer, Lujun Li, Tobias Watzel, and Gerhard Rigoll. CTC-segmentation of large corpora for German end-to-end speech recognition. In International Conference on Speech and Computer, 267–278. Springer, 2020.