Automatic Speech Recognition (ASR) systems typically generate text without punctuation or capitalization. Besides being hard to read, this output is often used as input to downstream models such as named entity recognition, machine translation, or text-to-speech. These models can benefit when the input text is punctuated and the words are capitalized correctly.

For each word in the input text, the model:

- Predicts the punctuation mark that should follow the word, if any. The model supports commas, periods, and question marks.
- Predicts whether the word should be capitalized.
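The two per-word predictions can be combined to rebuild readable text. The following is a minimal sketch of that post-processing step only, not the TAO Toolkit implementation: the function name `restore_text`, the use of an empty string to mean "no punctuation", and the boolean capitalization flags are all assumptions made for illustration.

```python
from typing import List

# Assumed label set for illustration: "" means no punctuation follows the word.
PUNCT_MARKS = {"", ",", ".", "?"}


def restore_text(words: List[str], punct: List[str], capitalize: List[bool]) -> str:
    """Apply one punctuation label and one capitalization flag to each word."""
    if not (len(words) == len(punct) == len(capitalize)):
        raise ValueError("words, punct, and capitalize must have the same length")

    restored = []
    for word, mark, cap in zip(words, punct, capitalize):
        if mark not in PUNCT_MARKS:
            raise ValueError(f"unsupported punctuation label: {mark!r}")
        # Uppercase only the first character; leave the rest of the word untouched.
        token = word[:1].upper() + word[1:] if cap else word
        restored.append(token + mark)
    return " ".join(restored)


# Hypothetical predictions for a raw ASR transcript.
words = "how are you i am fine".split()
punct = ["", "", "?", "", "", "."]
capitalize = [True, False, False, True, False, False]

print(restore_text(words, punct, capitalize))  # How are you? I am fine.
```

Treating the two predictions as independent per-word labels keeps this step a simple one-pass merge over the transcript.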

TAO Toolkit provides a sample notebook that outlines the end-to-end workflow for training a Punctuation and Capitalization model using TAO Toolkit and deploying it in Riva format on NGC resources.