Creating Grammars for Speech Hints

This tutorial walks you through creating custom speech hint grammars for use in inverse text normalization (ITN). Speech hint grammars are primarily used to apply targeted normalization to the output of Automatic Speech Recognition (ASR), for example, rendering a spoken phone number as digits.

Dependencies

Download the speech hints grammars before you begin.

Prerequisites

This tutorial assumes that you are familiar with finite-state acceptors and transducers, the Pynini library, and NeMo’s Weighted Finite-State Transducer (WFST) tutorial.

Overview

At a minimum, speech hints require the following:

  1. A passthrough finite-state transducer (FST) that maps every input string to itself, that is, an identity transducer over \(\Sigma^*\), the set of all possible strings over the alphabet \(\Sigma\). This FST must carry the highest weight (cost) of all the FSTs, so that the shortest path selects it only when no other FST matches the input.

  2. One FST per class of interest. A class FST can import other FSTs; however, the FST it produces when exported is self-contained and does not depend on the FSTs it imports.

Grammars in speech hints are composed on the fly. A grammar can be a standalone reference to an FST, or it can embed an FST in the context of a sentence (for example, "my phone number is $FULLPHONENUM"). Grammars are denoted by $<FSTNAME>. For English, the following grammars are supported:

  1. $OOV_NUMERIC_SEQUENCE

  2. $OOV_ALPHA_SEQUENCE

  3. $OOV_ALPHA_NUMERIC_SEQUENCE

  4. $FULLPHONENUM

  5. $POSTALCODE

  6. $OOV_CLASS_ORDINAL

  7. $OOV_CLASS_NUMERIC

  8. $PERCENT

  9. $TIME

  10. $MONEY

  11. $MONTH

  12. $DAY

Using Existing Speech Hint Grammars in Python

from speech_hint import apply_hint
import pynini
from pynini.lib import pynutil

# Apply the $FULLPHONENUM grammar to the input text
apply_hint("one eight hundred five five five four oh oh one", "$FULLPHONENUM")
# -> '1-800-555-4001'
apply_hint("my phone number is one eight hundred five five five four oh oh one", "$FULLPHONENUM")
# -> 'my phone number is 1-800-555-4001'

# Specify the $FULLPHONENUM grammar in context
apply_hint("my phone number is one eight hundred five five five four oh oh one", "my phone number is $FULLPHONENUM")
# -> 'my phone number is 1-800-555-4001'
apply_hint("I think my phone number is one eight hundred five five five four oh oh one", "my phone number is $FULLPHONENUM")
# -> 'I think my phone number is 1-800-555-4001'

# Specify the $FULLPHONENUM grammar in context when the context does not match:
# the input passes through unchanged
apply_hint("my phone number is one eight hundred five five five four oh oh one", "my phone number is not $FULLPHONENUM")
# -> 'my phone number is one eight hundred five five five four oh oh one'

Sample Grammar for Handling Alphabet Sequences

Suppose we need a grammar that converts a sequence of spelled-out letters into a single word (‘i b m’ -> ‘ibm’). For the complete implementation, refer to the oov_class_alpha_sequence.py script.

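import pynini
from pynini.lib import pynutil
from en.primitives import NEMO_ALPHA

# Apply an FST to a string and return the output of the lowest-weight path
def apply_fst(utterance, fst):
    try:
        return pynini.shortestpath(utterance @ fst).string().strip()
    except pynini.FstOpError:
        print(f"Error: No valid output with given input: '{utterance}'")

# A single letter
character = NEMO_ALPHA
# Any string of letters; constrains the output of the sequence FST
word_fst = pynini.closure(character)
# A letter followed by one or more " <letter>" pairs; the separating spaces are deleted
sequence = character + pynini.closure(pynutil.delete(" ") + character, 1)
fst = sequence @ word_fst

apply_fst('i b m', fst)
# -> 'ibm'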

To use a custom FST (grammar) in speech_hints, add it under a suitable name to fst_dict in speech_hints.py. You can then export the grammars as an FST Archive (.far) file using the export_to_far.py script.