Customization#
This section demonstrates customization options for ASR models. These options can be used with both the Streaming and Offline APIs.
These examples use the Riva ASR sample clients in Python to demonstrate Riva ASR features. You can build your own speech AI applications with Riva by using the API Reference and Python libraries, and by referring to the sample clients.
Refer to the table below for supported customizations for each model. Automatic Punctuation is supported for all models.
| Model | Word Boosting | Silero VAD | Profanity Filter | Speaker Diarization |
|---|---|---|---|---|
|  | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ |
|  | ❌ | ❌ | ✅ | ❌ |
|  | ❌ | ❌ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ❌ | ✅ |
|  | ❌ | ❌ | ❌ | ❌ |
|  | ❌ | ❌ | ✅ | ❌ |
Runtime Customizations#
Runtime customizations can be applied without NIM server redeployment. These customizations are sent as parameters in the client request and are processed dynamically by the server.
Word Boosting#
Word boosting allows you to bias the ASR engine to recognize particular words of interest at request time by assigning them higher scores when decoding the acoustic model’s output. We recommend a boost score in the range of 20 to 100.
Copy an example audio file from the NIM container to the host machine, or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_wordboosting_sample.wav .
First, run ASR on the sample audio without word boosting.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--language-code en-US \
--input-file en-US_wordboosting_sample.wav
Output:
## aunt bertha and ab loper both transformer based language models are examples of the emerging work in using graph neural networks to design protein sequences for particular target antigens
As seen in the output, ASR struggles to recognize domain-specific terms like AntiBERTa and ABlooper. You can apply word boosting to improve ASR accuracy for these domain-specific terms.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--language-code en-US \
--input-file en-US_wordboosting_sample.wav \
--boosted-lm-words AntiBERTa --boosted-lm-score 20 \
--boosted-lm-words ABlooper --boosted-lm-score 20
Output:
## AntiBERTa and ABlooper both transformer based language models are examples of the emerging work in using graph neural networks to design protein sequences for particular target antigens
With word boosting enabled, ASR is able to correctly transcribe the domain-specific terms AntiBERTa and ABlooper.
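If you are calling the Riva Python API directly instead of the sample client, word boosting can be applied with riva.client.add_word_boosting_to_config, the same helper used in the Speech Hints example later in this section. The following is a minimal offline-recognition sketch using the sample file copied above; adjust the server address and file path for your setup.
import riva.client

auth = riva.client.Auth(uri="0.0.0.0:50051")
asr_service = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    language_code="en-US",
    max_alternatives=1,
)
# Read the sample rate and channel count from the WAV header.
riva.client.add_audio_file_specs_to_config(config, "en-US_wordboosting_sample.wav")
# Boost the domain-specific terms with a score of 20 (recommended range: 20 to 100).
riva.client.add_word_boosting_to_config(config, ["AntiBERTa", "ABlooper"], 20.0)

with open("en-US_wordboosting_sample.wav", "rb") as f:
    response = asr_service.offline_recognize(f.read(), config)
# Join the transcripts of all returned segments.
print("".join(result.alternatives[0].transcript for result in response.results))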
Additional Information About Word Boosting#
The recommended boost score range is 20 to 100. The higher the boost score, the more biased the ASR engine becomes toward recognizing this word. Negative boost scores can even discourage the ASR engine from predicting certain words.
There is no limit to the number of words that can be boosted. You should not notice any significant impact on latency, even with ~100 boosted words, except for the first request, where slightly higher latency is expected.
For Parakeet 0.6b CTC Mandarin, boosted words must be specified with a space between each Mandarin character. For example:
--boosted-lm-words "望 岳 "
Automatic Punctuation#
Automatic punctuation and capitalization can be enabled by passing the flag --automatic-punctuation.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file en-US_sample.wav \
--language-code en-US \
--automatic-punctuation
Note
--automatic-punctuation applies punctuation only to the final transcripts. If punctuation is needed for partial transcripts, pass --custom-configuration="apply_partial_pnc:true" to the above command.
The previous command prints the transcript with punctuation and capitalization as shown in the following example.
## What is natural language processing?
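The same behavior is available through the Python API via the enable_automatic_punctuation field of RecognitionConfig, which also appears in the Speech Hints example later in this section. A minimal sketch of the relevant configuration:
config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,  # punctuate and capitalize final transcripts
)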
End of Utterance#
Endpointing is the process by which Riva ASR determines when a user has finished speaking. This allows the system to segment continuous audio streams into distinct utterances for accurate transcription. Proper endpointing ensures that transcripts are generated promptly and that partial or incomplete utterances are not prematurely finalized.
Riva ASR detects endpointing primarily through silence detection. The system monitors the audio stream for periods of silence and uses configurable thresholds to determine when speech has ended. When the system detects a sufficient duration of silence (typically measured in milliseconds), it triggers the endpointing mechanism to finalize the current utterance and generate the transcript. To configure the amount of silence required before end of utterance (EOU) is detected, use the --stop_history parameter.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file en-US_sample.wav \
--language-code en-US \
--stop_history 800
Note
--stop_history specifies silence duration in milliseconds and must be a multiple of 80ms. We recommend at least 560ms for good accuracy.
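As an illustration of the constraint in the note above, the following hypothetical helper (not part of the Riva client; shown only to make the 80 ms multiple and 560 ms minimum concrete) rounds a requested silence duration to a valid --stop_history value:
def valid_stop_history(requested_ms: int, frame_ms: int = 80, minimum_ms: int = 560) -> int:
    """Round a requested end-of-utterance silence duration to a valid --stop_history value."""
    rounded = max(minimum_ms, requested_ms)
    remainder = rounded % frame_ms
    if remainder:
        rounded += frame_ms - remainder  # round up to the next 80 ms boundary
    return rounded

print(valid_stop_history(800))  # 800 (already a multiple of 80 ms)
print(valid_stop_history(500))  # 560 (raised to the recommended minimum)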
Inverse Text Normalization#
Inverse text normalization (ITN) can be enabled by passing the --no-verbatim-transcripts flag.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file <your_file_with_ITNizable_values> \
--language-code en-US \
--no-verbatim-transcripts
Note
--no-verbatim-transcripts applies ITN only to the final transcripts. If ITN is needed for partial transcripts, pass --custom-configuration="apply_partial_itn:true" to the above command. The Canary and Whisper models apply ITN to transcripts by default, and this behavior cannot be turned off.
Profanity Filter#
Riva ASR models can detect profane words in your audio data and censor them in the transcript. This feature uses a pre-defined list of profane words and is supported only for the English language.
To enable the profanity filter, pass the --profanity-filter flag to the sample client. When enabled, profane words appear with only the first letter visible, followed by asterisks in the transcript (for example, f***).
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file <your_file_with_profane_words> \
--language-code en-US \
--profanity-filter
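When using the Python API directly, the same option corresponds to the profanity_filter field of RecognitionConfig, shown in the Speech Hints example later in this section:
# Assuming `config` is a riva.client.RecognitionConfig built as in the earlier examples,
# the profanity filter is controlled by a single boolean field.
config.profanity_filter = True  # censor profane words in the transcript (English only)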
Silero VAD Customization#
Profiles with vad=silero use Silero VAD to detect the start and end of an utterance. The Silero VAD parameters control how speech boundaries (start/end) are identified. The default values are optimized for typical use cases, but they can be adjusted as needed for specific scenarios.
The following parameters can be configured at runtime using the --custom-configuration option.
| Parameter | Details | Range | Default |
|---|---|---|---|
| neural_vad.onset | Minimum probability threshold to detect the start of a speech segment | 0.0 to 1.0 | 0.85 |
| neural_vad.offset | Minimum probability threshold to detect the end of a speech segment | 0.0 to 1.0 | 0.3 |
| neural_vad.min_duration_on | Minimum duration (in seconds) of speech to be considered a valid segment | > 0 | 0.2 |
| neural_vad.min_duration_off | Minimum duration (in seconds) of silence to be considered a non-speech segment | > 0 | 0.5 |
|  | Duration (in seconds) to pad before the detected speech onset | > 0 | 0.3 |
|  | Duration (in seconds) to pad after the detected speech offset | > 0 | 0.08 |
Example of runtime configuration:
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file <your_speech_file> \
--language-code en-US \
--custom-configuration="neural_vad.onset:0.9,neural_vad.offset:0.4,neural_vad.min_duration_on:0.3,neural_vad.min_duration_off:0.6"
Riva NIM can also return VAD (Voice Activity Detection) probabilities. To enable this feature, add get_vad_probabilities:true to the --custom-configuration parameter in the command above. When enabled, Riva NIM generates probability values for the entire buffer, with each value representing a 32ms segment of audio. These VAD probabilities indicate the likelihood of speech, ranging from 0 to 1, for each segment across the buffer. The text following the ## symbol represents the transcript corresponding to that buffer.
Run the following command; the output appears as shown below.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file en-US_sample.wav \
--language-code en-US \
--custom-configuration="neural_vad.onset:0.9,neural_vad.offset:0.4,neural_vad.min_duration_on:0.3,neural_vad.min_duration_off:0.6,get_vad_probabilities:true"
VAD States: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.00 0.00
VAD States: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
.
.
.
##what is natural language processing
VAD States: 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.63 0.97 0.98 0.96 0.93 0.94 0.93 0.92 0.94 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.57 0.61 0.48 0.23 0.14 0.92 0.99 0.99 0.99 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.94 0.94 0.92 0.85 0.53 0.32 0.42 0.72 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.93 0.62 0.23 0.08 0.03 0.02 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
.
.
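Each value in the VAD States output covers a 32 ms segment of the buffer. As a hedged illustration (plain text parsing only, not part of the Riva client API), the printed probabilities can be converted into approximate speech segments:
def speech_segments(vad_line: str, threshold: float = 0.5, frame_sec: float = 0.032):
    """Convert one 'VAD States:' line into (start, end) times of segments above the threshold."""
    probs = [float(p) for p in vad_line.replace("VAD States:", "").split()]
    segments, start = [], None
    for i, prob in enumerate(probs):
        if prob >= threshold and start is None:
            start = round(i * frame_sec, 3)
        elif prob < threshold and start is not None:
            segments.append((start, round(i * frame_sec, 3)))
            start = None
    if start is not None:
        segments.append((start, round(len(probs) * frame_sec, 3)))
    return segments

line = "VAD States: 0.00 0.00 0.97 0.98 0.96 0.10 0.00"
print(speech_segments(line))  # [(0.064, 0.16)]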
Speaker Diarization Customization#
Profiles with diarizer=sortformer use the Sortformer model for speaker diarization. For every final transcript generated at end-of-utterance detection, speaker tags are provided for all words in the transcript. Enable speaker diarization using the --speaker-diarization flag.
The following is an example of speaker diarization.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file <your_speech_file> \
--language-code en-US \
--speaker-diarization
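When using the Python API directly, speaker diarization is enabled on the RecognitionConfig. The sketch below assumes the add_speaker_diarization_to_config helper that ships with recent python-clients releases; verify the helper and its signature against your installed version.
# Assumption: add_speaker_diarization_to_config is available in your riva-client version.
config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    language_code="en-US",
    max_alternatives=1,
)
riva.client.add_speaker_diarization_to_config(config, diarization_enable=True)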
Deploy-Time Customizations#
These customizations require model artifacts to be prepared offline and the NIM server to be redeployed. These customizations cannot be configured from the client end and are applied at the server level.
Custom Vocabulary#
The Flashlight decoder, deployed by default in Riva ASR NIM, is a lexicon-based decoder and emits only words that are present in the provided vocabulary file. This means that domain-specific words that are not present in the vocabulary file cannot be generated.
To expand the decoder vocabulary, you need to build a custom model. In the riva-build command, pass the extended vocabulary file to the --decoding_vocab=<vocabulary_file> parameter. Out-of-the-box vocabulary files for Riva languages can be found on NGC; for example, for English, the vocabulary file named flashlight_decoder_vocab.txt can be found in the Riva ASR English (en-US) LM model. For information on how to use riva-build, refer to the Deploying Custom Models as NIM section.
Custom Pronunciation (Lexicon Mapping)#
When using the Flashlight decoder, the lexicon file provides a mapping between vocabulary dictionary words and their tokenized form (for example, sentence piece tokens for many Riva models).
Modifying the lexicon file serves two purposes:
Extends the vocabulary.
Provides one or more explicit custom pronunciations for a specific word. For example:
manu ▁ma n u
manu ▁man n n ew
manu ▁man n ew
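A lexicon entry is one line containing the word followed by its token sequence, as in the example above. As a hedged sketch (the file path is a placeholder, and you should check whether your generated lexicon separates the word from its tokens with a tab or a space before editing), extra pronunciations can be appended programmatically:
# Placeholder path; the real lexicon file is produced when building the model.
lexicon_path = "lexicon.txt"
custom_entries = [
    "manu ▁ma n u",
    "manu ▁man n n ew",
    "manu ▁man n ew",
]

with open(lexicon_path, "a", encoding="utf-8") as f:
    for entry in custom_entries:
        f.write(entry + "\n")  # one word + token sequence per line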
Custom Language Models#
Introducing a language model to an ASR pipeline is an easy way to improve accuracy for natural language, and you can fine-tune it for niche settings. An n-gram language model estimates the probability distribution over groups of n or fewer consecutive words/tokens, P (word-1, …, word-n). By altering or biasing the data on which a language model is trained, you change the distribution it is estimating. As a result, it can predict different transcriptions as more likely, altering the prediction without changing the acoustic model. Riva supports n-gram models that are trained and exported from KenLM.
Custom language models can provide a permanent solution for improving the recognition of domain-specific terms and phrases. You can mix a domain-specific custom LM with a general domain LM using a process called interpolation.
To deploy a custom n-gram language model file in binary format as part of an ASR NIM, pass the binary language model file to riva-build. Use the flag --decoding_language_model_binary=<lm_binary> for CTC models and --nemo_decoder.language_model_file=<nemo LM> for RNNT models.
Inverse Text Normalization#
Riva ASR NIM implements inverse text normalization (ITN) for ASR requests. It uses weighted finite-state transducer (WFST) based models to convert spoken-domain output from an ASR model into written-domain text to improve the readability of the ASR system’s output.
Text normalization converts text from written form into its verbalized form. It is used as a preprocessing step before text-to-speech (TTS) and can also be used for preprocessing ASR training transcripts.
ITN is the reverse operation and is part of the ASR post-processing pipeline: it converts the raw spoken-domain output of the ASR model into its written form to improve text readability.
Enable ITN by passing the --no-verbatim-transcripts flag.
python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 \
--input-file en-US_sample.wav \
--language-code en-US \
--no-verbatim-transcripts
Note
--no-verbatim-transcripts applies ITN only to the final transcripts. If you need ITN for partial transcripts, pass --custom-configuration="apply_partial_itn:true" to the command.
Riva implements NVIDIA NeMo ITN, which is based on WFST grammars. The tool uses Pynini to construct WFSTs. You can export the created grammars and integrate them into Sparrowhawk for production. Sparrowhawk is an open-source version of the Kestrel TTS text normalization system.
For example, with a functional NeMo installation, you can export the German ITN grammars with the pynini_export.py tool.
python3 pynini_export.py --output_dir . --grammars itn_grammars --input_case cased --language de
This exports the tokenizer_and_classify and verbalize FSTs as OpenFst finite state archive (FAR) files, ready to be deployed with Riva.
[NeMo I 2022-04-12 14:43:17 tokenize_and_classify:80] Creating ClassifyFst grammars.
Created ./de/classify/tokenize_and_classify.far
Created ./de/verbalize/verbalize.far
To deploy these ITN rules with Riva, pass the FAR files to the riva-build command under these options:
riva-build speech_recognition
[--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
[--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
Additionally, riva-build supports the --wfst_pre_process_model and --wfst_post_process_model arguments to pass the pre- and post-processing FAR files for inverse text normalization.
To learn more about how to build grammars from the ground up, consult the NeMo Weighted Finite State Transducers (WFST) tutorial.
Details on the model architecture can be found in the paper NeMo Inverse Text Normalization: From Development To Production.
Speech Hints#
Speech hints apply out-of-vocabulary (OOV) classes as part of the ASR post-processing pipeline. They use finite state transducers (FSTs) to normalize the output into a more readable format based on the expected OOV class.
Speech hints are applied to the spoken-domain output of ASR before the generated text is passed through ITN. The phrases to apply are added to the RecognitionConfig using SpeechContext.
import riva.client

uri = "localhost:50051"  # Default value
auth = riva.client.Auth(uri=uri)
asr_service = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    max_alternatives=1,
    profanity_filter=False,
    enable_automatic_punctuation=True,
    verbatim_transcripts=False,
)

my_wav_file = PATH_TO_YOUR_WAV_FILE
speech_hints = ["$OOV_ALPHA_SEQUENCE", "i worked at the $OOV_ALPHA_SEQUENCE"]
boost_lm_score = 4.0

riva.client.add_audio_file_specs_to_config(config, my_wav_file)
riva.client.add_word_boosting_to_config(config, speech_hints, boost_lm_score)
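To send the request with this configuration, the audio can be passed through offline recognition (a sketch; streaming requests work analogously):
with open(my_wav_file, "rb") as f:
    response = asr_service.offline_recognize(f.read(), config)
print("".join(result.alternatives[0].transcript for result in response.results))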
The following classes and phrases are supported:
$OOV_NUMERIC_SEQUENCE
$OOV_ALPHA_SEQUENCE
$OOV_ALPHA_NUMERIC_SEQUENCE
$ADDRESSNUM
$FULLPHONENUM
$POSTALCODE
$OOV_CLASS_ORDINAL
$MONTH
Training or Fine-Tuning an Acoustic Model#
Model fine-tuning is a set of techniques for making fine adjustments to a pre-existing model with new data. This adapts it to new situations while it retains its original capabilities.
Model training is the process of training a new model either from scratch (that is, starting from random weights) or with weights initialized from an existing model. The goal is for the model to acquire new skills without necessarily retaining the original capabilities, such as in cross-language transfer learning.
Many use cases require training new models or fine-tuning existing ones with new data. In these cases, follow these best practices. Many of these best practices also apply to inputs at inference time.
Use lossless audio formats, if possible. The use of lossy codecs, such as MP3, can reduce quality.
Augment training data. Adding background noise to audio training data can initially decrease accuracy but increase robustness.
Limit vocabulary size if using scraped text. Many online sources contain typos or ancillary pronouns and uncommon words. Removing these can improve the language model.
Use a minimum sampling rate of 16kHz, if possible, but do not resample.
If using NeMo to fine-tune ASR models, refer to the Finetuning CTC models on other languages tutorial. We recommend fine-tuning ASR models only with sufficient data, on the order of several hundred hours of speech. If such data is not available, it can be more useful to adapt the LM on an in-domain text corpus than to train the ASR model.
There is no formal guarantee that the ASR model is or is not streamable after training.
Training New Models#
Train models from scratch - End-to-end training of ASR models requires large datasets and heavy compute resources. There are more than 5,000 languages around the world, but very few languages have datasets large enough to train high-quality ASR models. For this reason, we recommend training models from scratch only when several thousand hours of transcribed speech data are available.
Cross-language transfer learning - Cross-language transfer learning is especially helpful when training new models for low-resource languages. Even when a substantial amount of data is available, cross-language transfer learning can help boost the performance further.
It is based on the idea that phoneme representation can be shared across different languages. Experiments by the NeMo team showed that on as little as 16h of target language audio data, transfer learning works substantially better than training from scratch. In the GTC 2020 talk, NVIDIA data scientists demonstrate cross-language transfer learning for a low resource language with less than 30 hours of speech data.
Fine-Tuning Existing Models#
Fine-tune acoustic models when other, easier approaches have failed to address accuracy issues caused by significant acoustic factors, such as different accents, noisy environments, or poor audio quality.
We recommend fine-tuning ASR models with sufficient data, on the order of 100 hours of speech or more. The minimum number of hours that we used for NeMo transfer learning was approximately 100 hours for the CORAAL dataset, as shown in the [Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition](https://arxiv.org/pdf/2005.04290.pdf) paper. Our experiments demonstrate that in all three cases of cross-language transfer learning, continuous learning, and domain adaptation, transfer learning from a good base model has higher accuracy than a model trained from scratch. It is also preferred to fine-tune large models rather than training small models from scratch, even if the dataset for fine-tuning is small.
Low-resource domain adaptation - For smaller datasets, such as approximately 10 hours, take appropriate precautions to avoid overfitting to the domain and sacrificing significant accuracy in the general domains, also known as catastrophic forgetting. If you perform fine-tuning on such a small dataset, mix it with other, larger datasets (“base”). For English, for example, NeMo has a list of public datasets that you can mix with your data.
In transfer learning, continual learning is a sub-problem in which models that are trained with new domain data should still retain good performance on the original source domain.
If you are using NeMo to fine-tune ASR models, refer to the Finetuning CTC models on other languages tutorial.
Data quality and augmentation - Use lossless audio formats, if possible. The use of lossy codecs, such as MP3, can reduce quality. As a regular practice, use a minimum sampling rate of 16kHz. You can also use Opus-encoded sources with 8K, 16K, 24K, or 48K sampling rates.
Augmenting training data with noise can improve the model’s ability to cope with noisy environments. Adding background noise to audio training data can initially decrease accuracy but increase robustness.
Punctuation and Capitalization Model#
ASR systems typically generate text with no punctuation or capitalization. In Riva, the punctuation and capitalization model is responsible for formatting the text with both punctuation and capitalization.
The punctuation and capitalization model should be customized when an out-of-the-box model does not perform well in the application context, such as when applying to a new language variant.
To either train or fine-tune, and then deploy a custom punctuation and capitalization model, refer to RIVA Punctuation and NeMo Punctuation and Capitalization.
Note
All models provide punctuation support; however, only the English Parakeet CTC model requires a separate punctuation and capitalization (PnC) model. For all other models, punctuation is handled within the ASR model itself.
Deploying a Custom Acoustic Model#
If using NVIDIA NeMo, first convert the model from .nemo format to .riva format using the nemo2riva tool that is available as part of the Riva distribution. Next, use the Riva ASR NIM container and tools (riva-build and riva-deploy) for deployment. For more information, refer to the Deploying Custom Models as NIM section.
Summary of Riva ASR Customizations#
The following table lists the corresponding customizations in increasing order of difficulty and effort:
| Techniques | Difficulty | What it Does | When to Use | How to Use |
|---|---|---|---|---|
| Word boosting | Quick and easy | Extends the vocabulary while increasing the chance of recognition for a provided list of keywords. This strategy enables you to easily improve recognition of specific words at request time. | When certain words or phrases are important in a particular context, for example, attendee names in a meeting. |  |
| Custom vocabulary | Easy | Extends the vocabulary while increasing the chance of recognition for a provided list of keywords. This strategy enables you to improve recognition of specific words. | When certain words or phrases are important in a particular context, for example, attendee names in a meeting. |  |
| Custom pronunciation (Lexicon mapping) | Easy | Explicitly guides the decoder to map pronunciations (that is, token sequences) to specific words. The lexicon decoder emits only words that are present in the decoder lexicon. It is possible to modify the lexicon used by the decoder to improve recognition. | When a word can have one or more possible pronunciations. |  |
| Retrain language model | Moderate | Trains a new language model for the application domain to improve the recognition of domain-specific terms. The Riva ASR pipeline supports the use of n-gram language models. Using a language model that is tailored to your use case can greatly help in improving the accuracy of transcripts. | When domain text data is available. |  |
| Fine-tune an existing acoustic model | Moderately hard | Fine-tunes an existing acoustic model using a small amount of domain data to better suit the domain. | When you have transcribed domain audio data (10h-100h) and other easier approaches fall short. |  |
| Train a new acoustic model | Hard | Trains a new acoustic model from scratch or with cross-language transfer learning, using thousands of hours of audio data. | Recommended only when adapting Riva to a new language or dialect. |  |