German ASR Pipeline Deployment#

In this tutorial, we go through the steps to deploy a German ASR pipeline into production. Refer to the deployment.ipynb tutorial for an interactive version of this guide.

Model Checklist#

This tutorial assumes that you have the following models ready:

- An acoustic model
- A language model (optional)
- An inverse text normalization model (optional)
- A punctuation and capitalization model (optional)

Prerequisites#

- Ensure you have access to NGC and can download models with the NGC CLI tool.
- Download the Riva Quick Start scripts to a local <RIVA_QUICKSTART_DIR> directory:

    ngc registry resource download-version nvidia/riva/riva_quickstart:2.1.0

- Prepare a local folder <RIVA_MODEL_DIR> to download models to.
- Prepare a local folder <RIVA_MODEL_REPO> to store deployed Riva models.
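The placeholders above can be given concrete values in one place. The sketch below is meant to be sourced (`source setup_dirs.sh`) so the variables persist in your shell. `BASE_DIR` and every default path in it are assumptions, not Riva requirements. The Quick Start folder name follows the NGC CLI convention of storing each download as `<name>_v<version>`, which is the same convention the model paths later in this guide follow.

```shell
#!/usr/bin/env bash
# Sketch: give the placeholder directories from this guide concrete values.
# Meant to be sourced. All default paths are assumptions; override them
# through the environment before sourcing.

BASE_DIR="${BASE_DIR:-$PWD/riva-de}"   # hypothetical parent directory

# NGC stores downloads as <name>_v<version>, so the 2.1.0 Quick Start
# resource is expected to land in riva_quickstart_v2.1.0.
export RIVA_QUICKSTART_DIR="${RIVA_QUICKSTART_DIR:-$BASE_DIR/riva_quickstart_v2.1.0}"
export RIVA_MODEL_DIR="${RIVA_MODEL_DIR:-$BASE_DIR/models}"        # downloaded models
export RIVA_MODEL_REPO="${RIVA_MODEL_REPO:-$BASE_DIR/model_repo}"  # deployed models

# The NGC download creates the Quick Start folder itself;
# only the two model folders need to exist beforehand.
mkdir -p "$RIVA_MODEL_DIR" "$RIVA_MODEL_REPO"

echo "Quick Start: $RIVA_QUICKSTART_DIR"
echo "Models:      $RIVA_MODEL_DIR"
echo "Model repo:  $RIVA_MODEL_REPO"
```

After sourcing, run the Quick Start download from `$BASE_DIR` (`cd "$BASE_DIR"` first) so that the resulting folder matches `RIVA_QUICKSTART_DIR`.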

BYO Models#

If bringing your own models, refer to the training section of this tutorial for details on how to train your own custom models.

Pretrained Models#

Alternatively, you can deploy pretrained models. All Riva German assets are published on NGC (including .nemo, .riva, .tlt and .rmir assets). You can use these models as starting points for your development or for deployment as-is.

Download the following models to the <RIVA_MODEL_DIR> directory, using either the NGC web interface or the NGC CLI tool.

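Acoustic models: Select one of the following:

Citrinet ASR German:
- Download the NeMo version (.nemo format) with:

    ngc registry model download-version "nvidia/nemo/stt_de_citrinet_1024:1.5.0"

Conformer ASR German:
- Download the NeMo version (.nemo format) with:

    ngc registry model download-version "nvidia/nemo/stt_de_conformer_ctc_large:1.5.0_lm"

The remaining steps use the Citrinet model. If you choose the Conformer model, substitute its paths; build parameters such as --ms_per_timestep may also differ.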

Inverse text normalization model: an OpenFST finite-state archive (.far) for use with the open-source Sparrowhawk normalization engine and Riva.

ngc registry model download-version "nvidia/tao/inverse_normalization_de_de:deployable_v1.0"

Language model: a 4-gram language model trained with Kneser-Ney smoothing using KenLM. The downloaded directory also contains the decoder vocabulary (dict_vocab.txt) used by the Flashlight decoder.

ngc registry model download-version "nvidia/tao/speechtotext_de_de_lm:deployable_v2.0"

Punctuation and capitalization model: Riva Punctuation and Capitalization model for German.

ngc registry model download-version "nvidia/tao/punctuationcapitalization_de_de_bert_base:deployable_v1.0"
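The individual downloads above can also be scripted. The following is a sketch only. It assumes a configured NGC CLI whose `download-version` subcommand accepts `--dest`, and it defaults to a dry run that prints each command instead of executing it. The model list uses the Citrinet acoustic model, and the punctuation model is pinned to `deployable_v1.0` so that its download folder matches the `punctuationcapitalization_de_de_bert_base_vdeployable_v1.0` path used by the punctuation `riva-build` command later on. `DRY_RUN` and `download_models` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch: fetch the German models listed above into RIVA_MODEL_DIR.
# Assumes a configured NGC CLI whose download-version command accepts --dest.
# DRY_RUN defaults to 1 (print only); run with DRY_RUN=0 to actually download.
set -euo pipefail

RIVA_MODEL_DIR="${RIVA_MODEL_DIR:-$PWD/riva-de/models}"   # assumed default

# Swap the first entry for nvidia/nemo/stt_de_conformer_ctc_large:1.5.0_lm
# if you chose the Conformer acoustic model.
MODELS=(
  "nvidia/nemo/stt_de_citrinet_1024:1.5.0"
  "nvidia/tao/inverse_normalization_de_de:deployable_v1.0"
  "nvidia/tao/speechtotext_de_de_lm:deployable_v2.0"
  "nvidia/tao/punctuationcapitalization_de_de_bert_base:deployable_v1.0"
)

download_models() {
  local model
  local -a cmd
  for model in "${MODELS[@]}"; do
    cmd=(ngc registry model download-version "$model" --dest "$RIVA_MODEL_DIR")
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "${cmd[*]}"      # dry run: show the command only
    else
      "${cmd[@]}"           # real download
    fi
  done
}

download_models
```

Review the printed commands first, then rerun with `DRY_RUN=0` to download.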

Preparing Models#

NeMo to Riva Conversion#

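Start the NeMo container, mounting <RIVA_MODEL_DIR> at /models and <RIVA_QUICKSTART_DIR> at /riva_quickstart so that both the downloaded models and the nemo2riva wheel are visible inside it:

docker run --rm -it \
     -v <RIVA_MODEL_DIR>:/models \
     -v <RIVA_QUICKSTART_DIR>:/riva_quickstart \
     nvcr.io/nvidia/nemo:22.01 bash

Inside the container, install the nemo2riva converter:

cd /riva_quickstart
pip3 install nvidia-pyindex
pip3 install nemo2riva-2.0.0-py3-none-any.whl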

Convert the acoustic model from NeMo format (.nemo) to Riva format (.riva):

nemo2riva --out /models/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva /models/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.nemo --max-dim=100000
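When several .nemo checkpoints sit under the mounted model folder (for example, if you downloaded both acoustic models), the conversion can be looped. This is a sketch to run inside the NeMo container after installing nemo2riva. It writes each .riva file next to its .nemo input, which reproduces the Citrinet output path above. Reusing `--max-dim=100000` for every model is an assumption carried over from the Citrinet command. `MODELS_ROOT`, `NEMO2RIVA`, and `convert_all` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch: convert every <MODELS_ROOT>/<folder>/*.nemo file to a .riva file
# placed next to it, skipping files that were already converted.
# Assumption: --max-dim=100000 is appropriate for every model converted.
set -euo pipefail

MODELS_ROOT="${MODELS_ROOT:-/models}"   # the <RIVA_MODEL_DIR> mount point
NEMO2RIVA="${NEMO2RIVA:-nemo2riva}"     # override to use a different binary

convert_all() {
  local nemo out
  shopt -s nullglob                     # no matches -> empty loop, not a literal glob
  for nemo in "$MODELS_ROOT"/*/*.nemo; do
    out="${nemo%.nemo}.riva"
    if [[ -s "$out" ]]; then
      echo "skip (exists): $out"
      continue
    fi
    echo "convert: $nemo -> $out"
    "$NEMO2RIVA" --out "$out" "$nemo" --max-dim=100000
  done
  shopt -u nullglob
}

convert_all
```

Skipping existing outputs makes the loop safe to rerun after a partial failure. Delete a .riva file to force its reconversion.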

Starting the ServiceMaker Container#

The Riva ServiceMaker container prepares models for deployment. Start an interactive session, mounting <RIVA_MODEL_DIR> at /servicemaker-dev and <RIVA_MODEL_REPO> at /data:

docker pull nvcr.io/nvidia/riva/riva-speech:2.0.0-servicemaker
docker run --gpus all -it --rm \
     -v <RIVA_MODEL_DIR>:/servicemaker-dev \
     -v <RIVA_MODEL_REPO>:/data \
     --entrypoint="/bin/bash" \
     nvcr.io/nvidia/riva/riva-speech:2.0.0-servicemaker

Build and Deploy an Offline ASR Pipeline#

Inside the ServiceMaker container, build the offline ASR pipeline, which combines the acoustic model, the language model, and the inverse text normalization model:

riva-build speech_recognition -f \
   /servicemaker-dev/citrinet-1024-de-DE-asr-offline.rmir /servicemaker-dev/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva \
   --offline \
   --name=citrinet-1024-de-DE-asr-offline \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --chunk_size=900 \
   --left_padding_size=0. \
   --right_padding_size=0. \
   --decoder_type=flashlight \
   --decoding_language_model_binary=/servicemaker-dev/speechtotext_de_de_lm_vdeployable_v2.0/riva_de_asr_set_2.0_4gram.binary \
   --decoding_vocab=/servicemaker-dev/speechtotext_de_de_lm_vdeployable_v2.0/dict_vocab.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --wfst_tokenizer_model=/servicemaker-dev/inverse_normalization_de_de_vdeployable_v1.0/tokenize_and_classify.far \
   --wfst_verbalizer_model=/servicemaker-dev/inverse_normalization_de_de_vdeployable_v1.0/verbalize.far \
   --language_code=de-DE


Then deploy the resulting RMIR file to the model repository:

riva-deploy -f /servicemaker-dev/citrinet-1024-de-DE-asr-offline.rmir /data/models

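The riva-build command takes an acoustic model in .riva format, the inverse text normalization tokenizer and verbalizer models in .far format, and an n-gram language model in KenLM binary format along with its decoder vocabulary file. riva-deploy then converts the resulting .rmir file into deployable models under /data/models in the model repository.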
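Because a missing or misnamed input only surfaces partway through a build, a quick preflight check can save time. The sketch below mirrors the input paths of the offline riva-build command and reports any that are missing or empty. `SM_DIR`, `REQUIRED_INPUTS`, and `check_inputs` are hypothetical names. Adjust the list if you used the Conformer model or different versions.

```shell
#!/usr/bin/env bash
# Sketch: verify that every input file referenced by the offline riva-build
# command exists and is non-empty before starting the build.
set -euo pipefail

SM_DIR="${SM_DIR:-/servicemaker-dev}"   # the <RIVA_MODEL_DIR> mount point

REQUIRED_INPUTS=(
  "stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva"
  "speechtotext_de_de_lm_vdeployable_v2.0/riva_de_asr_set_2.0_4gram.binary"
  "speechtotext_de_de_lm_vdeployable_v2.0/dict_vocab.txt"
  "inverse_normalization_de_de_vdeployable_v1.0/tokenize_and_classify.far"
  "inverse_normalization_de_de_vdeployable_v1.0/verbalize.far"
)

# Returns 0 only if all inputs are present; lists the missing ones on stderr.
check_inputs() {
  local rel missing=0
  for rel in "${REQUIRED_INPUTS[@]}"; do
    if [[ ! -s "$SM_DIR/$rel" ]]; then
      echo "missing or empty: $SM_DIR/$rel" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing input(s) missing"
  [[ "$missing" -eq 0 ]]
}
```

Usage inside the ServiceMaker container: `check_inputs && riva-build speech_recognition ...`, so the build only starts when every file is in place.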

Note: Refer to the Riva ASR Pipeline Configuration documentation for build commands for streaming ASR services.
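The --flashlight_decoder.lm_weight and --flashlight_decoder.word_insertion_score flags in the offline build command control how the decoder balances the acoustic model against the language model. As a simplified sketch that ignores the decoder's silence and unknown-word terms, a Flashlight-style lexicon beam search ranks a candidate transcript W for audio X roughly as:

```latex
\operatorname{score}(W) \approx \log P_{\mathrm{AM}}(W \mid X)
  + \alpha \, \log P_{\mathrm{LM}}(W)
  + \beta \, \lvert W \rvert,
\qquad
\alpha = \texttt{lm\_weight} = 0.2, \quad
\beta = \texttt{word\_insertion\_score} = 0.2
```

Here |W| is the number of words in the hypothesis. A larger α pulls the output toward word sequences that the 4-gram model considers likely. A positive β offsets the per-word log-probability penalty from the language model, so the decoder does not favor overly short transcripts. The beam_threshold setting (20 here) prunes hypotheses whose score falls more than that margin below the best hypothesis at the same step.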

Build and Deploy a Punctuation and Capitalization Model#

When serving an ASR request, the Riva server looks for a punctuation model whose language matches the language code in the request config. Build and deploy the German punctuation model with:

riva-build punctuation -f \
   /servicemaker-dev/de_punctuation_1_0.rmir  \
   /servicemaker-dev/punctuationcapitalization_de_de_bert_base_vdeployable_v1.0/de_punctuation_1_0.riva \
   --language_code=de-DE

riva-deploy -f /servicemaker-dev/de_punctuation_1_0.rmir /data/models 
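Before starting the server, you can confirm that riva-deploy wrote the expected models. The sketch below relies on the Triton model-repository layout, in which each model is a directory containing a config.pbtxt. It also assumes that riva-deploy names the ASR ensemble after the --name passed to riva-build. Treat both as assumptions to verify against your own /data/models listing. `DEPLOYED_DIR` and `check_deployed` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check that named models exist in the deployed model repository.
# Assumptions: one directory per model, each holding a config.pbtxt, and the
# ASR ensemble is named after the riva-build --name value.
set -euo pipefail

DEPLOYED_DIR="${DEPLOYED_DIR:-/data/models}"   # inside the ServiceMaker container

check_deployed() {
  local name status=0
  for name in "$@"; do
    if [[ -f "$DEPLOYED_DIR/$name/config.pbtxt" ]]; then
      echo "ok:      $name"
    else
      echo "missing: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_deployed citrinet-1024-de-DE-asr-offline` checks the offline ASR pipeline. Run `ls "$DEPLOYED_DIR"` to find the punctuation model's directory name and pass it as well.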

Start the Riva Server#

That concludes the building and deployment of the Riva German ASR service.
