Deploying your own models

Pull the latest Jarvis service maker container from NGC:

docker pull nvcr.io/ea-jarvis-stage/jarvis-service-maker:v1.0.0b1-rc2

ASR

Export your TLT or NeMo model to the .ejrvs format. Refer to the TLT and NeMo documentation for export instructions: TODO

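Launch an interactive session in the Jarvis servicemaker Docker container, mounting the directory (or Docker volume) that holds the Jarvis .ejrvs and ASR language model files. In the command below, $DIR is that host directory; it is mounted at /servicemaker-dev inside the container.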

docker run --gpus all -it --rm -v $DIR:/servicemaker-dev --entrypoint="/bin/bash" nvcr.io/ea-jarvis-stage/jarvis-service-maker:v1.0.0b1-rc2

From inside the Jarvis servicemaker container, first run the jarvis-build command to generate the Jarvis JMIR file, which contains all the parameters required to deploy the ASR model. In the simplest case, the ASR model is built without a language model:

jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name>

where <encryption_key> is the encryption key used when exporting the .ejrvs file, <acoustic_model_name> is the name of the acoustic model, which is used to name the Jarvis ASR model in Triton, <ejrvs_filename> is the name of the input .ejrvs file, and <jmir_filename> is the name of the Jarvis JMIR file to generate. When the command completes successfully, a file named <jmir_filename> is created in the /servicemaker-dev/ folder. Because no language model is specified, the Jarvis greedy decoder predicts the transcript from the output of the acoustic model.
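As a worked example of the placeholders above, the following POSIX shell sketch fills them in with hypothetical values (my_key, my_asr.ejrvs, my_asr.jmir and my_asr_am are made-up names, not defaults) and prints the resulting greedy-decoding command instead of running it, so it can be checked before use:

```sh
#!/bin/sh
set -eu

# All values below are hypothetical placeholders; replace them with your own.
KEY="my_key"              # key used when the .ejrvs file was exported
EJRVS="my_asr.ejrvs"      # exported acoustic model in the mounted folder
JMIR="my_asr.jmir"        # JMIR file that jarvis-build will create
AM_NAME="my_asr_am"       # name the ASR model will get in Triton

# Print the no-language-model (greedy decoding) build command with every
# placeholder filled in. Both file arguments use the <file>:<key> form.
print_greedy_build() (
  printf '%s %s %s %s\n' \
    "jarvis-build speech_recognition" \
    "/servicemaker-dev/${JMIR}:${KEY}" \
    "/servicemaker-dev/${EJRVS}:${KEY}" \
    "--acoustic_model_name=${AM_NAME}"
)

print_greedy_build
```

Once the printed line matches your own file names, key and model name, run it inside the servicemaker container.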

Language models

Jarvis ASR also supports decoding with an n-gram language model (LM). The n-gram LM can be stored either in ARPA format (.arpa) or in KenLM binary format. To prepare the Jarvis JMIR configuration with an n-gram LM stored in ARPA format, use:

jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name> --decoding_language_model_arpa=<arpa_filename>

To use Jarvis ASR with a KenLM binary file, generate the Jarvis JMIR with:

jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name> --decoding_language_model_binary=<KenLM_binary_filename>
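The two language-model commands differ only in the option name, so a small helper can choose it from the file name. This is a sketch under the assumption that ARPA files end in .arpa and that any other file is a KenLM binary; lm_option, print_build_with_lm and my_lm.arpa are hypothetical names, and the script only prints the command:

```sh
#!/bin/sh
set -eu

# Hypothetical helper: choose the jarvis-build language-model option from the
# file name. Assumption: ARPA files end in ".arpa"; any other non-empty name is
# treated as a KenLM binary. An empty name prints nothing (greedy decoding).
lm_option() (
  case "$1" in
    "") ;;
    *.arpa) printf '%s' "--decoding_language_model_arpa=$1" ;;
    *) printf '%s' "--decoding_language_model_binary=$1" ;;
  esac
)

# Hypothetical helper: print the full build command, keeping the document's
# placeholders for everything except the language model file.
print_build_with_lm() (
  base="jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name>"
  opt="$(lm_option "$1")"
  # Append the LM option only when one was selected.
  printf '%s\n' "${base}${opt:+ $opt}"
)

print_build_with_lm "my_lm.arpa"   # hypothetical ARPA file name
```

Passing an empty name yields the plain greedy-decoding command, which keeps one code path for all three cases.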

Streaming/Offline configuration

By default, the Jarvis JMIR file is configured for the Jarvis StreamingRecognize RPC call, that is, for streaming use cases. To use the synchronous Recognize RPC call instead, generate the Jarvis JMIR file with the --offline option, for example:

jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name> --offline

In addition, the default streaming Jarvis JMIR configuration provides intermediate transcripts with very low latency. When supporting more concurrent audio streams matters more than latency, use the following command instead:

jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name> --chunk_size=0.8 --padding_factor=2 --padding_size=0.8
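The three configurations in this section differ only in the extra options appended to the base command. The sketch below maps a mode name to those options; the mode names (streaming-latency, streaming-throughput, offline) and the helper names are hypothetical, while the option values are exactly the ones given above. It prints the command rather than running it:

```sh
#!/bin/sh
set -eu

# Hypothetical helper: map a mode name (names invented for this sketch) to the
# extra jarvis-build options given in this section.
mode_options() (
  case "$1" in
    streaming-latency) ;;   # default streaming configuration: no extra options
    streaming-throughput) printf '%s' "--chunk_size=0.8 --padding_factor=2 --padding_size=0.8" ;;
    offline) printf '%s' "--offline" ;;   # synchronous Recognize RPC
    *) echo "unknown mode: $1" >&2; exit 1 ;;
  esac
)

# Hypothetical helper: print the build command for a mode, keeping the
# document's placeholders for the file names, key and model name.
print_build_for_mode() (
  base="jarvis-build speech_recognition /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> --acoustic_model_name=<acoustic_model_name>"
  opts="$(mode_options "$1")"
  # Append the mode's options only when there are any.
  printf '%s\n' "${base}${opts:+ $opts}"
)

print_build_for_mode offline
```

An unknown mode name makes mode_options exit with status 1, so a typo fails loudly instead of silently producing the default configuration.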

Model repository generation

Use the JMIR file(s) generated in the previous steps to generate the model repository for Triton deployment, where <triton_model_repo> is the path of the Triton model repository to create:

jarvis-deploy /servicemaker-dev/<jmir_filename> <triton_model_repo>
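When several JMIR files were generated, a loop can produce one jarvis-deploy call per file, all targeting the same model repository. This sketch assumes the JMIR files end in .jmir and that jarvis-deploy is invoked once per file using the command form shown above; print_deploy_cmds is a hypothetical helper, and it only prints the commands:

```sh
#!/bin/sh
set -eu

# Hypothetical helper: print one jarvis-deploy command per JMIR file in a
# directory, all targeting the same Triton model repository.
# Assumptions: JMIR files end in ".jmir", and jarvis-deploy is called once per
# file with the "<jmir_file> <triton_model_repo>" form used in this section.
print_deploy_cmds() (
  jmir_dir="$1"
  model_repo="$2"
  for jmir in "$jmir_dir"/*.jmir; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$jmir" ] || continue
    printf 'jarvis-deploy %s %s\n' "$jmir" "$model_repo"
  done
)
```

For example, print_deploy_cmds /servicemaker-dev /servicemaker-dev/models (a hypothetical repository path) lists the commands; piping the output to sh runs them, provided no path contains spaces.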

NLP

TODO

TTS

TODO