Deploying Your Custom Model into Riva
This section provides a brief overview of the two main tools used in the deployment process:
- The build phase, using riva-build.
- The deploy phase, using riva-deploy.
Build process
For a custom-trained model, refer to the section that matches your model type (ASR, NLP, or TTS) for the
riva-build phase. At the end of this phase, you’ll have the Riva Model Intermediate Representation (RMIR) archive
for your custom model.
Deploy process
At this point, you already have your Riva Model Intermediate Representation (RMIR) archive. There are two options for deploying this RMIR:
Option 1: Use the Quick Start scripts (riva_init.sh and riva_start.sh) with the appropriate parameters in
config.sh.
Option 2: Manually run riva-deploy and then start riva-server with the target model repo.
Option 1: Using Quick Start Scripts to Deploy Your Models (Recommended path)
The Quick Start scripts (riva_init.sh and riva_start.sh) use a particular directory for their operations. This directory
is defined by the variable $riva_model_loc specified in config.sh. By default, this variable is set to a Docker volume; however,
you can set it to any local directory.
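As an illustration, pointing the Quick Start scripts at a local directory might look like the following fragment of config.sh. This is only a sketch: the path is a hypothetical placeholder, and the exact layout of config.sh may differ between Riva releases.

```shell
# config.sh (fragment)
# Hypothetical local directory; replace it with a path you have write access to.
riva_model_loc="/home/user/riva-models"
```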
By default, the riva_init.sh Quick Start script performs the following:
- Downloads the RMIRs defined and enabled in config.sh from NGC into a subdirectory of $riva_model_loc, specifically $riva_model_loc/rmir.
- Executes riva-deploy for each of the RMIRs at $riva_model_loc/rmir to generate their corresponding Triton Inference Server model repository at $riva_model_loc/models.
When you execute riva_start.sh, it starts the riva-speech container and mounts this $riva_model_loc directory
to /data inside the container.
To deploy your own custom RMIR, or set of RMIRs, place them inside the $riva_model_loc/rmir
directory. Ensure that the $riva_model_loc variable in config.sh points to a directory that you have access to,
because you need to copy your RMIRs into its rmir subdirectory. If the subdirectory $riva_model_loc/rmir does not exist,
create it and then copy your custom RMIRs there.
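The staging step described above can be sketched as a small shell helper. This is a minimal sketch and not part of the Quick Start scripts: the function name stage_rmirs and the example paths are hypothetical, and it assumes riva_model_loc points to a local directory rather than a Docker volume.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy custom RMIR files into "<riva_model_loc>/rmir".
# Usage: stage_rmirs <riva_model_loc> <file.rmir>...
stage_rmirs() {
    local model_loc="$1"
    shift
    # Create the rmir subdirectory if it does not exist yet.
    mkdir -p "$model_loc/rmir" || return 1
    local rmir
    for rmir in "$@"; do
        # Stop early on a missing file instead of deploying a partial set.
        if [ ! -f "$rmir" ]; then
            echo "RMIR not found: $rmir" >&2
            return 1
        fi
        cp "$rmir" "$model_loc/rmir/" || return 1
    done
}
```

For example, `stage_rmirs /home/user/riva-models ./my_asr.rmir` would leave a copy of the file at /home/user/riva-models/rmir/my_asr.rmir (both names are placeholders).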
If you would like to skip downloading the default RMIRs from NGC, set the variable $use_existing_rmirs to
true. After your custom RMIRs are inside the $riva_model_loc/rmir directory, run riva_init.sh, which
executes riva-deploy on your custom RMIRs, along with any other RMIRs present in that directory, and generates the Triton
Inference Server model repo at $riva_model_loc/models.
Next, run riva_start.sh, which starts the riva-speech container
and loads your custom models along with any other models present at $riva_model_loc/models. If you only want to load your
specific models, ensure that $riva_model_loc/models is empty or that the /models directory is not present before you run
riva_init.sh. The script riva_init.sh creates the subdirectories /rmir and /models if they do not already exist.
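To load only your own models, the check on $riva_model_loc/models can be scripted before riva_init.sh is run. The following is a minimal sketch; models_dir_is_clean is a hypothetical helper name, and it assumes riva_model_loc is a local directory.

```shell
# Hypothetical helper: succeed when "<riva_model_loc>/models" is absent or
# empty, so that riva_init.sh generates a repository from the staged RMIRs only.
models_dir_is_clean() {
    local models_dir="$1/models"
    # A missing directory is fine: riva_init.sh creates it.
    [ -d "$models_dir" ] || return 0
    # ls -A lists every entry except "." and ".."; empty output means empty.
    [ -z "$(ls -A "$models_dir")" ]
}
```

If the check succeeds, and $use_existing_rmirs is set to true in config.sh, running riva_init.sh followed by riva_start.sh from the Quick Start directory should deploy and load only the RMIRs placed in $riva_model_loc/rmir.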
For more information about viewing logs and using client containers to test
your models, refer to the Server Deployment > Local (Docker) section.
Option 2: Using riva-deploy and the Riva Speech Container (Advanced)
- Execute riva-deploy. Refer to the Deploy section in Services and Models > Overview for a brief overview of riva-deploy.
  This step creates the Triton Inference Server model repository at /data/models. Writing to any location other than /data/models requires additional manual changes to the artifact directories embedded in the configs of those Triton Inference Server model repositories that have model-specific artifacts, such as class labels. Therefore, stick with /data/models unless you are familiar with Triton Inference Server model repository configurations.
- Manually start the riva-server Docker container using docker run.
  After the Triton Inference Server model repository for your custom model is generated, start the Riva server on that target repo. The following command assumes you generated the model repo at /data/models.

      docker run -d --gpus 1 --init --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 \
          -v /data:/data \
          -p 50051 \
          -e CUDA_VISIBLE_DEVICES=0 \
          --name riva-speech \
          riva-api \
          start-riva --riva-uri=0.0.0.0:50051 --nlp_service=true --asr_service=true --tts_service=true

  This command launches the Riva Speech Service API server, similar to the Quick Start script riva_start.sh.
  Example output:

      Starting Riva Speech Services
      > Waiting for Triton server to load all models...retrying in 10 seconds
      > Waiting for Triton server to load all models...retrying in 10 seconds
      > Waiting for Triton server to load all models...retrying in 10 seconds
      > Triton server is ready…
- Verify that the servers have started correctly by checking that the output of docker logs riva-speech shows:

      I0428 03:14:50.440943 1 riva_server.cc:66] TTS Server connected to Triton Inference Server at 0.0.0.0:8001
      I0428 03:14:50.440943 1 riva_server.cc:66] NLP Server connected to Triton Inference Server at 0.0.0.0:8001
      I0428 03:14:50.440951 1 riva_server.cc:68] ASR Server connected to Triton Inference Server at 0.0.0.0:8001
      I0428 03:14:50.440955 1 riva_server.cc:71] Riva Conversational AI Server listening on 0.0.0.0:50051
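The log check can also be scripted. The sketch below reads log text on standard input and succeeds only when the "listening" line from the example output above is present. The helper name riva_log_ready is hypothetical, and the exact log wording may vary between Riva releases.

```shell
# Hypothetical helper: read server logs on stdin and succeed only when the
# line announcing that the Riva server is listening appears.
riva_log_ready() {
    grep -q "Riva Conversational AI Server listening on"
}
```

One way to use it, assuming the container is named riva-speech as in the docker run command: `docker logs riva-speech 2>&1 | riva_log_ready && echo "Riva server is up"`. The `2>&1` redirect is included because docker logs replays the container's stderr stream on stderr.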