TAO - TTS FastPitch/HiFi-GAN Riva Deployment#
Train Adapt Optimize (TAO) Toolkit provides the capability to export your model in a format that can be deployed using NVIDIA Riva, a highly performant application framework for multi-modal conversational AI services using GPUs.
This tutorial explores taking two .riva models, the results of the tao spectro_gen and tao vocoder commands, and using the Riva ServiceMaker framework to aggregate all the necessary artifacts for Riva deployment to a target environment. Once the models are deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.
In this tutorial, you will learn how to:
use Riva ServiceMaker to convert a TAO-exported .riva file to .rmir.
deploy the model(s) locally on the Riva server.
send inference requests from a demo client using Riva API bindings.
To follow along, ensure you:
have access to NVIDIA NGC and are able to download the Riva Quick Start resources.
have a .riva model file that you want to deploy. You can obtain this from tao <task> export (with export_format=RIVA). Refer to the Text to Speech tutorial on Speech Synthesis using Train Adapt Optimize (TAO) Toolkit for more details on training and exporting a model.
ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components, riva-build and riva-deploy:
The riva-build step helps build a Riva-ready version of the model. Its only output is an intermediate format, the Riva Model Intermediate Representation (.rmir), of an end-to-end pipeline for the supported services within Riva. This tutorial considers two TTS models, FastPitch and HiFi-GAN.
riva-build is responsible for the combination of one or more exported models (.riva files) into a single file containing an intermediate format called .rmir. This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. Refer to the Riva documentation for more information.
The deployment tool takes as input one or more .rmir files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.
For the purpose of this tutorial, we will only be using the
# IMPORTANT: UPDATE THESE PATHS

# ServiceMaker Docker
RIVA_SM_CONTAINER = "<add container name>"

# Directory where the .riva models are stored $MODEL_LOC/*.riva
# Both the FastPitch_22k_LJS.riva and HifiGAN_22k_LJS.riva models should be present
MODEL_LOC = "<add path to model location>"

# Name of the .riva file
SPECTRO_GEN_MODEL_NAME = "<add model name>"
VOCODER_MODEL_NAME = "<add model name>"

# Key that model is encrypted with, while exporting with TAO
KEY = "<add encryption key used for trained model>"
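Before invoking ServiceMaker, it can help to confirm that both exported files are actually present in MODEL_LOC. The following is a minimal sketch and not part of the original notebook: the helper name missing_riva_files is hypothetical, and it only checks that the files exist, not that they are valid or that the key matches.

```python
from pathlib import Path


def missing_riva_files(model_loc, file_names):
    """Return the names in file_names that are not regular files under model_loc."""
    root = Path(model_loc)
    return [name for name in file_names if not (root / name).is_file()]


# Example usage with the variables defined above (uncomment in the notebook):
# missing = missing_riva_files(MODEL_LOC, [SPECTRO_GEN_MODEL_NAME, VOCODER_MODEL_NAME])
# if missing:
#     raise FileNotFoundError(f"Not found in {MODEL_LOC}: {missing}")
```

Failing early here is cheaper than discovering a wrong path from an error inside the container.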
# Download the auxiliary files for Riva to help enhance the quality of the audio output.
!ngc registry model download-version "nvidia/tao/speechsynthesis_en_us_auxiliary_files:deployable_v1.0" --dest $MODEL_LOC
# Get the ServiceMaker docker
! docker pull $RIVA_SM_CONTAINER
# The following command builds the .rmir for a multi-speaker model.
! mkdir -p $MODEL_LOC/rmir
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER \
    riva-build speech_synthesis /data/rmir/new_speaker.rmir:$KEY \
    /data/$SPECTRO_GEN_MODEL_NAME:$KEY \
    /data/$VOCODER_MODEL_NAME:$KEY \
    --voice_name=new_speaker \
    --subvoices=ljspeech:0,new_voice:1 \
    --abbreviations_file=/data/speechsynthesis_en_us_auxiliary_files_vdeployable_v1.0/abbr.txt \
    --arpabet_file=/data/speechsynthesis_en_us_auxiliary_files_vdeployable_v1.0/cmudict-0.7b-nv0.01
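The riva-build invocation above follows a fixed shape: the output .rmir comes first, then the spectrogram generator and the vocoder, and each path carries the encryption key after a colon. As a rough sketch of that structure, the hypothetical helper below assembles the same argument list in Python. The function name, the /models path, and the model file names passed in the example are illustrative assumptions; the flags themselves are copied from the command above.

```python
import shlex


def riva_build_tts_command(container, model_loc, spectro_gen, vocoder, key,
                           rmir_name="new_speaker.rmir", extra_flags=()):
    """Assemble the docker + riva-build argument list used in this tutorial.

    Each model path is suffixed with ":<key>", as in the command above.
    """
    return [
        "docker", "run", "--rm", "--gpus", "0",
        "-v", f"{model_loc}:/data", container,
        "riva-build", "speech_synthesis",
        f"/data/rmir/{rmir_name}:{key}",  # output .rmir
        f"/data/{spectro_gen}:{key}",     # spectrogram generator (.riva)
        f"/data/{vocoder}:{key}",         # vocoder (.riva)
        *extra_flags,
    ]


cmd = riva_build_tts_command(
    "<add container name>", "/models",  # placeholder container, hypothetical path
    "FastPitch_22k_LJS.riva", "HifiGAN_22k_LJS.riva", "tlt_encode",
    extra_flags=["--voice_name=new_speaker", "--subvoices=ljspeech:0,new_voice:1"],
)
# Printable form for inspection; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems if a path or key contains spaces or special characters.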
Start Riva Server#
Once the model repository is generated, we are ready to start the Riva server. From this step onward, you need the Riva Quick Start resources downloaded from NGC. Set the path to the directory here:
# Set the Riva QuickStart directory
RIVA_DIR = "<Path to the uncompressed folder downloaded from quickstart(include the folder name)>"
Next, we modify the config.sh file to enable the relevant Riva services (TTS in this case for FastPitch and HiFi-GAN), and provide the encryption key and path to the model repository (riva_model_loc) generated in the previous step.
For example, if the above model repository is generated at $MODEL_LOC/models, then you can specify riva_model_loc as the parent directory, MODEL_LOC.
Pretrained versions of models specified in models_asr/nlp/tts are fetched from NGC. Since we are using a custom model, we can comment out the entries in models_tts (and any others that are not relevant to the use case).
# Enable or Disable Riva Services
service_enabled_asr=false ## MAKE CHANGES HERE
service_enabled_nlp=false ## MAKE CHANGES HERE
service_enabled_tts=true ## MAKE CHANGES HERE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode" ## MAKE CHANGES HERE

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TAO and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>" ## MAKE CHANGES HERE (Replace with MODEL_LOC)

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false ## MAKE CHANGES HERE (Set to true)
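If you would rather apply these config.sh changes from the notebook than edit the file by hand, something like the following could work. It is only a sketch: set_config_values is a hypothetical helper, the sample text is a tiny stand-in for the real file, and it assumes each setting is a single name=value line starting at the beginning of the line.

```python
import re


def set_config_values(config_text, updates):
    """Return config_text with each `name=value` assignment replaced.

    Only lines that start with `name=` are touched; a trailing comment on
    that line is dropped. Raises KeyError if a name is not found.
    """
    for name, value in updates.items():
        pattern = re.compile(rf"^{re.escape(name)}=.*$", flags=re.MULTILINE)
        # A function replacement avoids backslash interpretation in `value`.
        config_text, count = pattern.subn(
            lambda m, n=name, v=value: f"{n}={v}", config_text
        )
        if count == 0:
            raise KeyError(name)
    return config_text


# Illustrative stand-in for config.sh, not the real file contents.
sample_config = (
    "service_enabled_asr=true\n"
    "service_enabled_tts=false\n"
    'riva_model_loc="riva-model-repo"\n'
    "use_existing_rmirs=false\n"
)
patched = set_config_values(sample_config, {
    "service_enabled_asr": "false",
    "service_enabled_tts": "true",
    "riva_model_loc": '"/path/to/models"',  # hypothetical path; use your MODEL_LOC
    "use_existing_rmirs": "true",
})
print(patched)
```

Commented lines that merely mention a setting are left alone because the pattern is anchored to the start of the line. To use it on the real file, read config.sh into a string, patch it, and write it back, keeping a backup copy first.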
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh
# Run Riva Init. This will fetch the containers/models
# YOU CAN SKIP THIS STEP IF YOU DID RIVA DEPLOY
! cd $RIVA_DIR && ./riva_init.sh config.sh
# Run Riva Start. This will deploy your model(s).
! cd $RIVA_DIR && ./riva_start.sh config.sh
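The server can take a while to load models after riva_start.sh is launched. One simple way to wait for it from the notebook is to poll the gRPC port until it accepts TCP connections. The sketch below uses only the standard library; wait_for_port is a hypothetical helper, and localhost:50051 is the address the client cell in this tutorial connects to. Note that an open port only shows that the server process is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage in the notebook (uncomment once riva_start.sh has been run):
# if not wait_for_port("localhost", 50051, timeout_s=300):
#     print("Riva server port is not reachable yet")
```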
Once the Riva server is up and running with your models, you can send inference requests querying the server.
To send gRPC requests, install the Riva Python API bindings for the client.
# Install client API bindings
! pip install nvidia-riva-client
Connect to the Riva server and run inference#
Now, we can query the Riva server; let’s get started. The following cell queries the Riva server (using gRPC) to yield a result.
import os
import soundfile
import riva.client
import IPython.display as ipd
import numpy as np

server = "localhost:50051"  # location of riva server
auth = riva.client.Auth(uri=server)
tts_service = riva.client.SpeechSynthesisService(auth)

text = "Is it recognize speech or wreck a nice beach?"
language_code = "en-US"  # currently required to be "en-US"
sample_rate_hz = 22050  # the desired sample rate
voice_name = "new_speaker.new_voice"  # subvoice to generate the audio output.
data_type = np.int16  # For RIVA version < 1.10.0 please set this to np.float32

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)
audio = resp.audio
meta = resp.meta
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

audio_samples = np.frombuffer(resp.audio, dtype=data_type)
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
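To keep the synthesized audio rather than only play it inline, the raw bytes can be written to a WAV file. The following is a minimal sketch using Python's standard wave module. The helper name save_pcm16_wav and the output file name are hypothetical, and it assumes the response holds mono 16-bit PCM (the np.int16 case above); it does not apply to the float32 output of older Riva versions.

```python
import wave


def save_pcm16_wav(path, pcm_bytes, sample_rate_hz, channels=1):
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)  # assumption: mono output
        wav_file.setsampwidth(2)  # 2 bytes per sample = int16
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)


# Example usage in the notebook, after the synthesize call above:
# save_pcm16_wav("output.wav", resp.audio, sample_rate_hz)
```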
You can stop all Docker containers before shutting down the Jupyter kernel.
Caution: The following command will stop all running containers
! docker stop $(docker ps -a -q)