TTS Deploy#
This tutorial explains the process of generating a TTS RMIR (Riva Model Intermediate Representation). An RMIR is an intermediate file that contains all the necessary artifacts (models, files, configurations, and user settings) required to deploy a Riva service.
Learning Objectives#
In this tutorial, you will learn how to:
- Use Riva ServiceMaker to take two `.riva` files and convert them to a `.rmir` file for either an AMD64 (data center, `x86_64`) or an ARM64 (embedded, `AArch64`) machine. For users who have `.nemo` files, `nemo2riva` can be used to generate `.riva` files from `.nemo` checkpoints (see the sketch after this list).
- Launch and deploy the `.rmir` locally on the Riva server.
- Send inference requests from a demo client using Riva API bindings.
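If you are starting from `.nemo` checkpoints, here is a minimal sketch of exporting them with `nemo2riva`. The checkpoint file names are placeholders, and the key must match the `key` parameter used later in this tutorial:
# Hypothetical example: export .nemo checkpoints to .riva files with nemo2riva.
# `fastpitch.nemo` and `hifigan.nemo` are placeholder file names.
! pip install nemo2riva
! nemo2riva --out fastpitch.riva --key tlt_encode fastpitch.nemo
! nemo2riva --out hifigan.riva --key tlt_encode hifigan.nemo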
Prerequisites#
To use this tutorial, ensure that you:
- Have access to NGC through the NGC Command-Line Interface (CLI).
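If the NGC CLI is not yet configured on your machine, a minimal sketch of the one-time setup (the `ngc config set` command interactively prompts for an NGC API key, which you can generate at ngc.nvidia.com):
# Verify the NGC CLI is installed, then configure it with your API key.
! ngc --version
! ngc config set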
Riva ServiceMaker#
ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components:
- `riva-build`
- `riva-deploy`
The first step is `riva-build`, which can be run on either data center or embedded machines to build an `.rmir` file.
The second step is `riva-deploy`, which should be run on the machine where the Riva server will run.
If you are building an `.rmir` file on a data center machine to target an embedded deployment, follow this tutorial up to and including the Riva-build section. Copy the built `.rmir` to the target embedded machine, run the Set the Configurations and Parameters section, and continue to the Riva-deploy section.
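For example, a hypothetical copy step from the build machine to an embedded target (the host name and destination path below are placeholders):
# Hypothetical: copy the built RMIR from the data center build machine to the
# embedded target. "user@embedded-target" and the destination path are placeholders.
! scp out/rmir/FastPitch_HifiGan.rmir user@embedded-target:/home/user/out/rmir/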
Riva-build#
This step helps build a Riva-ready version of the model. Its only output is an intermediate format (called a Riva Model Intermediate Representation, or `.rmir`) of an end-to-end pipeline for the supported services within Riva. In this tutorial, we consider two TTS models: an acoustic model (FastPitch) and a vocoder (HiFi-GAN).
`riva-build` is responsible for combining one or more exported models (`.riva` files) into a single file containing an intermediate format called `.rmir`. This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. Refer to the Riva documentation for more information.
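As a sketch of the command's shape (the full, parameterized invocation is constructed later in this tutorial; the container paths, file names, and key below are placeholders), run inside the ServiceMaker container:
# Hypothetical shape of a TTS riva-build invocation (placeholders throughout):
# the output RMIR path:key comes first, followed by the input .riva files.
! riva-build speech_synthesis --force \
    /data/output.rmir:tlt_encode \
    /data/acoustic_model.riva:tlt_encode \
    /data/vocoder.riva:tlt_encode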
Riva-deploy#
The deployment tool takes as input one or more `.rmir` files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for execution and finally writes all those assets to the output model repository directory.
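For reference, a minimal sketch of a `riva-deploy` invocation (paths and key are placeholders; in this tutorial, the Quick Start's `riva_init.sh` runs this step for you):
# Hypothetical riva-deploy invocation: input RMIR:key, then the target
# model repository directory that the Riva server will load from.
! riva-deploy /data/FastPitch_HifiGan.rmir:tlt_encode /data/models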
Set the Configurations and Parameters#
Update the parameters in the following code block:
- `machine_type`: Type of machine the tutorial is being run on. Acceptable values are `AMD64`, `ARM64_linux`, and `ARM64_l4t`. Defaults to `AMD64`.
- `target_machine`: Type of machine the RMIR will be deployed on. Acceptable values are `AMD64`, `ARM64_linux`, and `ARM64_l4t`. Defaults to `AMD64`.
- `acoustic_model`: Full path to the acoustic model `.riva` file. Defaults to `None`. This can be replaced with a custom acoustic model `.riva` checkpoint.
- `vocoder`: Full path to the vocoder `.riva` file. Defaults to `None`. This can be replaced with a custom vocoder `.riva` checkpoint.
- `out_dir`: Directory in which to put the TTS `.rmir` file. The RMIR will be placed in `${out_dir}/rmir/RMIR_NAME.rmir`. Defaults to `$PWD/out`.
- `voice`: Voice name of the model. Defaults to `"test"`.
- `key`: Encryption key used in `nemo2riva`. The same key will be used to deploy the RMIR generated in this tutorial. Defaults to `tlt_encode`.
- `use_ipa`: Set to `"y"` or `"yes"` (case-insensitive) if the model uses IPA phones; set to `"no"` if the model uses ARPAbet. Defaults to `"yes"`.
- `lang`: Model language. This is only used by the client and has no effect on the generated speech. Defaults to `"en-US"`.
- `sample_rate`: Sample rate of the generated audio in Hz. Defaults to `44100`.
- `num_speakers`: Number of speakers in the model. Defaults to `2`, the number of speakers in the NGC example model.
- `riva_aux_files`: Full path to the directory of auxiliary TTS files (dictionaries and abbreviation mappings). Defaults to `None`; the NGC defaults are downloaded below. This can be replaced with the path to a custom set of files.
- `riva_tn_files`: Full path to the directory of text normalization (TN) `.far` files. Defaults to `None`; the NGC defaults are downloaded below. This can be replaced with the path to a custom set of files.
import pathlib
import logging
import warnings

from version import __riva_version__

machine_type = "AMD64"  # Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
target_machine = "AMD64"  # Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
acoustic_model = None  ## Full path of the acoustic model .riva file.
vocoder = None  ## Full path of the vocoder .riva file.
out_dir = pathlib.Path.cwd() / "out"  ## Output directory; the RMIR will be placed in `${out_dir}/rmir/RMIR_NAME.rmir`.
voice = "test"  ## Voice name.
key = "tlt_encode"  ## Encryption key used during nemo2riva.
use_ipa = "yes"  ## "y" or "yes" if the model uses IPA phones, "no" otherwise.
lang = "en-US"  ## Language code.
sample_rate = 44100  ## Sample rate of the generated audio in Hz.
num_speakers = 2  ## Number of speakers in the model.
riva_aux_files = None  ## Auxiliary TTS files (dictionaries, abbreviations). Change this to the full path of a custom set of files.
riva_tn_files = None  ## Text normalization .far files. Change this to the full path of a custom set of files.

## Riva NGC ServiceMaker image config.
if machine_type.lower() in ["amd64", "arm64_linux"]:
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker"
elif machine_type.lower() == "arm64_l4t":
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker-l4t-aarch64"

rmir_dir = out_dir / "rmir"

if not out_dir.exists():
    out_dir.mkdir()
if not rmir_dir.exists():
    rmir_dir.mkdir()
def ngc_download_and_get_dir(ngc_resource_name, var, var_name, resource_type="model"):
    default_download_folder = "_v".join(ngc_resource_name.split("/")[-1].split(":"))
    !rm -rf ./riva_artifacts/{default_download_folder}
    ngc_output = !ngc registry {resource_type} download-version {ngc_resource_name} --dest riva_artifacts
    output = pathlib.Path(f"./riva_artifacts/{default_download_folder}")
    if not output.exists():
        ngc_output_formatted = '\n'.join(ngc_output)
        logging.error(
            f"NGC was not able to download the requested model {ngc_resource_name}. "
            "Please check the NGC error message, remove all downloaded directories, and restart the "
            f"notebook. NGC message: {ngc_output_formatted}"
        )
        return None
    # For model resources, return the first .riva file rather than the folder.
    if "model" in resource_type:
        riva_files_in_dir = list(output.glob("*.riva"))
        if len(riva_files_in_dir) > 0:
            output = riva_files_in_dir[0]
    if output is not None and var is not None:
        warnings.warn(
            f"`{var_name}` had a non-default value of `{var}`. `{var_name}` will be updated to `{output}`"
        )
    return output
Download models#
The following code block downloads the default NGC models, FastPitch and HiFi-GAN, to a folder called `riva_artifacts`. If a previously downloaded copy already exists, it is removed first.
Skip this code block if you are using custom models.
riva_ngc_artifacts = pathlib.Path.cwd() / "riva_artifacts"
if not riva_ngc_artifacts.exists():
    riva_ngc_artifacts.mkdir()

acoustic_model = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.0", acoustic_model, "acoustic_model")
vocoder = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_hifigan_ipa:deployable_v1.0", vocoder, "vocoder")
The following code block downloads additional TTS files used for deployment. These include the following:
- an ARPAbet dictionary file
- an IPA dictionary file
- an abbreviation mapping file
- two text normalization (TN) files: `tokenize_and_classify.far` and `verbalize.far`
riva_aux_files = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_auxiliary_files:deployable_v1.3", riva_aux_files, "riva_aux_files")
riva_tn_files = ngc_download_and_get_dir("nvidia/riva/normalization_en_us:deployable_v1.1", riva_tn_files, "riva_tn_files")
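As an optional sanity check (assuming the downloads above succeeded), you can list the files that will later be mounted into the ServiceMaker container:
# Optional: inspect the downloaded auxiliary and text normalization files.
print(sorted(p.name for p in riva_aux_files.iterdir()))
print(sorted(p.name for p in riva_tn_files.iterdir()))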
Run riva-build#
Stop any previously running `riva_rmir_gen` container, then start the ServiceMaker container again with the necessary paths mounted.
## Run the Riva ServiceMaker container.
!docker stop riva_rmir_gen &> /dev/null
!set -x && docker run -td --gpus all --rm -v {str(riva_aux_files.resolve())}:/riva_aux \
    -v {str(acoustic_model.parent.resolve())}/:/synt \
    -v {str(vocoder.parent.resolve())}:/voc -v {str(riva_tn_files.resolve())}:/riva_tn \
    -v {str(rmir_dir.resolve())}:/data --name riva_rmir_gen --entrypoint="/bin/bash" {riva_init_image}

warnings.warn("Using --force in riva-build will replace any existing RMIR.")

riva_build = (
    f"riva-build speech_synthesis --force --voice_name={voice} --language_code={lang} "
    f"--sample_rate={sample_rate} /data/FastPitch_HifiGan.rmir:{key} /synt/{str(acoustic_model.name)}:{key} "
    f"/voc/{str(vocoder.name)}:{key} --abbreviations_file=/riva_aux/abbr.txt "
    f"--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far --wfst_verbalizer_model=/riva_tn/verbalize.far"
)

# Embedded (ARM64) targets are limited to a batch size of 1.
if target_machine.lower() in ["arm64_linux", "arm64_l4t"]:
    riva_build += (
        " --max_batch_size 1 --postprocessor.max_batch_size 1 --preprocessor.max_batch_size 1"
        " --encoderFastPitch.max_batch_size 1 --chunkerFastPitch.max_batch_size 1 --hifigan.max_batch_size 1"
    )

if use_ipa.lower() in ["y", "yes"]:
    riva_build += " --phone_set=ipa --phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt --upper_case_chars=True"
else:
    riva_build += " --phone_set=arpabet --phone_dictionary_file=/riva_aux/cmudict-0.7b_nv22.08"

if num_speakers > 1:
    riva_build += f" --num_speakers={num_speakers}"
    riva_build += " --subvoices " + ",".join([f"{i}:{i}" for i in range(num_speakers)])

print(riva_build)
Execute the `riva-build` command and stop the `riva_rmir_gen` container.
!docker exec riva_rmir_gen {riva_build}
!docker stop riva_rmir_gen
Run riva-deploy#
So far in this tutorial, we have generated an RMIR file from `.riva` files. You should see that a `FastPitch_HifiGan.rmir` file has been generated in the `${out_dir}/rmir` location we defined earlier.
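A quick check that the build produced the expected file (assuming the default RMIR name used in the build command above):
# Verify that riva-build wrote the RMIR to the mounted output directory.
rmir_file = rmir_dir / "FastPitch_HifiGan.rmir"
print(rmir_file, "exists:", rmir_file.exists())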
The RMIR file generated in this tutorial can be deployed using `riva_quickstart`.
Steps to deploy the RMIR#
1. Download the Riva Quick Start resource.
2. Open `config.sh` and update the following params:
   - set `service_enabled_asr` to `false`
   - set `service_enabled_nlp` to `false`
   - set `service_enabled_tts` to `true`
   - set `riva_model_loc` to the location of your `out_dir`
   - set `use_existing_rmirs` to `true`
3. Run `riva_init.sh`.
4. Run `riva_start.sh`.
Let’s download the Riva Quick Start resource from NGC.
if target_machine.lower() in ["amd64", "arm64_linux"]:
    quickstart_link = f"nvidia/riva/riva_quickstart:{__riva_version__}"
else:
    quickstart_link = f"nvidia/riva/riva_quickstart_arm64:{__riva_version__}"

quickstart_dir = ngc_download_and_get_dir(quickstart_link, None, None, resource_type="resource")
Next, we modify the `config.sh` file to enable the relevant Riva services (TTS in this case, for FastPitch and HiFi-GAN), and to provide the encryption key and the path to the model repository (`riva_model_loc`) generated in the previous step.
For example, since the model repository above was generated at `${out_dir}/rmir`, you can set `riva_model_loc` to `${out_dir}`; `riva_init.sh` looks for RMIR files in the `rmir` subdirectory of `riva_model_loc`.
Here is how the `config.sh` should look:
config.sh snippet#
# Enable or Disable Riva Services
service_enabled_asr=false ## MAKE CHANGES HERE
service_enabled_nlp=false ## MAKE CHANGES HERE
service_enabled_tts=true ## MAKE CHANGES HERE
# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"
# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode" ## MAKE CHANGES HERE
# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>" ## MAKE CHANGES HERE (Replace with MODEL_LOC)
# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false ## MAKE CHANGES HERE (Set to true)
Let’s make the necessary changes to the `config.sh` file.
with open(f"{quickstart_dir}/config.sh", "r") as config_in:
    config_file = config_in.readlines()

for i, line in enumerate(config_file):
    # Disable unused services, enable TTS.
    if "service_enabled_asr" in line:
        config_file[i] = "service_enabled_asr=false\n"
    elif "service_enabled_nlp" in line:
        config_file[i] = "service_enabled_nlp=false\n"
    elif "service_enabled_nmt" in line:
        config_file[i] = "service_enabled_nmt=false\n"
    elif "service_enabled_tts" in line:
        config_file[i] = "service_enabled_tts=true\n"
    # Point riva_model_loc at our output directory (which contains the rmir folder).
    elif "riva_model_loc" in line:
        config_file[i] = config_file[i].split("riva_model_loc")[0] + f"riva_model_loc={out_dir}\n"
    elif "use_existing_rmirs" in line:
        config_file[i] = "use_existing_rmirs=true\n"
    elif "MODEL_DEPLOY_KEY" in line:
        config_file[i] = f"MODEL_DEPLOY_KEY=\"{key}\"\n"
    # Comment out the default FastPitch model download so only our RMIR is deployed.
    elif "fastpitch" in line:
        config_file[i] = f"#{line}"

with open(f"{quickstart_dir}/config.sh", "w") as config_in:
    config_in.writelines(config_file)

print("".join(config_file))
# Ensure you have permission to execute these scripts.
! cd {quickstart_dir} && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh
# Stop any running Riva server before (re)initializing.
! cd {quickstart_dir} && ./riva_stop.sh config.sh
# Run `riva_init.sh`. This will fetch the containers/models and run `riva-deploy`.
# You can skip this step if you already ran riva-deploy manually.
! cd {quickstart_dir} && ./riva_init.sh config.sh
# Run `riva_start.sh`. This will start the Riva server and serve your model.
! cd {quickstart_dir} && ./riva_start.sh config.sh
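To confirm the server came up, a quick check (assuming the Quick Start's default container name, `riva-speech`):
# Optional: confirm the Riva server container is up before sending requests.
! docker ps --filter name=riva-speech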
Run Inference#
Once the Riva server is up and running with your models, you can send inference requests to it.
To send gRPC requests, install the Riva Python API bindings for the client.
# Install client API bindings
! pip install nvidia-riva-client
Connect to the Riva server and run inference#
Now we can query the Riva server. The following cell sends a gRPC request to synthesize speech and returns the result.
import riva.client
import IPython.display as ipd
import numpy as np

server = "localhost:50051"  # location of the Riva server
auth = riva.client.Auth(uri=server)
tts_service = riva.client.SpeechSynthesisService(auth)

text = "Is it recognize speech or wreck a nice beach?"
language_code = lang            # currently required to be "en-US"
sample_rate_hz = sample_rate    # the desired sample rate
voice_name = voice              # voice (and subvoice) used to generate the audio output
data_type = np.int16            # for Riva versions < 1.10.0, set this to np.float32

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)
audio = resp.audio
meta = resp.meta
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

audio_samples = np.frombuffer(resp.audio, dtype=data_type)
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
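If you want to persist the synthesized audio instead of only playing it inline, here is a minimal sketch using Python's standard wave module (the output file name is a placeholder, and int16 samples are assumed, matching `data_type` above):
import wave

# Write the int16 PCM samples returned by Riva to a mono WAV file.
# "tts_output.wav" is a placeholder file name.
with wave.open("tts_output.wav", "wb") as out_wav:
    out_wav.setnchannels(1)         # mono
    out_wav.setsampwidth(2)         # 2 bytes per sample for np.int16
    out_wav.setframerate(sample_rate_hz)
    out_wav.writeframes(audio_samples.tobytes())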