TTS Deploy#

This tutorial explains the process of generating a TTS RMIR (Riva Model Intermediate Representation). An RMIR is an intermediate file that contains all the necessary artifacts (models, files, configurations, and user settings) required to deploy a Riva service.

Learning Objectives#

In this tutorial, you will learn how to:

  • Use Riva ServiceMaker to take two .riva files and convert them to a single .rmir file for either an AMD64 (data center, x86_64) or an ARM64 (embedded, AArch64) machine.

    • For users who have .nemo files, nemo2riva can be used to generate .riva files from .nemo checkpoints.

  • Launch and deploy the .rmir locally on the Riva server.

  • Send inference requests from a demo client using Riva API bindings.
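
If you are starting from .nemo checkpoints, the nemo2riva export mentioned above can be sketched as follows. The file names and key are placeholders; nemo2riva is installed separately (refer to the Riva documentation for the exact installation steps and flags):

```shell
# Export a NeMo checkpoint to a .riva file (hypothetical file names).
# The --key value must match the key used later during riva-build and riva-deploy.
nemo2riva --out fastpitch.riva --key tlt_encode fastpitch.nemo
```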

Prerequisites#

To use this tutorial, ensure that you:

Riva ServiceMaker#

ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components:

  • riva-build

  • riva-deploy

The first step is riva-build, which can be run on either data center or embedded machines to build an .rmir file.

The second step is riva-deploy, which should be run on the machine where the Riva server will run.

If you are building an .rmir file on a data center machine to target an embedded deployment, follow this tutorial up to and including the Riva-build section. Then copy the built .rmir to the target embedded machine, re-run the Set the Configurations and Parameters section there, and continue from the Riva-deploy section.

Riva-build#

This step helps build a Riva-ready version of the model. Its only output is an intermediate format (called a Riva Model Intermediate Representation, or .rmir) of an end-to-end pipeline for the supported services within Riva. In this tutorial, we consider two TTS models: an acoustic model (FastPitch) and a vocoder (HiFi-GAN).

riva-build combines one or more exported models (.riva files) into a single file in an intermediate format called .rmir. This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. Refer to the Riva documentation for more information.

Riva-deploy#

The deployment tool takes as input one or more .rmir files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.
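
This tutorial does not invoke riva-deploy directly, because the Quick Start script riva_init.sh runs it for you (see the Run riva-deploy section below). For reference, a manual invocation can be sketched as follows; the paths and key are placeholders, and the exact flags are described in the Riva documentation:

```shell
# Deploy an RMIR into a target model repository (hypothetical paths).
# -f overwrites any previously deployed copies of the models.
riva-deploy -f /data/FastPitch_HifiGan.rmir:tlt_encode /data/models
```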


Set the Configurations and Parameters#

Update the parameters in the following code block:

  • machine_type: Type of machine the tutorial is being run on. Acceptable values are AMD64, ARM64_linux, ARM64_l4t. Defaults to AMD64.

  • target_machine: Type of machine the RMIR will be deployed on. Acceptable values are AMD64, ARM64_linux, ARM64_l4t. Defaults to AMD64.

  • acoustic_model: Full path for acoustic model .riva file. Defaults to None. This can be replaced with a custom acoustic model .riva checkpoint.

  • vocoder: Full path for vocoder .riva file. Defaults to None. This can be replaced with a custom vocoder .riva checkpoint.

  • out_dir: Directory in which to put the TTS .rmir file. The RMIR will be placed in ${out_dir}/rmir/RMIR_NAME.rmir. Defaults to $pwd/out.

  • voice: Set the voice name of the model. Defaults to "test".

  • key: This is the encryption key used in nemo2riva. The same key will be used to deploy the RMIR generated in this tutorial. Defaults to tlt_encode.

  • use_ipa: Set to "y" or "yes" if the model uses IPA phones; any other value selects ARPAbet. Defaults to "yes".

  • lang: Model language. This is only used for the client, and has no effect on generated speech. Defaults to "en-US".

  • sample_rate: Sample rate of the generated audio in Hz. Defaults to 44100.

  • num_speakers: Number of speakers in the model. Defaults to 2, the number of speakers in the NGC example model.
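
Before running the cell below, it can help to sanity-check the values you chose. The following is a minimal sketch (the function name is hypothetical; variable names mirror the parameters above, and the IPA check mirrors the `use_ipa.lower() in ["y", "yes"]` test used later in this tutorial):

```python
def validate_params(machine_type, target_machine, use_ipa, num_speakers, sample_rate):
    """Raise ValueError if a parameter falls outside the values this tutorial expects.

    Returns True if IPA phones will be used, False for ARPAbet.
    """
    allowed = {"amd64", "arm64_linux", "arm64_l4t"}
    if machine_type.lower() not in allowed:
        raise ValueError(f"machine_type must be one of {sorted(allowed)}, got {machine_type!r}")
    if target_machine.lower() not in allowed:
        raise ValueError(f"target_machine must be one of {sorted(allowed)}, got {target_machine!r}")
    if num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    # The build step only checks for "y"/"yes" (case-insensitive); anything else means ARPAbet.
    return use_ipa.lower() in ["y", "yes"]

validate_params("AMD64", "AMD64", "yes", 2, 44100)  # returns True (IPA)
```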

import pathlib
import logging
import warnings

from version import __riva_version__

machine_type="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
target_machine="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
acoustic_model = None ##acoustic_model .riva location
vocoder = None ##vocoder .riva location
out_dir = pathlib.Path.cwd() / "out" ##Output directory to store the generated RMIR. The RMIR will be placed in `${out_dir}/RMIR/RMIR_NAME.rmir`.
voice = "test" ##Voice name
key = "tlt_encode" ##Encryption key used during nemo2riva
use_ipa = "yes" ##`"y"` or `"yes"` if the model uses IPA phones, ARPAbet otherwise.
lang = "en-US" ##Language
sample_rate = 44100 ##Sample rate of the generated audio in Hz
num_speakers = 2 ## Number of speakers

riva_aux_files = None ##Path to the Riva auxiliary files (phone dictionaries, abbreviation file). In the case of custom files, change this to their full path.
riva_tn_files = None ##Path to the Riva text normalization (.far) files. In the case of custom files, change this to their full path.

## Riva NGC, servicemaker image config.
if machine_type.lower() in ["amd64", "arm64_linux"]:
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker"
elif machine_type.lower() == "arm64_l4t":
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker-l4t-aarch64"
else:
    raise ValueError(f"Unsupported machine_type: {machine_type}")
rmir_dir = out_dir / "rmir"

out_dir.mkdir(parents=True, exist_ok=True)
rmir_dir.mkdir(parents=True, exist_ok=True)

def ngc_download_and_get_dir(ngc_resource_name, var, var_name, resource_type="model"):
    default_download_folder = "_v".join(ngc_resource_name.split("/")[-1].split(":"))
    !rm -rf ./riva_artifacts/{default_download_folder}
    ngc_output = !ngc registry {resource_type} download-version {ngc_resource_name} --dest riva_artifacts
    output = pathlib.Path(f"./riva_artifacts/{default_download_folder}")
    if not output.exists():
        ngc_output_formatted='\n'.join(ngc_output)
        logging.error(
            f"NGC was not able to download the requested model {ngc_resource_name}. "
            "Please check the NGC error message, remove all downloaded directories, and restart the "
            f"notebook. NGC message: {ngc_output_formatted}"
        )
        return None
    if "model" in resource_type:
        riva_files_in_dir = list(output.glob("*.riva"))
        if len(riva_files_in_dir) > 0:
            output = riva_files_in_dir[0]
    if output is not None and var is not None:
        warnings.warn(
            f"`{var_name}` had a non-default value of `{var}`; it will be replaced with the downloaded path `{output}`."
        )
    return output

Download models#

The following code block downloads the default NGC models: FastPitch (acoustic model) and HiFi-GAN (vocoder). They will be downloaded to a folder called riva_artifacts. If a previous download of the same model already exists there, it will be removed.

Skip this code block if you are using custom models.
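
For reference, the ngc_download_and_get_dir helper above derives the download folder name from the NGC resource name (the NGC CLI replaces the ':' before the version with '_v'). A minimal sketch of that mapping:

```python
def ngc_default_folder(ngc_resource_name):
    """Map an NGC resource name like org/team/name:version to its download folder name."""
    return "_v".join(ngc_resource_name.split("/")[-1].split(":"))

print(ngc_default_folder("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.0"))
# → speechsynthesis_en_us_fastpitch_ipa_vdeployable_v1.0
```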

riva_ngc_artifacts = pathlib.Path.cwd() / "riva_artifacts"
if not riva_ngc_artifacts.exists():
    riva_ngc_artifacts.mkdir()

acoustic_model = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.0", acoustic_model, "acoustic_model")
vocoder = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_hifigan_ipa:deployable_v1.0", vocoder, "vocoder")

The following code block downloads additional auxiliary TTS files used for deployment:

  • ARPAbet dictionary file

  • IPA dictionary file

  • abbreviation mapping file

  • two text normalization (TN) files

    • tokenize_and_classify.far

    • verbalize.far

riva_aux_files = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_auxiliary_files:deployable_v1.3", riva_aux_files, "riva_aux_files")
riva_tn_files = ngc_download_and_get_dir("nvidia/riva/normalization_en_us:deployable_v1.1", riva_tn_files, "riva_tn_files")

Run riva-build#

Stop any previously running riva_rmir_gen container, then start the riva servicemaker container with the necessary paths mounted.

##Run the riva servicemaker.
!docker stop riva_rmir_gen &> /dev/null
!set -x && docker run -td --gpus all --rm -v {str(riva_aux_files.resolve())}:/riva_aux \
            -v {str(acoustic_model.parent.resolve())}/:/synt \
            -v {str(vocoder.parent.resolve())}:/voc -v {str(riva_tn_files.resolve())}:/riva_tn \
            -v {str(rmir_dir.resolve())}:/data --name riva_rmir_gen --entrypoint="/bin/bash" {riva_init_image}
Note: the --force flag in riva-build replaces any existing RMIR with the same name.
warnings.warn("Using --force in riva-build will replace any existing RMIR.")
riva_build=(
    f"riva-build speech_synthesis --force --voice_name={voice} --language_code={lang} "
    f"--sample_rate={sample_rate} /data/FastPitch_HifiGan.rmir:{key} /synt/{str(acoustic_model.name)}:{key} "
    f"/voc/{str(vocoder.name)}:{key} --abbreviations_file=/riva_aux/abbr.txt "
    f"--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far --wfst_verbalizer_model=/riva_tn/verbalize.far"
)
if target_machine.lower() in ["arm64_linux", "arm64_l4t"]:
    riva_build += (
        " --max_batch_size 1 --postprocessor.max_batch_size 1 --preprocessor.max_batch_size 1"
        " --encoderFastPitch.max_batch_size 1 --chunkerFastPitch.max_batch_size 1 --hifigan.max_batch_size 1"
    )
if use_ipa.lower() in ["y", "yes"]:
    riva_build+=" --phone_set=ipa --phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt --upper_case_chars=True"
else:
    riva_build+=" --phone_set=arpabet --phone_dictionary_file=/riva_aux/cmudict-0.7b_nv22.08"
if num_speakers > 1:
    riva_build+=f" --num_speakers={num_speakers}"
    riva_build+=" --subvoices " + ",".join([f"{i}:{i}" for i in range(num_speakers)])
print(riva_build)

Execute the riva build command and stop the riva_servicemaker container.

!docker exec riva_rmir_gen {riva_build}
!docker stop riva_rmir_gen

Run riva-deploy#

So far in this tutorial, we have learned how to generate RMIR files from .riva files. You should see that a FastPitch_HifiGan.rmir file has been generated in the ${out_dir}/rmir location defined earlier.

The RMIR file generated in this tutorial can be deployed using riva_quickstart.

Steps to deploy the RMIR#

  • Download the Riva Quick Start resource

  • Open config.sh and update the following params:

    • set service_enabled_asr to false.

    • set service_enabled_nlp to false.

    • set service_enabled_tts to true.

    • set riva_model_loc to the location of your out_dir.

    • set use_existing_rmirs to true.

  • run riva_init.sh.

  • run riva_start.sh.

Let’s download the Riva Quick Start resource from NGC.

if target_machine.lower() in ["amd64", "arm64_linux"]:
    quickstart_link = f"nvidia/riva/riva_quickstart:{__riva_version__}"
else:
    quickstart_link = f"nvidia/riva/riva_quickstart_arm64:{__riva_version__}"

quickstart_dir = ngc_download_and_get_dir(quickstart_link, None, None, resource_type="resource")

Next, we modify the config.sh file to enable the relevant Riva services (TTS in this case for FastPitch and HiFi-GAN), and provide the encryption key and path to the model repository (riva_model_loc) generated in the previous step.

For example, if the model repository above was generated at ${out_dir}/rmir, specify riva_model_loc as ${out_dir}: riva_init.sh looks for RMIR files in the rmir subdirectory of riva_model_loc.

Here is how the config.sh should look:

config.sh snippet#

# Enable or Disable Riva Services 
service_enabled_asr=false                                                      ## MAKE CHANGES HERE  
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE  
service_enabled_tts=true                                                     ## MAKE CHANGES HERE  

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"                                                  ## MAKE CHANGES HERE

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)    

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false                                ## MAKE CHANGES HERE (Set to true)

Let’s make the necessary changes to the config.sh.

with open(f"{quickstart_dir}/config.sh", "r") as config_in:
    config_file = config_in.readlines()

for i, line in enumerate(config_file):
    # Disable services
    if "service_enabled_asr" in line:
        config_file[i] = "service_enabled_asr=false\n"
    elif "service_enabled_nlp" in line:
        config_file[i] = "service_enabled_nlp=false\n"
    elif "service_enabled_nmt" in line:
        config_file[i] = "service_enabled_nmt=false\n"
    elif "service_enabled_tts" in line:
        config_file[i] = "service_enabled_tts=true\n"
    # Update riva_model_loc to our rmir folder
    elif "riva_model_loc" in line:
        config_file[i] = config_file[i].split("riva_model_loc")[0]+f"riva_model_loc={out_dir}\n"
    elif "use_existing_rmirs" in line:
        config_file[i] = "use_existing_rmirs=true\n"
    elif "MODEL_DEPLOY_KEY" in line:
        config_file[i] = f"MODEL_DEPLOY_KEY=\"{key}\"\n"
    elif "fastpitch" in line:
        config_file[i] = f"#{line}"

with open(f"{quickstart_dir}/config.sh", "w") as config_in:
    config_in.writelines(config_file)

print("".join(config_file))
# Ensure you have permission to execute these scripts
! cd {quickstart_dir} && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh
! cd {quickstart_dir} && ./riva_stop.sh config.sh
# Run `riva_init.sh`. This will fetch the containers/models and run `riva-deploy`.
# You can skip this step if you have already run riva-deploy manually.
! cd {quickstart_dir} && ./riva_init.sh config.sh
# Run `riva_start.sh`. This will start the Riva server and serve your model.
! cd {quickstart_dir} && ./riva_start.sh config.sh

Run Inference#

Once the Riva server is up and running with your models, you can send inference requests querying the server.

To send gRPC requests, install the Riva Python API bindings for the client.

# Install client API bindings
! pip install nvidia-riva-client

Connect to the Riva server and run inference#

Now, we can query the Riva server; let’s get started. The following cell queries the Riva server (using gRPC) to yield a result.

import os
import riva.client
import IPython.display as ipd
import numpy as np

server = "localhost:50051"                # location of riva server
auth = riva.client.Auth(uri=server)
tts_service = riva.client.SpeechSynthesisService(auth)


text = "Is it recognize speech or wreck a nice beach?"
language_code = lang              # currently required to be "en-US"
sample_rate_hz = sample_rate      # the desired sample rate
voice_name = voice                # subvoice used to generate the audio output
data_type = np.int16              # for Riva versions < 1.10.0, set this to np.float32

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)
audio = resp.audio
meta = resp.meta
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

audio_samples = np.frombuffer(resp.audio, dtype=data_type)
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
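
To keep the synthesized audio for later listening, it can be written to a WAV file with the standard library. The following is a minimal sketch (the function name is hypothetical), assuming 16-bit mono samples as produced above with data_type = np.int16:

```python
import wave

import numpy as np

def save_wav(path, samples, rate):
    """Write mono int16 PCM samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono output
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(rate)
        f.writeframes(samples.astype(np.int16).tobytes())

# Example usage with the audio generated above:
# save_wav("tts_output.wav", audio_samples, sample_rate_hz)
```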