Set Up a Speech AI Server with Speech Recognition and Text-to-Speech Models

In this section, you will learn how to set up NVIDIA Riva speech skills on NVIDIA LaunchPad.

NVIDIA AI LaunchPad

Through LaunchPad, enterprises can get immediate, short-term access to NVIDIA AI running on private accelerated compute infrastructure to power critical AI initiatives. This platform speeds the development of complex AI models using NVIDIA DGX SuperPOD™, NVIDIA Base Command, NVIDIA Fleet Command, and pretrained models from NVIDIA NGC™.

NGC CLI

As a prerequisite for setting up Riva speech skills, you will first need access to NVIDIA GPU Cloud (NGC). After signing up, you can leverage the NGC command-line interface (CLI) to access the relevant resources. The NGC CLI provides an interface for many of the same operations that are available from the NGC website, such as accessing Docker repositories within your organization and team space.

Setup

Create your account and log in to NGC.
To access the LaunchPad VM console, click “VM Console” in the left-hand navigation pane and switch to the newly created tab.

Download, unzip, and install the NGC CLI from the command line: move to a directory where you have execute permissions and run the command below.

            wget -O ngccli_arm64.zip https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip -o ngccli_arm64.zip && chmod u+x ngc

Using the command below, check the binary’s MD5 hash to ensure the file wasn’t corrupted during download.

```
md5sum -c ngc.md5
```

Add your current directory to your PATH using the command below.

            echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
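
Note that the command above appends a new line to ~/.bash_profile each time it runs. If you expect to repeat the setup, a guarded variant along the following lines may be useful. This is only a sketch: the helper name `add_dir_to_profile` is hypothetical and not part of the NGC CLI, and it assumes a POSIX-compatible shell.

```shell
# Hypothetical helper (not part of the NGC CLI): append an export line for
# DIR to PROFILE only if that exact line is not already present.
add_dir_to_profile() {
    dir=$1
    profile=$2
    line="export PATH=\"\$PATH:$dir\""
    # -F: fixed string, -x: match the whole line, -q: no output.
    # A missing profile file makes grep fail, so the line is appended.
    if ! grep -Fxq -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example usage (uncomment to apply it to your own profile):
# add_dir_to_profile "$(pwd)" "$HOME/.bash_profile" && . "$HOME/.bash_profile"
```

Running the helper twice with the same directory leaves a single export line in the profile.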

Before you can run NGC CLI commands, you must configure the CLI. Run the command below and enter your API key when prompted.

```
ngc config set
```
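
Before moving on, it can help to confirm that the tools used in the rest of this section are reachable from your shell. The following is a minimal sketch: `require_cmds` is a hypothetical helper name, and it only checks that each command is on your PATH, not that it is configured correctly.

```shell
# Hypothetical helper: report any of the named commands that cannot be
# found on PATH. Returns 0 only if every command is available.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage for this guide (ngc for downloads, docker for Riva);
# `ngc config current` then prints the stored configuration:
# require_cmds ngc docker && ngc config current
```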

Setting up Riva Services using Quick Start Scripts

There are two options to deploy Riva: 1) Using Quick Start scripts to deploy Riva using Docker, and 2) Using the Helm chart available in NGC Helm Repository for a push-button deployment to a Kubernetes cluster. This section provides instructions for the former, using Quick Start scripts for local deployment on the LaunchPad VM.

Important

Run the following commands in the VM console.

Download the Quick Start scripts with the NGC CLI using the command below.

            ngc registry resource download-version nvidia/riva/riva_quickstart:2.4.0

Initialize and Start Riva.

Navigate to your quickstart directory using the command below.

            cd riva_quickstart_v2.4.0
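
The version number appears in both the download command and the directory name, so if you script these two steps it may be convenient to derive both from a single variable. A minimal sketch, assuming the directory name follows the `riva_quickstart_v<version>` pattern seen above; the helper names `quickstart_resource` and `quickstart_dir` are hypothetical.

```shell
# Version used in this guide; change it in one place if needed.
RIVA_VERSION="2.4.0"

# Hypothetical helpers that build the names used in the two steps above.
quickstart_resource() { printf 'nvidia/riva/riva_quickstart:%s\n' "$1"; }
quickstart_dir()      { printf 'riva_quickstart_v%s\n' "$1"; }

# Example usage (uncomment to download and change directory):
# ngc registry resource download-version "$(quickstart_resource "$RIVA_VERSION")"
# cd "$(quickstart_dir "$RIVA_VERSION")"
```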

Modify config.sh with your preferred configuration. Options include which models to retrieve from NGC, where to store them, and which GPU to use if more than one is installed in your system.
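
As an illustration, settings like these could be changed with a small helper rather than by hand. This is a sketch under assumptions: the variable names `service_enabled_nlp` and `gpus_to_use` are recalled from the Quick Start config.sh and may differ in your version, so check your copy before relying on them. `set_config_var` is a hypothetical helper, not part of the Quick Start scripts.

```shell
# Hypothetical helper: rewrite a `name=value` assignment that starts a line
# in FILE. Values containing `|`, `&`, or `\` would need escaping for sed.
set_config_var() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back so the
    # original file keeps its permissions.
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example usage inside the quickstart directory (variable names are
# assumptions; verify them in your config.sh first):
# cp config.sh config.sh.bak
# set_config_var config.sh service_enabled_nlp false
# set_config_var config.sh gpus_to_use '"device=0"'
```

Keeping a backup copy, as in the commented usage, makes it easy to return to the shipped defaults.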

Initialize Riva using the command below.

            bash riva_init.sh

The riva_init.sh script downloads all required models and containers from NGC and generates the model repository. The download can take some time depending on your internet bandwidth. Upon successful completion of this command, you should see the output shown below.

            Logging into NGC Docker registry if necessary...
            Pulling required Docker images if necessary...
            > Pulling Riva Speech Server images.
            > Pulling nvcr.io/nvidia/riva/riva-speech:2.4.0. This may take some time...
            Riva initialization complete. Run bash riva_start.sh to launch services.

After downloading the required models and containers, start the Riva Speech Service API server using the command below.

            bash riva_start.sh

You should see output similar to that shown below.

            Starting Riva Speech Services
            > Waiting for Triton server to load all models...retrying in 10 seconds
            > Waiting for Triton server to load all models...retrying in 10 seconds
            > Waiting for Triton server to load all models...retrying in 10 seconds
            > Triton server is ready...

To verify that the servers have started correctly, check that the output of docker logs riva-speech includes the lines shown below.

            I0428 03:14:50.440943 1 riva_server.cc:66] TTS Server connected to Triton Inference Server at 0.0.0.0:8001
            I0428 03:14:50.440943 1 riva_server.cc:66] NLP Server connected to Triton Inference Server at 0.0.0.0:8001
            I0428 03:14:50.440951 1 riva_server.cc:68] ASR Server connected to Triton Inference Server at 0.0.0.0:8001
            I0428 03:14:50.440955 1 riva_server.cc:71] Riva Conversational AI Server listening on 0.0.0.0:50051
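
This check can also be scripted. The sketch below reads log text on standard input and reports whether the ASR, NLP, and TTS servers connected and the Riva server is listening. The helper name `riva_logs_ready` is hypothetical, and the patterns are taken from the sample output above, so they may need adjusting for other Riva versions or if you disabled a service in config.sh.

```shell
# Hypothetical helper: succeed only if every expected line appears in the
# log text supplied on standard input.
riva_logs_ready() {
    logs=$(cat)
    for pattern in \
        "ASR Server connected to Triton Inference Server" \
        "NLP Server connected to Triton Inference Server" \
        "TTS Server connected to Triton Inference Server" \
        "Riva Conversational AI Server listening on"
    do
        if ! printf '%s\n' "$logs" | grep -Fq -- "$pattern"; then
            echo "not found: $pattern" >&2
            return 1
        fi
    done
    return 0
}

# Example usage (2>&1 also captures lines the server wrote to stderr):
# docker logs riva-speech 2>&1 | riva_logs_ready && echo "Riva is ready"
```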

The Riva server is all set up and ready to explore!

The best place to start is the Interact with Real-time Speech AI APIs guide, which provides an introduction to Riva’s Python API for accessing Riva services. For more information, refer to Riva Speech Skills documentation.

Stopping Riva Services and Cleaning Up

To shut down the Riva Services server containers, use the command below.

            bash riva_stop.sh

To clean up the local Riva installation, use the command below.

            bash riva_clean.sh

This stops and removes all Riva-related containers and deletes the Docker volume used to store model files. The Docker images themselves are not removed.
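
Because the images are left in place, you may want to list them and decide whether to remove them yourself. A minimal sketch, assuming the images live under the `nvcr.io/nvidia/riva/` repository prefix seen in the riva_init.sh output; `filter_riva_images` is a hypothetical helper that works on `repository:tag` lines.

```shell
# Hypothetical helper: keep only the `repository:tag` lines on standard
# input that belong to the Riva repository prefix. Exits 0 even when no
# line matches.
filter_riva_images() {
    grep '^nvcr\.io/nvidia/riva/' || true
}

# Example usage: list the remaining Riva images.
# docker images --format '{{.Repository}}:{{.Tag}}' | filter_riva_images
#
# To remove them as well (xargs -r, a GNU extension, skips the call when
# the list is empty):
# docker images --format '{{.Repository}}:{{.Tag}}' | filter_riva_images | xargs -r docker rmi
```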