Quick Start Guide#

This is the starting point to try out Riva. Specifically, this Quick Start Guide enables you to deploy pretrained models on a local workstation and run a sample client.

Riva Speech AI Skills supports two architectures: Linux x86_64 and Linux ARM64. These are referred to as data center (x86_64) and embedded (ARM64) throughout this documentation.

For more information and questions, visit the NVIDIA Riva Developer Forum.

Note

Riva embedded (ARM64) is in public beta.

Prerequisites#

Before using Riva Speech AI, ensure you meet the following prerequisites:

Data Center#

  1. You have access and are logged into NVIDIA NGC. For step-by-step instructions, refer to the NGC Getting Started Guide.

  2. You have access to an NVIDIA Volta™, NVIDIA Turing™, or NVIDIA Ampere architecture-based A100 GPU. For more information, refer to the Support Matrix.

  3. You have Docker installed with support for NVIDIA GPUs (a quick verification command follows this list). For more information, refer to the Support Matrix.

  4. Obtain a free trial license to install NVIDIA Riva. For more information, refer to the NVIDIA AI Enterprise Trial.
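
To verify that Docker can access your GPU (prerequisite 3), you can run nvidia-smi inside a CUDA base container. This check is not part of the Riva scripts, and the CUDA image tag below is only an example:

# Should print the GPU table if the NVIDIA Container Toolkit is set up correctly
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi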

Embedded#

  1. You have access and are logged into NVIDIA NGC. For step-by-step instructions, refer to the NGC Getting Started Guide.

  2. You have access to an NVIDIA Jetson Orin, NVIDIA Jetson AGX Xavier™, or NVIDIA Jetson Xavier NX module. For more information, refer to the Support Matrix.

  3. You have installed NVIDIA JetPack™ version 5.1 or 5.1.1 on the Jetson platform. For more information, refer to the Support Matrix.

  4. You have ~15 GB free disk space on Jetson as required by the default containers and models. If you are deploying any Riva model intermediate representation (RMIR) models, the additional disk space required is ~14 GB plus the size of the RMIR models.

  5. You have enabled the appropriate power mode for your Jetson module from the list below. These modes activate all CPU cores and clock the CPU/GPU at maximum frequency to achieve the best performance.

    sudo nvpmodel -m 0    # Jetson AGX Orin, mode MAXN
    sudo nvpmodel -m 0    # Jetson AGX Xavier, mode MAXN
    sudo nvpmodel -m 2    # Jetson Xavier NX, mode MODE_15W_6CORE
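
    To confirm which power mode is active, you can query nvpmodel (a standard JetPack utility):

    sudo nvpmodel -q    # prints the currently selected power mode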
    
  6. You have set the default runtime to nvidia on the Jetson platform by adding the following line to the /etc/docker/daemon.json file. After editing the file, restart the Docker service with sudo systemctl restart docker.

    "default-runtime": "nvidia"
    
  7. Obtain a free trial license to install NVIDIA Riva. For more information, refer to the NVIDIA AI Enterprise Trial.

Models Available for Deployment#

There are two push-button options for deploying Riva Speech AI, both of which use pretrained models available from the NGC catalog:

Local Docker: You can use the Quick Start scripts to set up a local workstation and deploy the Riva services using Docker. Continue with this guide to use the Quick Start scripts.

Kubernetes: The Riva Helm Chart is designed to automate the steps for push-button deployment to a Kubernetes cluster. For more information, refer to Kubernetes deployment. This option is not supported for embedded.

In addition to using pretrained models, Riva Speech AI can run with fine-tuned custom models created using NVIDIA NeMo. Refer to the Model Development with NeMo section for details regarding the advanced option to create a model repository with NVIDIA NeMo.

Local Deployment Using Quick Start Scripts#

Riva includes Quick Start scripts to help you get started with Riva Speech AI Skills. These scripts are meant for deploying the services locally, testing them, and running the example applications.

  1. Download the scripts. Go to the Riva Quick Start for Data center or Embedded depending on the platform that you are using. Select the File Browser tab to download the scripts or use the NGC CLI tool to download from the command line.

    Data center

    ngc registry resource download-version nvidia/riva/riva_quickstart:2.15.0
    

    Embedded

    ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.15.0
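
    If the ngc command is not yet authenticated on your machine, configure it with your NGC API key first; ngc config set prompts for the key interactively:

    ngc config set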
    
  2. Initialize and start Riva. The initialization step downloads and prepares Docker images and models. The start script launches the server.

    Note

    This process can take up to an hour on an average internet connection. For data center deployments, each model is individually optimized for the target GPU after download. On embedded platforms, models preoptimized for the GPU on the NVIDIA Jetson are downloaded.

    Optional: Modify the config.sh file within the quickstart directory with your preferred configuration (an illustrative excerpt follows this list). Options include:

    • which services to enable

    • which models to retrieve from NGC

    • where to store them

    • which GPU to use if more than one is installed on your system (refer to Local (Docker) for more details)

    • locations of the SSL/TLS certificate and key files if using a secure connection
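
    As an illustration, the following excerpt shows the kinds of variables you can edit in config.sh. The exact names and defaults can differ between Riva releases, so treat this as a sketch and follow the comments in the file itself:

    # Enable or disable individual services (names assumed from recent Quick Start releases)
    service_enabled_asr=true
    service_enabled_tts=true
    service_enabled_nmt=false

    # Select the GPU to use when more than one is installed
    gpus_to_use="device=0"

    # Location where downloaded models are stored
    riva_model_loc="riva-model-repo"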

    Note

    To use the Riva translation services, refer to the Configure translation services instructions in the config.sh file within the quickstart directory.

    Data center

    cd riva_quickstart_v2.15.0
    

    Note

    If you are using a vGPU environment, set the parameter for enabling unified memory pciPassthru<vgpu-id>.cfg.enable_uvm to 1, where <vgpu-id> is the vGPU ID assigned to the VM. For example, to enable unified memory for two vGPUs that are assigned to a VM, set pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm to 1. For more information, refer to the NVIDIA Virtual GPU Software User Guide.

    Embedded

    cd riva_quickstart_arm64_v2.15.0
    

    Note

    If you are using the Jetson AGX Xavier or the Jetson Xavier NX platform, set the $riva_tegra_platform variable to xavier in the config.sh file within the quickstart directory.
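
    In config.sh, that assignment would look like this:

    riva_tegra_platform="xavier"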

    To use a USB device for audio input/output, connect it to the Jetson platform so that it is automatically mounted into the container.

    Initialize and start Riva

    bash riva_init.sh
    bash riva_start.sh
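
    After riva_start.sh returns, you can confirm the server is ready by tailing its container logs. The Quick Start scripts name the server container riva-speech by default, but verify with docker ps because this can differ between releases:

    docker logs -f riva-speech    # wait for the message indicating the server is listening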
    
  3. Try walking through the different tutorials on GitHub. If running the Riva Quick Start scripts on a cloud service provider (such as AWS or GCP), ensure that your compute instance has an externally visible IP address. To run the tutorials, connect a browser window to the correct port (8888 by default) of that external IP address.

  4. Shut down the server when finished. After you have completed these steps and experimented with inferencing, run the riva_stop.sh script to stop the server.

For further details on how to customize a local deployment, refer to Local (Docker).

Transcribe Audio Files with Riva#

For Automatic Speech Recognition (ASR), run the following commands from inside the Riva client container (data center) or the Riva server container (embedded) to perform streaming and offline transcription of audio files. If you are using SSL/TLS, include the --ssl_server_cert /ssl/server.crt option (an example appears below).

  1. For Data center, run the riva_start_client.sh script to start the client container with sample clients for each service. The script is located in the Quick Start folder (downloaded earlier in step 1 of the Local Deployment Using Quick Start Scripts section).

    bash riva_start_client.sh
    

    For Embedded, this step is not needed because the sample clients are already present in the Riva server container launched in the previous step.

  2. For offline recognition, run:

    riva_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
    
  3. For streaming recognition, run:

    riva_streaming_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
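
    If your server was deployed with SSL/TLS enabled, pass the certificate option mentioned above to either client, for example:

    riva_asr_client --ssl_server_cert /ssl/server.crt --audio_file=/opt/riva/wav/en-US_sample.wav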
    

Synthesize Speech with Riva#

From within the Riva client container (data center) or the Riva server container (embedded), run the following command to synthesize an audio file.

riva_tts_client --voice_name=English-US.Female-1 \
                --text="Hello, this is a speech synthesizer." \
                --audio_file=/opt/riva/wav/output.wav

The audio files are stored in the /opt/riva/wav directory.
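
Because /opt/riva/wav is inside the container, you may want to copy a generated file to the host for playback. One option is docker cp; replace <container-name> with the name shown by docker ps:

docker cp <container-name>:/opt/riva/wav/output.wav .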

The streaming API can be tested by adding the command-line option --online=true. However, with this command-line client there is no observable difference between the two modes, because the client saves the entire audio to a .wav file in both cases.
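
For example, the same request over the streaming API differs only by the --online=true flag (the output filename here is arbitrary):

riva_tts_client --online=true \
                --voice_name=English-US.Female-1 \
                --text="Hello, this is a speech synthesizer." \
                --audio_file=/opt/riva/wav/output_streaming.wav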

Translate Text or Speech with Riva#

Translate Text-to-Text (T2T)#

From within the Riva client container (data center) or the Riva server container (embedded), run the following command to perform a text-to-text translation from English to German.

riva_nmt_t2t_client --source_language_code="en-US" --target_language_code="de-DE" --text="This will become German words."

Translate Speech-to-Text (S2T)#

From within the Riva client container (data center) or the Riva server container (embedded), run the following command to perform a speech-to-text translation from English audio to German text.

riva_nmt_streaming_s2t_client --audio_file=/opt/riva/wav/en-US_sample.wav --source_language_code="en-US" --target_language_code="de-DE"

Translate Speech-to-Speech (S2S)#

From within the Riva client container (data center) or the Riva server container (embedded), run the following command to perform a speech-to-speech translation from Spanish audio to English audio.

riva_nmt_streaming_s2s_client --audio_file=/opt/riva/wav/es-US_sample.wav --source_language_code="es-US" --target_language_code="en-US"

Next Steps#

In this Quick Start Guide, you learned the basics of deploying the Riva server with pretrained models and using the API. Specifically, you:

  • Installed the Riva server and pretrained models

  • Walked through some tutorials to use the Riva API

  • Executed Riva command-line clients to transcribe audio (ASR), synthesize speech from text (TTS), and translate text or speech (NMT)

For more examples of how Riva Speech AI Skills can be used in real applications, follow the Virtual Assistant demo to see how the API is used. You can build your own speech AI applications with Riva using the available interfaces, including gRPC, Python libraries, and command-line clients.

To learn more about Riva Speech AI Skills, visit the NVIDIA Riva Developer page.