Speech Recognition#
Local Deployment using Quick Start Scripts#
Riva includes Quick Start scripts to help you get started with Riva Speech AI Skills. These scripts are meant for deploying the services locally, for testing, and running the example applications.
Download the scripts. Go to the Riva Quick Start for Data center or Embedded, depending on the platform that you are using. Select the File Browser tab to download the scripts, or use the NGC CLI tool to download them from the command line.
Data center
ngc registry resource download-version nvidia/riva/riva_quickstart:2.19.0
Embedded
ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.19.0
Configure Riva deployment: Modify the `config.sh` file within the `quickstart` directory with your preferred configuration.

| Description | Parameter in `config.sh` |
|---|---|
| Services to enable | `service_enabled_asr`, `service_enabled_tts`, `service_enabled_nmt` |
| Models to retrieve from NGC | `asr_acoustic_model`, `asr_language_code`, `tts_language_code` |
| Model storage location | `riva_model_loc` (default value: the `riva-model-repo` Docker volume; can be set to a local directory on the host machine if needed) |
| GPU selection for multi-GPU systems | `gpus_to_use` (refer to Local (Docker) for more details) |
| SSL/TLS certificate and key file locations | `ssl_server_cert`, `ssl_server_key`, `ssl_root_cert`, `ssl_use_mutual_auth` |
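As a sketch, an ASR-only local deployment might set the parameters above as follows. The parameter names come from the table; the values shown are illustrative (they match the example configuration used later in this guide), and the exact defaults in your downloaded `config.sh` may differ:

```shell
# config.sh (excerpt) -- illustrative values only; keep the rest of the
# file at its defaults.

# Enable only the ASR service.
service_enabled_asr=true
service_enabled_tts=false
service_enabled_nmt=false

# Acoustic model and language to retrieve from NGC.
asr_acoustic_model="parakeet-1.1b"   # or "conformer"
asr_language_code="en-US"

# Model storage: the default Docker volume (can be a host directory instead).
riva_model_loc="riva-model-repo"
```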
Initialize and start Riva. The initialization step downloads and prepares Docker images and models. The start script launches the server.
Note
This process can take up to an hour on an average internet connection. On data center platforms, each model is individually optimized for the target GPU after it has been downloaded. On embedded platforms, models preoptimized for the GPU on the NVIDIA Jetson are downloaded.
Data center
cd riva_quickstart_v2.19.0
Note
If you are using a vGPU environment, set the parameter for enabling unified memory, `pciPassthru<vgpu-id>.cfg.enable_uvm`, to 1, where `<vgpu-id>` should be replaced by the vGPU ID assigned to a VM. For example, to enable unified memory for two vGPUs that are assigned to a VM, set `pciPassthru0.cfg.enable_uvm` and `pciPassthru1.cfg.enable_uvm` to 1. For more information, refer to the NVIDIA Virtual GPU Software User Guide.
Embedded
cd riva_quickstart_arm64_v2.19.0
Note
If you are using the Jetson AGX Xavier or the Jetson NX Xavier platform, set the `$riva_tegra_platform` variable to `xavier` in the `config.sh` file within the `quickstart` directory.
To use a USB device for audio input/output, connect it to the Jetson platform so that it is automatically mounted into the container.
Initialize and start Riva
bash riva_init.sh
bash riva_start.sh
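Once `riva_start.sh` reports that the server is ready, you can sanity-check that the server container is up. The `name=riva` filter below is an assumption about the container name your deployment uses; adjust it if your container is named differently:

```shell
# List running containers whose name contains "riva" (the name filter is
# an assumption; change it to match your deployment).
docker ps --filter "name=riva" --format "{{.Names}}: {{.Status}}"
```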
Try walking through the different tutorials on GitHub. If running the Riva Quick Start scripts on a cloud service provider (such as AWS or GCP), ensure that your compute instance has an externally visible IP address. To run the tutorials, connect a browser window to the correct port (`8888` by default) of that external IP address.
Shut down the server when finished. After you have completed these steps and experimented with inferencing, run the `riva_stop.sh` script to stop the server.
For further details on how to customize a local deployment, refer to Local (Docker).
If using SSL/TLS, be sure to include the options described in this section to enable a secure deployment of the Riva server.
Transcribe Audio Files with Riva#
For Data center, run the `riva_start_client.sh` script to start the client container with sample clients for each service. The script is located in the Quick Start folder, which you downloaded earlier in step 1 of Local Deployment using Quick Start Scripts.
bash riva_start_client.sh
For Embedded, this step is not needed because the sample clients are already present in the Riva server container launched in the previous step.
Run the following command to list the available models and choose the model to use.
For offline recognition, run the following command.
riva_asr_client --list_models
For streaming recognition, run the following command.
riva_streaming_asr_client --list_models
Note
By default, the Riva server deploys the English (US) Conformer model, optimized for both low-latency streaming and offline transcription. You can customize this configuration by adjusting the following parameters in `config.sh`:
- `use_asr_streaming_throughput_mode`: toggle between latency and throughput optimization
- `asr_language_code`: select the language for transcription (for example, "en-US" or "es-US")
- `asr_acoustic_model`: choose the model architecture (for example, "conformer" or "parakeet-1.1b")
If you have set the following parameters in `config.sh`:
service_enabled_asr=true
asr_language_code="en-US"
asr_acoustic_model="parakeet-1.1b"
the above command should display the following output:
$ riva_streaming_asr_client --list_models
'en-US': 'parakeet-1.1b-en-us-asr-streaming-asr-bls-ensemble'
Note
Verify that your language code is supported by the selected model.
If your desired language code is not supported by the available models, modify the `asr_acoustic_model` or `asr_language_code` parameter in the `config.sh` file and run the `riva_init.sh` script again to download and initialize the correct model.
Use the `--model_name` flag in the commands below to specify the model if multiple models are available. The Riva server will randomly select one of the available models if the model name is not specified.
Run the following command to perform offline transcription of an audio file.
riva_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
To perform streaming transcription of an audio file, run the following command.
riva_streaming_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
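If more than one ASR model is deployed, you can pin transcription to a specific model with the `--model_name` flag mentioned above. The model name below is the one shown earlier by `--list_models` for the example configuration; substitute whichever name your server lists:

```shell
# Pin offline transcription to an explicitly named model (the name is taken
# from the --list_models output for the example config; yours may differ).
riva_asr_client \
  --audio_file=/opt/riva/wav/en-US_sample.wav \
  --model_name=parakeet-1.1b-en-us-asr-streaming-asr-bls-ensemble
```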
Next Steps#
In this Quick Start Guide, you learned the basics of deploying the Riva server with pretrained models and using the API. Specifically, you:
Installed the Riva server and pretrained models.
Walked through some tutorials to use the Riva API.
Executed Riva command-line clients to transcribe audio (ASR).
For more examples of how you can use Riva Speech AI Skills in real applications, follow the tutorials in GitHub. Additionally, you can build your own speech AI applications with Riva using available APIs like gRPC, Python libraries, and command-line clients.
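As a sketch of the Python route, the snippet below uses the `nvidia-riva-client` package to run offline recognition on the same sample file used above. It assumes that the package is installed (`pip install nvidia-riva-client`) and that a Riva server is reachable at `localhost:50051`; it will not run without both:

```python
# Hedged sketch: requires `pip install nvidia-riva-client` and a running
# Riva server at localhost:50051.
import riva.client

# Plain (non-SSL) connection to the local server.
auth = riva.client.Auth(uri="localhost:50051")
asr_service = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)

# Read the raw audio bytes of the sample WAV file.
with open("/opt/riva/wav/en-US_sample.wav", "rb") as f:
    audio_bytes = f.read()

# Offline (batch) recognition; a streaming variant is also available
# in the same ASRService class.
response = asr_service.offline_recognize(audio_bytes, config)
for result in response.results:
    print(result.alternatives[0].transcript)
```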
To learn more about Riva Speech AI Skills, visit the NVIDIA Riva Developer page.