Gst-nvdsasr¶

The Gst-nvdsasr plugin performs automatic speech recognition (ASR) on input audio data. It is currently supported on the x86 platform only. It uses optimized Riva models for ASR and for punctuation and capitalization. The plugin provides a mechanism to load a custom ASR low-level library at runtime, and the library is selected through plugin properties. One such library, libnvds_riva_asr_grpc, is provided; it uses the Riva gRPC APIs to access the Riva ASR service.

Note

The DS-Riva ASR library libnvds_riva_asr_grpc uses gRPC APIs to access the Riva ASR service. The Riva ASR service must be started before this library is used, and a gRPC C++ installation is required on the client side. The required steps are described in the section “Riva ASR model data generation and gRPC installation” below.

The plugin accepts raw PCM audio Gst Buffers from the upstream component and transforms the audio into generic text Gst Buffer output.

The model requires raw audio input in S16LE format (signed 16-bit little-endian). Library settings are configured through a YAML file, which is passed to the Gst-nvdsasr plugin by setting a property and which contains several parts, each holding one group of settings.
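As an illustration of what S16LE means at the byte level, the following minimal sketch packs floating-point samples into signed 16-bit little-endian PCM using only the Python standard library. The function name floats_to_s16le and the choice to scale by 32767 are assumptions made for this example; they are not part of the plugin or of Riva.

```python
import struct


def floats_to_s16le(samples):
    """Pack float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM.

    Hypothetical helper for illustration only; not a DeepStream or Riva API.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)


pcm = floats_to_s16le([0.0, 1.0, -1.0])
print(pcm.hex())
```

Each sample occupies two bytes with the least significant byte first, which is the layout the model expects on its input.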

As shown in the diagram below, the input S16LE raw audio data is preprocessed and inferred by the Riva ASR service. The final output is available as UTF-8 text.

Inputs and Outputs¶

This section summarizes the inputs, outputs, and control parameters of the Gst-nvdsasr plugin when it is used with the gRPC-based ASR library.

Input
- Raw Audio Gst Buffers
Control parameters
- customlib-name: The custom ASR library that the plugin loads to perform inference. Use libnvds_riva_asr_grpc.so
- create-speech-ctx-func: The symbol name of the function that creates the ASR speech context. Use create_riva_asr_grpc_ctx
- config-file: A text file (YAML) that configures the plugin. Use riva_asr_grpc_conf.yml
Outputs
- Gst Text Buffer containing ASR output
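The control parameters above can be combined into a gst-launch style element description. The sketch below only builds the text for the ASR element; it assumes the element is registered under the name nvdsasr (inferred from the plugin name), and the helper nvdsasr_element is hypothetical. Depending on the installation, customlib-name and config-file may need to be given as full paths.

```python
def nvdsasr_element(config_file,
                    customlib="libnvds_riva_asr_grpc.so",
                    ctx_func="create_riva_asr_grpc_ctx"):
    """Return a gst-launch style description of one ASR element.

    Hypothetical helper; the element name "nvdsasr" is an assumption.
    """
    props = {
        "customlib-name": customlib,
        "create-speech-ctx-func": ctx_func,
        "config-file": config_file,
    }
    # Properties are written as key=value pairs after the element name
    return "nvdsasr " + " ".join(f"{k}={v}" for k, v in props.items())


fragment = nvdsasr_element("riva_asr_grpc_conf.yml")
print(fragment)
```

The resulting fragment still needs an upstream source that delivers raw S16LE audio and a downstream element that consumes the text buffers; those parts of the pipeline are left out because they depend on the application.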

Features¶

The following table summarizes the features of the plugin.

Gst-nvdsasr plugin features¶
Feature	Description	Release
Speech ASR template	The plugin is an ASR base that supports loading a custom ASR library at runtime	DS 6.0
Live stream transcription	Supports partial transcript output in real time	DS 6.0
Final transcription	Supports final-only transcription, which is useful for local audio streams	DS 6.0
Language support	Only English is supported at present	DS 6.0
Word punctuation	Supports punctuation and capitalization of words	DS 6.0
Custom library with gRPC API implementation	Supports a custom library implementation that uses gRPC APIs to access the Riva ASR gRPC service. Set `libnvds_riva_asr_grpc.so` as `customlib-name` and `create_riva_asr_grpc_ctx` as `create-speech-ctx-func`	DS 6.0

DS-Riva ASR YAML File Configuration Specifications¶

The DS-Riva ASR configuration file uses the YAML 1.2 file format: https://yaml.org/spec/1.2/spec.html.

The config file consists of multiple parts. An example gRPC configuration file, riva_asr_grpc_conf.yml, is located at /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/. Each part has a name that uniquely identifies the part and a detail node that holds its settings.
The name: riva_server part configures the Riva ASR server settings in its detail: node.
The name: riva_model part configures the Riva ASR model entry in its detail: node.
The name: riva_asr_stream part configures the features supported by the Riva low-level library in its detail: node. Each ASR plugin instance launches a standalone Riva stream, so these settings can differ between plugin instances.
The name: ds_riva_asr_plugin part configures the DS-Riva ASR settings in its detail: node.
Following the YAML specification, a separator line containing --- is inserted between neighboring parts.
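The multi-part layout described above can be sketched by rendering the four parts and joining them with --- separators. The values are the example values from the property tables in this document. The exact indentation and key order of the shipped riva_asr_grpc_conf.yml are assumptions here, and render_part is a hypothetical helper, so treat the shipped file as the authoritative reference.

```python
def render_part(name, detail):
    """Render one part as a name entry plus an indented detail node."""
    lines = [f"name: {name}", "detail:"]
    for key, value in detail.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"  {key}: {value}")
    return "\n".join(lines)


parts = [
    ("riva_server", {"server_uri": '"localhost:50051"'}),
    ("riva_model", {"model_name": "citrinet-1024-asr-trt-ensemble-vad-streaming"}),
    ("riva_asr_stream", {
        "encoding": "LINEAR_PCM",
        "sample_rate_hertz": 16000,
        "language_code": "en-US",
        "max_alternatives": 1,
        "enable_automatic_punctuation": False,
    }),
    ("ds_riva_asr_plugin", {
        "final_only": False,
        "enable_text_pts": False,
        "use_riva_pts": False,
        "force_final_trailing": False,
    }),
]

# A "---" line separates each pair of neighboring parts
config_text = "\n---\n".join(render_part(n, d) for n, d in parts) + "\n"
print(config_text)
```

Four parts produce three separator lines; each part is an independent YAML document with its own name and detail entries.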

Gst Properties¶

The following tables describe the Gst properties of the Gst-nvdsasr plugin.

riva_server Configuration properties for Riva low level library¶
Property	Meaning	Type and Range	Example Notes
name	Unique name	String	name: riva_server
detail	Node for Riva Server Setting details	Node	detail: server_uri: "localhost:50051"
server_uri	Part of detail node. Specifies the Riva ASR service address. Used with the gRPC APIs.	String	server_uri: "localhost:50051"

riva_model Configuration properties for Riva low level library¶
Property	Meaning	Type and Range	Example Notes
name	Unique name	String	Must be name: riva_model
detail	Node for Riva model setting details	Node	detail: model_name: citrinet-1024-asr-trt-ensemble-vad-streaming
model_name	Part of detail node. Specify which model entry is used	String	model_name: citrinet-1024-asr-trt-ensemble-vad-streaming

riva_asr_stream Configuration properties for Riva low level library¶
Property	Meaning	Type and Range	Example Notes
name	Unique name	String	Must be name: riva_asr_stream
detail	Node for Riva ASR stream setting details	Node	detail: encoding: LINEAR_PCM …
encoding	Part of detail node. Specifies the input data format. Only the value LINEAR_PCM is supported.	String	encoding: LINEAR_PCM
sample_rate_hertz	Part of detail node. Specifies the input audio sample rate. Only the value 16000 is supported.	Integer & >0	sample_rate_hertz: 16000
language_code	Part of detail node. Specifies which language is used for recognition. Only the value en-US is supported.	String	language_code: en-US
max_alternatives	Part of detail node. Specifies the maximum number of alternatives, selected by top confidence. Only 1 is supported at present.	Integer & >0	max_alternatives: 1
enable_automatic_punctuation	Part of detail node. Enable automatic punctuation or not	Boolean	enable_automatic_punctuation: false
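Because most stream settings accept a single supported value, a configuration can be checked before the pipeline is started. The following is a minimal sketch of such a check against the values listed in the table above; check_stream_detail is a hypothetical helper, not something the plugin provides. Keys that are absent are skipped, since the plugin's behavior for missing keys is not described here.

```python
# Supported values taken from the riva_asr_stream table
SUPPORTED = {
    "encoding": "LINEAR_PCM",
    "sample_rate_hertz": 16000,
    "language_code": "en-US",
    "max_alternatives": 1,
}


def check_stream_detail(detail):
    """Return a list of problems; the list is empty when nothing is wrong."""
    problems = []
    for key, expected in SUPPORTED.items():
        if key in detail and detail[key] != expected:
            problems.append(
                f"{key}: got {detail[key]!r}, only {expected!r} is supported"
            )
    flag = detail.get("enable_automatic_punctuation")
    if flag is not None and not isinstance(flag, bool):
        problems.append("enable_automatic_punctuation: expected a boolean")
    return problems


good = {
    "encoding": "LINEAR_PCM",
    "sample_rate_hertz": 16000,
    "language_code": "en-US",
    "max_alternatives": 1,
    "enable_automatic_punctuation": False,
}
bad = dict(good, sample_rate_hertz=44100)
print(check_stream_detail(good))
print(check_stream_detail(bad))
```

The first call prints an empty list; the second reports the unsupported sample rate.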

ds_riva_asr_plugin Configuration properties for DS-Riva ASR library settings¶
Property	Meaning	Type and Range	Example Notes
name	Unique name	String	Must be name: ds_riva_asr_plugin
detail	Node for DS-Riva ASR library setting details	Node	detail: final_only: false
final_only	Part of detail node. Specifies whether only final transcriptions are output, or partial transcriptions are output as well	Boolean	final_only: false
enable_text_pts	Part of detail node. Specifies whether the text buffer timestamp is enabled.	Boolean	enable_text_pts: false
use_riva_pts	Part of detail node. Specifies whether the time information provided by the Riva service is used to calculate the timestamp and duration of the output buffer. Note: At present, this option is supported for non-live sources only.	Boolean	use_riva_pts: false
force_final_trailing	Part of detail node. Enables insertion of a newline character after the final transcription	Boolean	force_final_trailing: false

Riva ASR model data generation and gRPC installation¶

Follow the NVIDIA Riva user guide to generate the ASR-related models offline. The models only need to be generated once. Once you have access permission, follow the instructions below:

Make sure the Riva ASR model repository has already been generated. If it has not, follow the steps below:

Check that all prerequisites are met. See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#prerequisites

Follow the QuickStart instructions for local deployment. Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts
Download the riva_quickstart scripts:
$ ngc registry resource download-version nvidia/riva/riva_quickstart:x.x.x-tag
Use the Riva Speech Skills 1.5.0-beta release or later.
For the Riva Speech Skills 1.5.0-beta release, use:
$ ngc registry resource download-version nvidia/riva/riva_quickstart:1.5.0-beta
$ cd riva_quickstart_v1.5.0-beta
Make the following changes to the config.sh file to disable the other Riva services:
service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false

riva_model_loc="riva-asr-model-repo"

 models_asr=(
   "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset1p7_streaming:${riva_ngc_model_version}"
   "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"
  )
Run $ bash riva_init.sh to generate the docker volume riva-asr-model-repo.
Additional steps to download and deploy TAO ASR models from NGC:

Refer to the README of the deepstream-avsync app, available at: /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-avsync

To download and deploy the Jasper models, follow section 2.b of the avsync app README.

gRPC installation and prerequisites:

To install gRPC, follow section 2.c of the avsync app README.

To run the ASR service and set LD_LIBRARY_PATH, refer to sections 2.d and 2.e of the avsync app README.

Note

If the docker volume riva-asr-model-repo is corrupted, run docker volume rm riva-asr-model-repo before generating it again.

Verify that the docker volume riva-asr-model-repo is available. Use $ docker volume inspect riva-asr-model-repo to inspect the volume.
Run the Riva ASR service using riva_start.sh.
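The volume check above can also be scripted, for example before launching an application that depends on the Riva ASR service. The sketch below wraps docker volume inspect, which exits with a non-zero status when the volume does not exist. The function volume_exists and its run parameter are hypothetical and exist only to keep the example testable without a Docker installation.

```python
import subprocess


def volume_exists(name, run=subprocess.run):
    """Return True when `docker volume inspect <name>` exits with status 0.

    Hypothetical helper; `run` can be replaced by a stub for testing.
    """
    try:
        result = run(
            ["docker", "volume", "inspect", name],
            stdout=subprocess.DEVNULL,  # discard the JSON description
            stderr=subprocess.DEVNULL,  # discard the "no such volume" error
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI itself is not installed or not on PATH
        return False
    return result.returncode == 0
```

A typical call is volume_exists("riva-asr-model-repo"). A False result means either that the volume has not been generated yet or that the docker command is unavailable.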

This plugin, which uses gRPC APIs, can be used on x86 directly or inside the DeepStream docker container. The gRPC C++ libraries are already installed on the DeepStream docker images.

To run DeepStream docker:
$ export DISPLAY=:0
$ xhost +
$ sudo docker run --rm -it --gpus '"'device=0'"' -v riva-asr-model-repo:/data -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --net=host $DS_Docker
Here, DS_Docker is the docker image name for the DeepStream build.

Set LD_LIBRARY_PATH using $ source ~/.profile before executing an application that uses the Riva ASR service.

Note

The libnvds_riva_asr_grpc.so library works with Riva Speech Skills 1.5.0 Beta release or later.

For more information about the Gst-nvdsasr sample tests, see the source code under the directory sources/apps/audio_apps/deepstream_asr_app. Follow the README to run the tests.