Gst-nvdsasr
The Gst-nvdsasr plugin performs automatic speech recognition (ASR) on input audio data. It is currently supported on the x86 platform only.
It uses optimized Riva models for ASR and for punctuation and capitalization.
The plugin accesses the Riva ASR service through the Riva gRPC APIs; the corresponding low-level library is selected through plugin properties.
The plugin provides a mechanism to load a custom ASR low-level library at runtime.
One such custom library, libnvds_riva_asr_grpc, is provided; it uses gRPC APIs to access the Riva ASR service.
Note
The DS-Riva ASR library libnvds_riva_asr_grpc uses gRPC APIs to access the Riva ASR service. Start the Riva ASR service before using this library. A gRPC C++ installation is also required on the client side. The required steps are described in the section “Riva ASR model data generation and gRPC installation” below.
The plugin accepts raw PCM audio Gst Buffers from the upstream component and transforms the audio into generic text Gst Buffer output.
The model requires raw audio input in S16LE format (signed 16-bit little-endian). Library settings are configured through a YAML file, which is specified by setting a property on the Gst-nvdsasr plugin and contains multiple parts.
As shown in the diagram below, input S16LE raw audio data is preprocessed and inferred by the Riva ASR service. The final output is available as UTF-8 text.
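To make the S16LE input requirement concrete, here is a minimal Python sketch that packs floating-point samples into signed 16-bit little-endian PCM bytes. The helper name `float_to_s16le` and the 440 Hz test tone are illustrative assumptions; the 16000 Hz rate is the value the configuration tables on this page list as supported. In a real pipeline this conversion is normally handled by upstream GStreamer elements rather than by application code.

```python
import math
import struct

SAMPLE_RATE_HZ = 16000  # sample rate listed as supported in the stream settings


def float_to_s16le(samples):
    """Pack floats in [-1.0, 1.0] into S16LE bytes (hypothetical helper)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    # "<" selects little-endian byte order, "h" a signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)


# 10 ms of a 440 Hz tone at 16 kHz: 160 samples, 2 bytes per sample
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(160)]
pcm = float_to_s16le(tone)
print(len(pcm))  # 320
```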
Inputs and Outputs
This section summarizes the inputs, outputs, and communication facilities of the Gst-nvdsasr plugin when used with the gRPC-based ASR library.
- Input
  - Raw audio Gst Buffers
- Control parameters
  - customlib-name: Sets the custom ASR library that the plugin loads to perform inference. Use libnvds_riva_asr_grpc.so.
  - create-speech-ctx-func: Symbol name of the function that creates the ASR speech context. Use create_riva_asr_grpc_ctx.
  - config-file: A text file that configures the plugin. Use riva_asr_grpc_conf.yml.
- Outputs
  - Gst text Buffer containing the ASR output
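The three control parameters map onto element properties in a GStreamer pipeline description. The sketch below only assembles a `gst-launch-1.0`-style description string; it does not run GStreamer. The element name `nvdsasr`, the upstream `filesrc`/`decodebin`/`audioconvert`/`audioresample` chain, the mono caps string, the `fakesink`, and the input file name are all assumptions chosen for illustration and are not taken from this page.

```python
def build_asr_pipeline(audio_path, config_file="riva_asr_grpc_conf.yml"):
    """Return a gst-launch-style description for a file-based ASR pipeline (sketch)."""
    # Property names and values are the ones listed under "Control parameters".
    asr_props = {
        "customlib-name": "libnvds_riva_asr_grpc.so",
        "create-speech-ctx-func": "create_riva_asr_grpc_ctx",
        "config-file": config_file,
    }
    # "nvdsasr" as the element name is an assumption.
    asr = "nvdsasr " + " ".join(f"{k}={v}" for k, v in asr_props.items())
    elements = [
        f"filesrc location={audio_path}",
        "decodebin",
        "audioconvert",
        "audioresample",
        # Caps that force S16LE at 16 kHz; channels=1 is an assumption.
        "audio/x-raw,format=S16LE,rate=16000,channels=1",
        asr,
        "fakesink",
    ]
    return " ! ".join(elements)


print(build_asr_pipeline("sample.wav"))
```

A string like this could be handed to `Gst.parse_launch` or pasted after `gst-launch-1.0`; on a shell command line the caps segment usually needs quoting.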
Features
The following table summarizes the features of the plugin.
Feature | Description | Release |
---|---|---|
Speech ASR template | The plugin is an ASR base that supports loading a custom ASR library at runtime | DS 6.0 |
Live stream transcription | Supports partial transcript output in real time | DS 6.0 |
Final transcription | Supports final-transcription-only output, useful for local audio streams | DS 6.0 |
Language support | English is supported at present | DS 6.0 |
Word punctuation | Supports word punctuation and capitalization | DS 6.0 |
Custom library with gRPC API implementation | Supports a custom library implementation that uses gRPC APIs to access the Riva ASR gRPC service. Set | DS 6.0 |
DS-Riva ASR YAML File Configuration Specifications
The DS-Riva ASR configuration file uses the YAML 1.2 file format: https://yaml.org/spec/1.2/spec.html.
The config file has multiple parts. An example gRPC configuration file, riva_asr_grpc_conf.yml, is located at /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/. Each part has a name key giving the unique part name and a detail node holding the settings.
- The name: riva_server part configures the Riva ASR server settings in its corresponding detail: node.
- The name: riva_model part configures the Riva ASR model entry in its corresponding detail: node.
- The name: riva_asr_stream part configures the features supported by the Riva low-level library in its corresponding detail: node. Each ASR plugin instance launches a standalone Riva stream, and the settings may differ between plugin instances.
- The name: ds_riva_asr_plugin part configures the DS-Riva ASR settings in its corresponding detail: node.
A separator line containing --- is inserted between neighboring parts, as required by the YAML specification.
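The four-part layout can be sketched by rendering each part as a small YAML document and joining the parts with `---`. The Python below is illustrative only: `render_part` and `PARTS` are hypothetical names, and the values are the examples from the property tables on this page, so the shipped riva_asr_grpc_conf.yml may differ.

```python
def render_part(name, detail):
    """Render one config part as YAML text: a name key plus a detail node."""
    lines = [f"name: {name}", "detail:"]
    for key, value in detail.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML boolean spelling
        lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"


# Example values copied from the tables on this page (assumed, not the shipped file).
PARTS = [
    ("riva_server", {"server_uri": '"localhost:50051"'}),
    ("riva_model", {"model_name": "citrinet-1024-asr-trt-ensemble-vad-streaming"}),
    ("riva_asr_stream", {
        "encoding": "LINEAR_PCM",
        "sample_rate_hertz": 16000,
        "language_code": "en-US",
        "max_alternatives": 1,
        "enable_automatic_punctuation": False,
    }),
    ("ds_riva_asr_plugin", {
        "final_only": False,
        "enable_text_pts": False,
        "use_riva_pts": False,
        "force_final_trailing": False,
    }),
]

# A "---" separator line goes between neighboring parts.
config_text = "---\n".join(render_part(name, detail) for name, detail in PARTS)
print(config_text)
```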
Gst Properties
The following tables describe the Gst properties of the Gst-nvdsasr plugin.
Property | Meaning | Type and Range | Example Notes |
---|---|---|---|
name | Unique name | String | name: riva_server |
detail | Node for Riva server setting details | Node | detail: server_uri: "localhost:50051" |
server_uri | Part of detail node. Specifies the Riva ASR service address. Used with the gRPC APIs. | String | server_uri: "localhost:50051" |
Property | Meaning | Type and Range | Example Notes |
---|---|---|---|
name | Unique name | String | Must be name: riva_model |
detail | Node for Riva model setting details | Node | detail: model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |
model_name | Part of detail node. Specifies which model entry is used. | String | model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |
Property | Meaning | Type and Range | Example Notes |
---|---|---|---|
name | Unique name | String | Must be name: riva_asr_stream |
detail | Node for Riva ASR stream setting details | Node | detail: encoding: LINEAR_PCM … |
encoding | Part of detail node. Specifies the input data format. Only the value LINEAR_PCM is supported. | String | encoding: LINEAR_PCM |
sample_rate_hertz | Part of detail node. Input audio sample rate. Only the value 16000 is supported. | Integer, >0 | sample_rate_hertz: 16000 |
language_code | Part of detail node. Specifies the language used for recognition. Only the value en-US is supported. | String | language_code: en-US |
max_alternatives | Part of detail node. Maximum number of alternatives, selected by top confidence. Only 1 is supported at present. | Integer, >0 | max_alternatives: 1 |
enable_automatic_punctuation | Part of detail node. Enables or disables automatic punctuation. | Boolean | enable_automatic_punctuation: false |
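Because several riva_asr_stream settings accept only a single value, a configuration can be checked before the pipeline is started. The following Python sketch is one way to do that; `check_stream_detail` and `SUPPORTED` are hypothetical names, and the sketch assumes the detail node has already been parsed into a dictionary. It checks only the keys that are present, since how the plugin treats omitted keys is not stated here.

```python
# Single supported values, as listed in the riva_asr_stream table.
SUPPORTED = {
    "encoding": "LINEAR_PCM",
    "sample_rate_hertz": 16000,
    "language_code": "en-US",
    "max_alternatives": 1,
}


def check_stream_detail(detail):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for key, expected in SUPPORTED.items():
        if key not in detail:
            continue  # omitted keys are not judged in this sketch
        value = detail[key]
        # Compare type as well as value so that True is not accepted for 1.
        if type(value) is not type(expected) or value != expected:
            problems.append(f"{key}: only {expected!r} is supported, got {value!r}")
    flag = detail.get("enable_automatic_punctuation")
    if flag is not None and not isinstance(flag, bool):
        problems.append("enable_automatic_punctuation must be a boolean")
    return problems


example = {
    "encoding": "LINEAR_PCM",
    "sample_rate_hertz": 44100,  # deliberately unsupported
    "language_code": "en-US",
    "max_alternatives": 1,
    "enable_automatic_punctuation": False,
}
for problem in check_stream_detail(example):
    print(problem)
```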
Property | Meaning | Type and Range | Example Notes |
---|---|---|---|
name | Unique name | String | Must be name: ds_riva_asr_plugin |
detail | Node for DS-Riva ASR library details | Node | detail: final_only: false |
final_only | Part of detail node. Specifies whether only final transcriptions are output, or final transcriptions together with partial transcriptions. | Boolean | final_only: false |
enable_text_pts | Part of detail node. Specifies whether the text buffer timestamp is enabled. | Boolean | enable_text_pts: false |
use_riva_pts | Part of detail node. Specifies whether the time information provided by the Riva service is used to calculate the timestamp and duration of the output buffer. Note: At present this option is supported for non-live sources only. | Boolean | use_riva_pts: false |
force_final_trailing | Part of detail node. Enables insertion of a newline character after the final transcription. | Boolean | force_final_trailing: false |
Riva ASR model data generation and gRPC installation
Follow the NVIDIA Riva user guide to generate the ASR-related models offline. You only need to generate them once. Once you have access permission, follow the instructions below:
- Make sure the Riva ASR model repository has already been generated. If it has not, follow these steps:
  - Check that all prerequisites are met. See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#prerequisites
  - Follow the Quick Start instructions for local deployment. See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts
  - Download the riva_quickstart scripts:
    $ ngc registry resource download-version nvidia/riva/riva_quickstart:x.x.x-tag
    Use the Riva Speech Skills 1.5.0-beta release or later. For the Riva Speech Skills 1.5.0-beta release, use:
    $ ngc registry resource download-version nvidia/riva/riva_quickstart:1.5.0-beta
    $ cd riva_quickstart_v1.5.0-beta
  - Make the following changes to the config.sh file to disable the other Riva services:
    service_enabled_asr=true
    service_enabled_nlp=false
    service_enabled_tts=false
    riva_model_loc="riva-asr-model-repo"
    models_asr=(
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset1p7_streaming:${riva_ngc_model_version}"
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"
    )
  - Run $ bash riva_init.sh to generate the docker volume riva-asr-model-repo.
Additional steps to download and deploy TAO ASR models from NGC:
- Refer to the README of the deepstream-avsync app, available at: /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-avsync
- Download and deploy the Jasper models by following section 2.b of the avsync app README.
gRPC installation and prerequisites:
- Install gRPC by following section 2.c of the avsync app README.
- Run the ASR service and set LD_LIBRARY_PATH as described in sections 2.d and 2.e of the avsync app README.
Note
If the docker volume riva-asr-model-repo is corrupted, run docker volume rm riva-asr-model-repo before generating it again.
- Verify that the docker volume riva-asr-model-repo is available. Use $ docker volume inspect riva-asr-model-repo to inspect the volume.
- Run the Riva ASR service using riva_start.sh.
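The volume check above can also be scripted. The Python sketch below wraps `docker volume inspect` and relies on its exit status (zero when the volume exists). The function name `docker_volume_exists` and the injectable `run` parameter are illustrative assumptions, the latter added so the logic can be exercised without a docker installation.

```python
import subprocess


def docker_volume_exists(name, run=subprocess.run):
    """Return True if `docker volume inspect <name>` exits with status 0."""
    try:
        result = run(
            ["docker", "volume", "inspect", name],
            stdout=subprocess.DEVNULL,  # the JSON description is not needed here
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # docker CLI is missing or cannot be executed
    return result.returncode == 0


if __name__ == "__main__":
    if docker_volume_exists("riva-asr-model-repo"):
        print("riva-asr-model-repo is available")
    else:
        print("riva-asr-model-repo not found; generate it with riva_init.sh")
```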
This plugin, which uses gRPC APIs, can be used on x86 directly or inside a DeepStream docker container. The gRPC C++ libraries are already installed in the DeepStream docker images.
To run the DeepStream docker container:
$ export DISPLAY=:0
$ xhost +
$ sudo docker run --rm -it --gpus '"'device=0'"' -v riva-asr-model-repo:/data -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --net=host $DS_Docker
DS_Docker is the docker image name for the DeepStream build.
Set LD_LIBRARY_PATH using $ source ~/.profile before executing an application that uses Riva ASR services.
Note
The libnvds_riva_asr_grpc.so library works with the Riva Speech Skills 1.5.0 Beta release or later.
For more information about the Gst-nvdsasr sample tests, see the source code under the directory sources/apps/audio_apps/deepstream_asr_app. Follow the README there to run the tests.