Gst-nvdsasr

The Gst-nvdsasr plugin performs automatic speech recognition (ASR) on input audio data. It can load a custom low-level ASR library at runtime. The plugin is supported on both x86 and Jetson platforms, either natively or inside DeepStream Docker containers. DeepStream provides a custom library, libnvds_riva_asr_grpc.so. This library uses gRPC APIs to reach the ASR service of the NVIDIA Riva SDK, which performs speech recognition and punctuation-capitalization with optimized Riva models.

Note

  • The DS-Riva ASR library, libnvds_riva_asr_grpc.so, uses gRPC APIs to access the Riva ASR service. The Riva ASR service must be started before this library is used. The required steps are outlined below in section ‘Riva ASR Service Deployment’.

  • Installation of the gRPC C++ libraries (v1.38) is required on the client side. Required steps are outlined below in section ‘gRPC C++ Library Installation’.

Note

The libnvds_riva_asr_grpc.so library works with NVIDIA Riva Release 1.5.0 Beta or later.

The plugin accepts raw PCM audio GStreamer buffers (GstBuffer) from the upstream component. It transforms the audio into text GstBuffer output.

The model requires raw audio input in S16LE format (signed 16-bit little-endian). Library settings are configured through a multi-part YAML file, which is passed to the plugin through its config-file property.

As shown in the diagram below, the input S16LE raw audio data is preprocessed and then transcribed by the Riva ASR service. The final output is UTF-8 text.

Gst-Nvdsasr
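Before you build a pipeline, it can help to confirm that a WAV file already holds 16-bit PCM at 16000 Hz. That is the only sample rate listed for the riva_asr_stream part later in this section. The following is a minimal sketch with hypothetical helper names. It assumes a canonical 44-byte WAV header, with the fmt chunk immediately after the RIFF header. That layout is common but not guaranteed. The sketch uses only POSIX od and awk.

```shell
# Hypothetical helpers: inspect a WAV header before sending audio to Gst-nvdsasr.
# Assumes a canonical header in which the "fmt " chunk starts at byte 12.

# Print the unsigned little-endian integer of $3 bytes at byte offset $2 of file $1.
le_uint() {
  od -A n -t u1 -j "$2" -N "$3" "$1" |
    awk '{ v = 0; for (i = NF; i >= 1; i--) v = v * 256 + $i; print v }'
}

# Print the header fields; succeed only for 16-bit PCM (format tag 1) at 16000 Hz.
check_wav_for_asr() {
  fmt=$(le_uint "$1" 20 2)
  channels=$(le_uint "$1" 22 2)
  rate=$(le_uint "$1" 24 4)
  bits=$(le_uint "$1" 34 2)
  echo "format=$fmt channels=$channels rate=$rate bits=$bits"
  [ "$fmt" = 1 ] && [ "$bits" = 16 ] && [ "$rate" = 16000 ]
}
```

If the check fails, the audioconvert and audioresample elements can convert the stream inside the pipeline, so the file does not have to be converted beforehand.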

Inputs and Outputs

This section summarizes the inputs, outputs, and control parameters of the Gst-nvdsasr plugin when it is used with the gRPC-based ASR library.

  • Input

    • Raw Audio GStreamer buffers

  • Control parameters

    • customlib-name: Custom ASR library that the plugin loads at runtime to perform inference. Use: libnvds_riva_asr_grpc.so

    • create-speech-ctx-func: Name of the library symbol that creates the ASR speech context. Use: create_riva_asr_grpc_ctx

    • config-file: YAML file that configures the library. Use: riva_asr_grpc_conf.yml

  • Outputs

    • Text GStreamer buffer containing ASR output
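The three control parameters are element properties, so a gst-launch-1.0 pipeline that transcribes a local file can be sketched from them. This is an illustration under assumptions, not a reference pipeline. The element name nvdsasr, the mono channels=1 caps and the asr_pipeline helper are all assumptions. filesrc, decodebin, audioconvert, audioresample and filesink are standard GStreamer elements. Here they produce S16LE audio at 16000 Hz and write the text output to a file.

```shell
# Hypothetical helper: print a gst-launch-1.0 pipeline description that
# transcribes one audio file with Gst-nvdsasr.
# Assumptions: element name "nvdsasr", mono input, paths without spaces.
#   $1: input audio file  $2: output text file  $3: config file (optional)
asr_pipeline() {
  conf=${3:-riva_asr_grpc_conf.yml}
  printf '%s ' \
    "filesrc location=$1" '!' decodebin '!' audioconvert '!' audioresample '!' \
    'audio/x-raw,format=S16LE,rate=16000,channels=1' '!' \
    nvdsasr customlib-name=libnvds_riva_asr_grpc.so \
    create-speech-ctx-func=create_riva_asr_grpc_ctx "config-file=$conf" '!' \
    "filesink location=$2"
  printf '\n'
}

# Usage (the substitution is left unquoted on purpose so that each token
# becomes a separate argument):
#   gst-launch-1.0 -e $(asr_pipeline speech.wav transcript.txt)
```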

Features

The following table summarizes the features of the plugin.

Gst-nvdsasr plugin features

| Feature | Description | Release |
|---|---|---|
| Speech ASR template | The plugin is a base ASR element that supports loading a custom ASR library at runtime | DS 6.0 |
| Live stream transcription | Supports partial transcript output in real time | DS 6.0 |
| Final transcription | Supports final-only transcription, which is useful for local audio streams | DS 6.0 |
| Language support | The plugin is currently tested only for English (en-US) | DS 6.0 |
| Word punctuation | Supports punctuation and capitalization of words | DS 6.0 |
| Custom library with gRPC API implementation | Supports a custom library implementation that uses gRPC APIs to access the Riva ASR gRPC service. Set libnvds_riva_asr_grpc.so as customlib-name and create_riva_asr_grpc_ctx as create-speech-ctx-func | DS 6.0 |
| x86 platform support | | DS 6.0 |
| Jetson platform support | | DS 6.2 |

DS-Riva ASR Library YAML File Configuration Specifications

The DS-Riva ASR configuration file uses the YAML 1.2 format: https://yaml.org/spec/1.2/spec.html.

  • The configuration file consists of multiple parts. An example riva_asr_grpc_conf.yml file for the gRPC library is located at /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/. Each part has a name key that holds a unique part name and a detail node that holds the part's settings.

  • The name: riva_server part configures the Riva ASR server settings in its detail: node.

  • The name: riva_model part configures the Riva ASR model entry in its detail: node.

  • The name: riva_asr_stream part configures the features supported by the Riva low-level library in its detail: node. Each ASR plugin instance launches its own Riva stream, so different plugin instances can use different settings.

  • The name: ds_riva_asr_plugin part configures the DS-Riva ASR library settings in its detail: node.

  • Adjacent parts are separated by a line containing ---, as defined by the YAML specification.
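The following is a minimal sketch of a complete configuration file. A hypothetical write_asr_conf helper writes it with a shell heredoc. The layout follows the description above: a name key and a detail node per part, with --- between parts. The values are the examples from the property tables in this section. Compare the result with the shipped riva_asr_grpc_conf.yml before relying on it.

```shell
# Hypothetical helper: write a sample DS-Riva ASR config file (default name
# riva_asr_grpc_conf.yml). Values are the example values from the tables.
write_asr_conf() {
  cat > "${1:-riva_asr_grpc_conf.yml}" <<'EOF'
name: riva_server
detail:
  server_uri: "localhost:50051"
---
name: riva_model
detail:
  model_name: citrinet-1024-asr-trt-ensemble-vad-streaming
---
name: riva_asr_stream
detail:
  encoding: LINEAR_PCM
  sample_rate_hertz: 16000
  language_code: en-US
  max_alternatives: 1
  enable_automatic_punctuation: false
---
name: ds_riva_asr_plugin
detail:
  final_only: false
  enable_text_pts: false
  use_riva_pts: false
  force_final_trailing: false
EOF
}
```

Setting enable_automatic_punctuation to true requests punctuated, capitalized output. This presumably requires a punctuation model to be deployed on the Riva server.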

Gst Properties

The following tables describe the configuration properties of the Gst-nvdsasr plugin when it is used with the DS-Riva ASR library.

riva_server configuration properties for the Riva low-level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | name: riva_server |
| detail | Node for Riva server setting details | Node | detail: server_uri: "localhost:50051" |
| server_uri | Part of detail node. Specifies the Riva ASR service address. Used with the gRPC APIs. | String | server_uri: "localhost:50051" |

riva_model configuration properties for the Riva low-level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: riva_model |
| detail | Node for Riva model setting details | Node | detail: model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |
| model_name | Part of detail node. Specifies which model entry is used | String | model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |

riva_asr_stream configuration properties for the Riva low-level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: riva_asr_stream |
| detail | Node for Riva ASR stream setting details | Node | detail: encoding: LINEAR_PCM … |
| encoding | Part of detail node. Specifies the input data format. Only LINEAR_PCM is supported | String | encoding: LINEAR_PCM |
| sample_rate_hertz | Part of detail node. Input audio sample rate. Only 16000 is supported | Integer, >0 | sample_rate_hertz: 16000 |
| language_code | Part of detail node. Specifies the language used for recognition. Only en-US is supported | String | language_code: en-US |
| max_alternatives | Part of detail node. Maximum number of alternatives, selected by top confidence. Only 1 is currently supported | Integer, >0 | max_alternatives: 1 |
| enable_automatic_punctuation | Part of detail node. Enables or disables automatic punctuation | Boolean | enable_automatic_punctuation: false |
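Four of the riva_asr_stream settings above accept only a single value. A hypothetical check_asr_stream helper can flag a configuration file that deviates from them. It is a grep-based sketch, not a YAML parser. It assumes one key: value pair per line, indented under detail:, as in the layout of the example file.

```shell
# Hypothetical helper: report riva_asr_stream settings that differ from the
# only supported values. Returns 0 if all four are present and supported.
check_asr_stream() {
  conf=$1 status=0
  for kv in 'encoding: LINEAR_PCM' 'sample_rate_hertz: 16000' \
            'language_code: en-US' 'max_alternatives: 1'; do
    # Match an indented "key: value" line with nothing else after the value.
    if ! grep -Eq "^[[:space:]]+${kv}[[:space:]]*$" "$conf"; then
      echo "unsupported or missing: ${kv%%:*} (expected ${kv#*: })" >&2
      status=1
    fi
  done
  return "$status"
}
```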

ds_riva_asr_plugin configuration properties for DS-Riva ASR library settings

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: ds_riva_asr_plugin |
| detail | Node for DS-Riva ASR library details | Node | detail: final_only: false |
| final_only | Part of detail node. Specifies whether to output final transcriptions only, or partial transcriptions together with the final ones | Boolean | final_only: false |
| enable_text_pts | Part of detail node. Specifies whether text buffer timestamps are enabled | Boolean | enable_text_pts: false |
| use_riva_pts | Part of detail node. Specifies whether time information provided by the Riva service is used to calculate the timestamp and duration of the output buffer. Note: currently supported for non-live sources only | Boolean | use_riva_pts: false |
| force_final_trailing | Part of detail node. Enables insertion of a newline character after the final transcription | Boolean | force_final_trailing: false |
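Switching between final-only output and partial-plus-final output changes only the final_only value. The hypothetical set_asr_final_only helper below rewrites that value through a temporary file. This avoids relying on the non-portable sed -i option.

```shell
# Hypothetical helper: set final_only to "true" or "false" in a DS-Riva ASR
# config file ($1). Only the value on the final_only line is changed.
set_asr_final_only() {
  case $2 in
    true|false) ;;
    *) echo "final_only must be true or false" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  # Keep the key and its indentation (\1) and replace the rest of the line.
  sed "s/^\([[:space:]]*final_only:\).*/\1 $2/" "$1" > "$tmp" &&
    cat "$tmp" > "$1"   # writing through cat keeps the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```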

Riva ASR Service Deployment

Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html for the steps to deploy models using the Riva Quick Start scripts.

Example steps to deploy the Riva server with the desired ASR model:

  1. Download Riva Quick Start package:

    $ ngc registry resource download-version nvidia/riva/riva_quickstart:1.5.0-beta
    $ cd riva_quickstart_v1.5.0-beta
    
  2. Update the config.sh file for the required ASR model, e.g. CitriNet-1024:

    service_enabled_asr=true
    service_enabled_nlp=false
    service_enabled_tts=false
    
    riva_model_loc="riva-asr-model-repo"
    
    models_asr=(
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset1p7_streaming:${riva_ngc_model_version}"
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"
    )
    
  3. Run the Riva initialization script:

    $ bash riva_init.sh
    
  4. [Optional] Deploy ASR models from the NVIDIA TAO Toolkit using Riva (e.g. a Jasper model). Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/asr-python-advanced-finetune-am-citrinet-tao-deployment.html for the steps to deploy ASR models from the TAO Toolkit.

    Example steps:

    Download the jasper_asr_SET_1pt2_nr.riva file:

    $ wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/speechtotext_english_jasper/versions/deployable_v1.2/files/jasper_asr_SET_1pt2_nr.riva
    

    Copy the jasper_asr_SET_1pt2_nr.riva file to the model repository. Root privileges are required if a Docker volume is used as the model repository:

    $ sudo cp <directory path of downloaded .riva file>/jasper_asr_SET_1pt2_nr.riva /var/lib/docker/volumes/riva-asr-model-repo/_data/
    

    Set the following environment variables:

    $ export RIVA_SM_CONTAINER="nvcr.io/nvidia/riva/riva-speech:1.5.0-beta-servicemaker"
    $ export MODEL_LOC="riva-asr-model-repo"
    $ export MODEL_NAME="jasper_asr_SET_1pt2_nr.riva"
    $ export KEY="tlt_encode"
    

    Pull the Riva ServiceMaker Docker image:

    $ sudo docker pull $RIVA_SM_CONTAINER
    

    Build the Riva ASR model in streaming mode:

    $ sudo docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER riva-build speech_recognition /data/asr.rmir:$KEY /data/$MODEL_NAME:$KEY --decoder_type=greedy
    

    Deploy the Riva model in streaming mode:

    $ sudo docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER riva-deploy -f /data/asr.rmir:$KEY /data/models/
    

    After these steps, the Jasper models are deployed to /var/lib/docker/volumes/riva-asr-model-repo/_data/models/.

  5. Deploy the Riva ASR service:

    $ bash riva_start.sh
    

    To stop ASR services after the application has run successfully, run the following command:

    $ bash riva_stop.sh
    

gRPC C++ Library Installation

The DS-Riva ASR library requires the gRPC C++ shared libraries (v1.38) to access the Riva ASR gRPC service. To install them, follow the steps at https://grpc.io/docs/languages/cpp/quickstart/ and add -DBUILD_SHARED_LIBS=ON to the cmake build options. Using make -j4 instead of make -j is recommended.

Alternatively, use the included script, which performs the same steps, to install the gRPC C++ libraries:

$ cd /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_app
$ sudo chmod +x gRPC_installation.sh
$ ./gRPC_installation.sh

Then run the following command to add the installation path to the LD_LIBRARY_PATH environment variable:

$ export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH

The gRPC C++ libraries are pre-installed in the DeepStream dGPU Docker images. Inside a dGPU Docker container, run the following command to add the installation path to the LD_LIBRARY_PATH environment variable:

$ export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH
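To confirm that the export took effect in the current shell, a hypothetical check_grpc_libs helper can look for the gRPC C++ shared library, libgrpc++.so, in each LD_LIBRARY_PATH directory. It checks LD_LIBRARY_PATH only. It does not check the system directories known to ldconfig.

```shell
# Hypothetical helper: list LD_LIBRARY_PATH directories that contain libgrpc++.so*.
# Returns 0 if at least one directory matches, 1 otherwise.
check_grpc_libs() {
  found=1
  old_ifs=$IFS
  IFS=:                      # split LD_LIBRARY_PATH on ':'
  for dir in $LD_LIBRARY_PATH; do
    if [ -n "$dir" ] && ls "$dir"/libgrpc++.so* >/dev/null 2>&1; then
      echo "libgrpc++ found in $dir"
      found=0
    fi
  done
  IFS=$old_ifs
  return "$found"
}
```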

Sample Test Application

For the Gst-nvdsasr sample tests, see the source code in the sources/apps/audio_apps/deepstream_asr_app directory and follow its README to run them.