Gst-nvdsasr

The Gst-nvdsasr plugin performs automatic speech recognition (ASR) on input audio data. It is currently supported on the x86 platform only. It uses optimized Riva models for ASR and for punctuation and capitalization. The plugin provides a mechanism to load a custom ASR low-level library at runtime, and the library to load is selected through the plugin properties. One such library, libnvds_riva_asr_grpc, is provided; it uses the Riva gRPC APIs to access the Riva ASR service.

Note

The DS-Riva ASR library libnvds_riva_asr_grpc uses gRPC APIs to access the Riva ASR service. The Riva ASR service must be started before this library is used, and a gRPC C++ installation is required on the client side. The required steps are described in the section “Riva ASR model data generation and gRPC installation” below.

The plugin accepts raw PCM audio Gst Buffers from the upstream component and transforms the audio into generic text Gst Buffer output.

The model requires raw audio input in S16LE (signed 16-bit little-endian) format. Library settings are configured through a YAML file, which is passed to the plugin by setting a property on Gst-nvdsasr and which is divided into multiple parts.

As shown in the diagram below, the input S16LE raw audio data is preprocessed and inferred by the Riva ASR service. The final output is available as UTF-8 text.

Gst-Nvdsasr

Inputs and Outputs

This section summarizes the inputs, outputs, and control parameters of the Gst-nvdsasr plugin when it is used with the gRPC-based ASR library.

  • Input

    • Raw Audio Gst Buffers

  • Control parameters

    • customlib-name: Sets the custom ASR library that the plugin loads to perform inference. Use: libnvds_riva_asr_grpc.so

    • create-speech-ctx-func: Symbol name of the function that creates the ASR speech context. Use: create_riva_asr_grpc_ctx

    • config-file: A text file that configures the plugin. Use: riva_asr_grpc_conf.yml

  • Outputs

    • Gst Text Buffer containing ASR output
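The control parameters above are set as element properties. The following is a minimal sketch rather than a pipeline taken from the DeepStream samples. It assumes that the element is registered under the factory name nvdsasr and that mono audio is acceptable. The input file sample.wav and the output file asr_output.txt are hypothetical; filesrc, decodebin, audioconvert, audioresample and filesink are standard GStreamer elements. The function only prints the pipeline description so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Build a gst-launch-1.0 pipeline description for Gst-nvdsasr.
# Assumptions (not confirmed by this section): the element factory name
# "nvdsasr", mono audio, and the hypothetical output file asr_output.txt.
asr_pipeline() {
  local audio_file="$1"   # local audio file to transcribe
  local config_file="$2"  # for example riva_asr_grpc_conf.yml
  # printf repeats the format for every argument, joining elements with " ! ".
  printf '%s ! ' \
    "filesrc location=${audio_file}" \
    "decodebin" \
    "audioconvert" \
    "audioresample" \
    "audio/x-raw,format=S16LE,rate=16000,channels=1" \
    "nvdsasr customlib-name=libnvds_riva_asr_grpc.so create-speech-ctx-func=create_riva_asr_grpc_ctx config-file=${config_file}"
  printf '%s\n' "filesink location=asr_output.txt"
}

asr_pipeline sample.wav riva_asr_grpc_conf.yml
```

On a system with DeepStream installed and the Riva ASR service running, the printed description could be passed to gst-launch-1.0, for example with gst-launch-1.0 $(asr_pipeline sample.wav riva_asr_grpc_conf.yml). Paths that contain spaces would need additional quoting.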

Features

The following table summarizes the features of the plugin.

Gst-nvdsasr plugin features

| Feature | Description | Release |
|---|---|---|
| Speech ASR template | The plugin is an ASR base that supports loading a custom ASR library at runtime | DS 6.0 |
| Live stream transcription | Supports partial transcript output in real time | DS 6.0 |
| Final transcription | Supports output of final transcriptions only, which is useful for local audio streams | DS 6.0 |
| Language support | Only English is supported at present | DS 6.0 |
| Word punctuation | Supports word punctuation and capitalization | DS 6.0 |
| Custom library with gRPC API implementation | Supports a custom library implementation that uses gRPC APIs to access the Riva ASR gRPC service. Set libnvds_riva_asr_grpc.so as customlib-name and create_riva_asr_grpc_ctx as create-speech-ctx-func | DS 6.0 |

DS-Riva ASR YAML File Configuration Specifications

The DS-Riva ASR configuration file uses the YAML 1.2 file format: https://yaml.org/spec/1.2/spec.html.

  • The config file consists of multiple parts. An example gRPC configuration file, riva_asr_grpc_conf.yml, is located at /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/. Each part has a name, which uniquely identifies the part, and a detail node, which holds its settings.

  • The name: riva_server part configures the Riva ASR server settings in its detail: node.

  • The name: riva_model part configures the Riva ASR model entry in its detail: node.

  • The name: riva_asr_stream part configures the features supported by the Riva low-level library in its detail: node. Each ASR plugin instance launches a standalone Riva stream, and the settings can differ between plugin instances.

  • The name: ds_riva_asr_plugin part configures the DS-Riva ASR settings in its detail: node.

  • A --- separator line is inserted between adjacent parts, in accordance with the YAML specification.
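The part layout described above can be sketched as a small script that writes a four-part file. The keys and values are the examples listed in the property tables of this section; the file shipped with DeepStream may contain further keys, and the output path used here is hypothetical.

```shell
#!/usr/bin/env bash
# Write a four-part DS-Riva ASR config, one YAML document per part.
# Values are the examples from the property tables in this section.
# The default output path is a hypothetical location chosen for this sketch.
conf="${1:-/tmp/riva_asr_grpc_conf.example.yml}"

cat > "$conf" <<'EOF'
name: riva_server
detail:
  server_uri: "localhost:50051"
---
name: riva_model
detail:
  model_name: citrinet-1024-asr-trt-ensemble-vad-streaming
---
name: riva_asr_stream
detail:
  encoding: LINEAR_PCM
  sample_rate_hertz: 16000
  language_code: en-US
  max_alternatives: 1
  enable_automatic_punctuation: false
---
name: ds_riva_asr_plugin
detail:
  final_only: false
  enable_text_pts: false
  use_riva_pts: false
  force_final_trailing: false
EOF

# Sanity check: count the parts and the separator lines between them.
echo "parts: $(grep -c '^name: ' "$conf"), separators: $(grep -c '^---$' "$conf")"
```

Each part is a separate YAML document, so the settings of one part cannot collide with keys of the same name (name, detail) in another.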

Gst Properties

The following tables describe the Gst properties of the Gst-nvdsasr plugin.

riva_server Configuration properties for Riva low level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | name: riva_server |
| detail | Node for the Riva server setting details | Node | detail: server_uri: "localhost:50051" |
| server_uri | Part of the detail node. Specifies the Riva ASR service address. Used with the gRPC APIs. | String | server_uri: "localhost:50051" |

riva_model Configuration properties for Riva low level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: riva_model |
| detail | Node for the Riva model setting details | Node | detail: model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |
| model_name | Part of the detail node. Specifies which model entry is used | String | model_name: citrinet-1024-asr-trt-ensemble-vad-streaming |

riva_asr_stream Configuration properties for Riva low level library

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: riva_asr_stream |
| detail | Node for the Riva ASR stream setting details | Node | detail: encoding: LINEAR_PCM … |
| encoding | Part of the detail node. Specifies the input data format. Only the value LINEAR_PCM is supported | String | encoding: LINEAR_PCM |
| sample_rate_hertz | Part of the detail node. Input audio sample rate. Only the value 16000 is supported | Integer, >0 | sample_rate_hertz: 16000 |
| language_code | Part of the detail node. Specifies which language is used for recognition. Only the value en-US is supported | String | language_code: en-US |
| max_alternatives | Part of the detail node. Maximum number of alternatives, selected by top confidence. Only 1 is supported at present | Integer, >0 | max_alternatives: 1 |
| enable_automatic_punctuation | Part of the detail node. Enables or disables automatic punctuation | Boolean | enable_automatic_punctuation: false |
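Because several riva_asr_stream settings accept a single value only, a configuration file can be checked before the pipeline is started. The following is a rough sketch under stated assumptions: check_asr_stream_conf is a hypothetical helper, not part of DeepStream, and it matches keys with line-oriented text tools instead of parsing YAML, so quoted values or trailing comments would be reported as mismatches.

```shell
#!/usr/bin/env bash
# Check riva_asr_stream settings against the only values this section lists
# as supported. check_asr_stream_conf is a hypothetical helper; it reads the
# first "key: value" line for each key and does not parse YAML.
check_asr_stream_conf() {
  local conf="$1" status=0 key want got
  while read -r key want; do
    # Print the value of the first matching line, then stop reading.
    got="$(sed -n "/^[[:space:]]*${key}:/{s/^[[:space:]]*${key}:[[:space:]]*//p;q;}" "$conf")"
    if [ "$got" != "$want" ]; then
      echo "unsupported ${key}: '${got}' (expected '${want}')" >&2
      status=1
    fi
  done <<'EOF'
encoding LINEAR_PCM
sample_rate_hertz 16000
language_code en-US
max_alternatives 1
EOF
  return "$status"
}

# Example: a temporary file with one unsupported value (8000 Hz).
tmp="$(mktemp)"
printf '%s\n' 'name: riva_asr_stream' 'detail:' '  encoding: LINEAR_PCM' \
  '  sample_rate_hertz: 8000' '  language_code: en-US' '  max_alternatives: 1' > "$tmp"
check_asr_stream_conf "$tmp" || echo "config needs fixing"
rm -f "$tmp"
```

The function returns a non-zero status when any of the four settings differs from its supported value, so it can be used as a guard before launching an application.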

ds_riva_asr_plugin Configuration properties for DS-Riva ASR library settings

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| name | Unique name | String | Must be name: ds_riva_asr_plugin |
| detail | Node for the DS-Riva ASR library details | Node | detail: final_only: false |
| final_only | Part of the detail node. Specifies whether only final transcriptions are output, or partial transcriptions as well | Boolean | final_only: false |
| enable_text_pts | Part of the detail node. Specifies whether the text buffer timestamp is enabled | Boolean | enable_text_pts: false |
| use_riva_pts | Part of the detail node. Specifies whether the time information provided by the Riva service is used to calculate the timestamp and duration of the output buffer. Note: At present this option is supported for non-live sources only | Boolean | use_riva_pts: false |
| force_final_trailing | Part of the detail node. Enables insertion of a newline character after the final transcription | Boolean | force_final_trailing: false |

Riva ASR model data generation and gRPC installation

Follow the NVIDIA Riva user guide to generate the ASR-related models offline. The models only need to be generated once. Once you have access permission, follow the instructions below:

  1. Make sure the Riva ASR model repository has already been generated. If it has not, follow the steps below:

    1. Check that all prerequisites are met. See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#prerequisites

    2. Follow the QuickStart instructions for local deployment. Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts

      1. Download the riva_quickstart scripts:

          $ ngc registry resource download-version nvidia/riva/riva_quickstart:x.x.x-tag

      2. Use the Riva Speech Skills 1.5.0-beta release or later.

      3. For the Riva Speech Skills 1.5.0-beta release, use $ ngc registry resource download-version nvidia/riva/riva_quickstart:1.5.0-beta and then change into the downloaded directory:

          $ cd riva_quickstart_v1.5.0-beta

      4. Make the following changes to the config.sh file to disable the other Riva services:

          service_enabled_asr=true
          service_enabled_nlp=false
          service_enabled_tts=false

          riva_model_loc="riva-asr-model-repo"

          models_asr=(
            "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset1p7_streaming:${riva_ngc_model_version}"
            "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"
          )

      5. Run $ bash riva_init.sh to generate the docker volume riva-asr-model-repo.

  1. Additional steps to download and deploy TAO ASR models from NGC:

    1. Refer to the README of deepstream-avsync-app, available at /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-avsync

    2. To download and deploy the Jasper models, follow section 2.b of the avsync app README.

    3. gRPC installation and prerequisites:

      1. To install gRPC, follow section 2.c of the avsync app README.

      2. To run the ASR service and set LD_LIBRARY_PATH, refer to sections 2.d and 2.e of the avsync app README.

Note

If the docker volume riva-asr-model-repo is corrupted, run docker volume rm riva-asr-model-repo before generating it again.

  1. Verify that the docker volume riva-asr-model-repo is available. Use $ docker volume inspect riva-asr-model-repo to inspect the volume.

  2. Run the Riva ASR service using riva_start.sh.
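The single-line config.sh changes listed in the steps above can also be applied with a script. This is a sketch only: set_asr_only is a hypothetical helper, it assumes that each setting sits on its own line in key=value form, and it leaves the multi-line models_asr array to be edited by hand.

```shell
#!/usr/bin/env bash
# Apply the ASR-only service switches to a Riva quickstart config.sh.
# set_asr_only is a hypothetical helper for this sketch. It assumes one
# key=value setting per line and does not touch the models_asr array.
set_asr_only() {
  local config="$1"
  cp "$config" "${config}.bak"   # keep a backup of the original file
  sed -e 's/^service_enabled_asr=.*/service_enabled_asr=true/' \
      -e 's/^service_enabled_nlp=.*/service_enabled_nlp=false/' \
      -e 's/^service_enabled_tts=.*/service_enabled_tts=false/' \
      -e 's/^riva_model_loc=.*/riva_model_loc="riva-asr-model-repo"/' \
      "${config}.bak" > "$config"
}

# Usage, from inside the quickstart directory:
#   set_asr_only config.sh
```

The original file is kept as config.sh.bak, so the change can be reviewed with diff or reverted before riva_init.sh is run.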

This plugin, which uses gRPC APIs, can be used either directly on an x86 host or inside the DeepStream docker container. The gRPC C++ libraries are already installed in the DeepStream docker images.

To run DeepStream docker:

    $ export DISPLAY=:0
    $ xhost +
    $ sudo docker run --rm -it --gpus '"'device=0'"' -v riva-asr-model-repo:/data -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --net=host $DS_Docker

Here, DS_Docker is the name of the docker image for the DeepStream build.

Set LD_LIBRARY_PATH using $ source ~/.profile before executing an application that uses the Riva ASR services.

Note

The libnvds_riva_asr_grpc.so library works with Riva Speech Skills 1.5.0 Beta release or later.

For more information about the Gst-nvdsasr sample tests, see the source code under the sources/apps/audio_apps/deepstream_asr_app directory. Follow the README to run the tests.