Gst-nvdsasr
============

The ``Gst-nvdsasr`` plugin performs automatic speech recognition (ASR) on input audio data. Currently it is supported on the x86 platform only. It uses optimized Riva models for ASR and punctuation-capitalization. With this plugin, the Riva ASR service can be accessed through the Riva `gRPC` APIs by selecting the corresponding low-level library using the plugin properties. The plugin provides a mechanism to load a custom ASR low-level library at runtime. A custom library, ``libnvds_riva_asr_grpc``, is implemented; it uses `gRPC` APIs to access the Riva ASR service.

.. note::

   The DS-Riva ASR ``libnvds_riva_asr_grpc`` library uses `gRPC` APIs to access the Riva ASR service. The Riva ASR service should be started before using this library, and a `gRPC` C++ installation is required on the client side. The required steps are described in the section "Riva ASR model data generation and gRPC installation" below.

The plugin accepts raw PCM audio ``Gst`` Buffers from the upstream component and transforms the audio into generic text ``Gst`` Buffer output. The model needs raw audio input in S16LE (signed 16-bit little-endian) format. Library settings can be configured through a YAML file (selected by setting a property on the ``Gst-nvdsasr`` plugin), which contains multi-part settings for the plugin. As shown in the diagram below, input S16LE raw audio data is preprocessed and inferred by the Riva ASR service. The final output is available as UTF-8 text.

.. image:: /content/DS_plugin_gst-nvdsasr.png
   :align: center
   :alt: Gst-Nvdsasr

Inputs and Outputs
-------------------

This section summarizes the inputs, outputs, and communication facilities of the ``Gst-nvdsasr`` plugin with the ASR library (gRPC based).

* Input

  * Raw Audio ``Gst`` Buffers

* Control parameters

  * ``customlib-name``: Sets the custom ASR library that the plugin loads to perform inference. Use ``libnvds_riva_asr_grpc.so``.
  * ``create-speech-ctx-func``: Symbol name used to create the ASR speech context. Use ``create_riva_asr_grpc_ctx``.
  * ``config-file``: A text file used to configure the plugin. Use ``riva_asr_grpc_conf.yml``.

* Outputs

  * ``Gst`` Text Buffer containing ASR output

Features
---------

The following table summarizes the features of the plugin.

.. csv-table:: Gst-nvdsasr plugin features
   :file: ../text/tables/Gst-nvdsasr tables/DS_Plugin_gst-nvdsasr_features.csv
   :widths: 30, 30, 30
   :header-rows: 1

DS-Riva ASR Yaml File Configuration Specifications
-----------------------------------------------------

The DS-Riva ASR configuration file uses the YAML 1.2 file format: https://yaml.org/spec/1.2/spec.html.

* The config file has multiple parts. An example gRPC configuration file, ``riva_asr_grpc_conf.yml``, is located at ``/opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/``. Each part has a ``name`` giving a unique part name and a ``detail`` node holding the settings.
* The ``name: riva_server`` part configures the Riva ASR server settings in its corresponding ``detail:`` node.
* The ``name: riva_model`` part configures the Riva ASR model entry in its corresponding ``detail:`` node.
* The ``name: riva_asr_stream`` part configures the features supported by the Riva low-level library in its corresponding ``detail:`` node. Each ASR plugin instance launches a standalone Riva stream, and the settings can differ between plugin instances.
* The ``name: ds_riva_asr_plugin`` part configures the DS-Riva ASR settings in its corresponding ``detail:`` node.
* A separator line containing ``---`` is inserted between any two neighboring parts, as required by the YAML specification.

Gst Properties
----------------

The following tables describe the ``Gst`` properties of the ``Gst-nvdsasr`` plugin.

.. csv-table:: riva_server Configuration properties for Riva low level library
   :file: ../text/tables/Gst-nvdsasr tables/DS_Plugin_gst-nvdsasr_gst_riva_server_properties.csv
   :widths: 20, 20, 20, 20
   :header-rows: 1

.. csv-table:: riva_model Configuration properties for Riva low level library
   :file: ../text/tables/Gst-nvdsasr tables/DS_Plugin_gst-nvdsasr_gst_riva_model_properties.csv
   :widths: 20, 20, 20, 20
   :header-rows: 1

.. csv-table:: ds_riva_asr_stream Configuration properties for Riva low level library
   :file: ../text/tables/Gst-nvdsasr tables/DS_Plugin_gst-nvdsasr_gst_riva_asr_stream_properties.csv
   :widths: 20, 20, 20, 20
   :header-rows: 1

.. csv-table:: ds_riva_asr_plugin Configuration properties for DS-Riva ASR library settings
   :file: ../text/tables/Gst-nvdsasr tables/DS_Plugin_gst-nvdsasr_gst_riva_asr_plugin_properties.csv
   :widths: 20, 20, 20, 20
   :header-rows: 1

Riva ASR model data generation and `gRPC` installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the NVIDIA Riva user guide to generate the ASR-related models offline. You only need to generate them once. Once you have access permission, follow the instructions below:

1. Make sure the Riva ASR model repository is already generated. If it is not, follow the steps below:

   a. Check that all prerequisites are met. See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#prerequisites

   b. Follow the Quick Start instructions for local deployment. Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts

      1. Download the riva_quickstart scripts::

            $ ngc registry resource download-version nvidia/riva/riva_quickstart:x.x.x-tag

      2. Use the Riva Speech Skills 1.5.0-beta release or later.

      3. For the Riva Speech Skills 1.5.0-beta release, use ``$ ngc registry resource download-version nvidia/riva/riva_quickstart:1.5.0-beta``, then change into the downloaded directory::

            $ cd riva_quickstart_v1.5.0-beta

      4. Make the following changes to the ``config.sh`` file to disable the other Riva services::

            service_enabled_asr=true
            service_enabled_nlp=false
            service_enabled_tts=false
            riva_model_loc="riva-asr-model-repo"
            models_asr=(
                "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset1p7_streaming:${riva_ngc_model_version}"
                "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"
            )

      5. Run ``$ bash riva_init.sh`` to generate the docker volume ``riva-asr-model-repo``.

   c. Additional steps to download and deploy TAO ASR models from NGC:

      1. Refer to the README of the deepstream-avsync app, available at ``/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-avsync``.

      2. To download and deploy the Jasper models, follow section 2.b of the avsync app README.

      3. `gRPC` installation and prerequisites:

         a. To install `gRPC`, follow section 2.c of the avsync app README.

         b. To run the ASR service and set ``LD_LIBRARY_PATH``, refer to sections 2.d and 2.e of the avsync app README.

   .. note::

      If the docker volume ``riva-asr-model-repo`` is corrupted, run ``docker volume rm riva-asr-model-repo`` before generating it again.

2. Verify that the docker volume ``riva-asr-model-repo`` is available. Use ``$ docker volume inspect riva-asr-model-repo`` to inspect the volume.

3. Run the Riva ASR service using ``riva_start.sh``.

This plugin, which uses gRPC APIs, can be used on x86 directly or inside the DeepStream docker. `gRPC` C++ libraries are already installed on the DeepStream docker images.

To run the DeepStream docker::

   $ export DISPLAY=:0
   $ xhost +
   $ sudo docker run --rm -it --gpus '"'device=0'"' -v riva-asr-model-repo:/data -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --net=host $DS_Docker

``DS_Docker`` is the docker image name for the DeepStream build.

Set ``LD_LIBRARY_PATH`` using ``$ source ~/.profile`` before executing an application that uses Riva ASR services.

.. note::

   The ``libnvds_riva_asr_grpc.so`` library works with the Riva Speech Skills 1.5.0 Beta release or later.

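
The volume check and service start described in the steps above can be combined into a small preflight script. This is only a sketch: the ``check_volume`` helper and the ``START_RIVA`` switch are hypothetical names introduced here for illustration, and the script assumes it is run from the riva_quickstart directory so that ``riva_start.sh`` can be found.

```shell
#!/bin/sh
# Sketch: confirm the Riva model volume exists before starting the service.
# check_volume and START_RIVA are illustrative names, not part of Riva or DeepStream.
VOLUME="riva-asr-model-repo"

# Prints a status message; returns 0 if the docker volume exists, 1 otherwise.
check_volume() {
    if docker volume inspect "$1" >/dev/null 2>&1; then
        echo "volume $1 is available"
    else
        echo "volume $1 not found; run riva_init.sh first"
        return 1
    fi
}

# Always report the volume status; start the service only when START_RIVA=1.
if check_volume "$VOLUME" && [ "${START_RIVA:-0}" = "1" ]; then
    bash riva_start.sh
fi
```

Running the script without ``START_RIVA=1`` only reports whether the volume is present, which makes it safe to use as a quick check before launching a pipeline.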
For more information about ``Gst-nvdsasr`` sample tests, see the source code under the directory ``sources/apps/audio_apps/deepstream_asr_app``. Follow the ``README`` to run the tests.
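
As a quick illustration of how the three control parameters fit together, the sketch below assembles a ``gst-launch-1.0`` command line. Several things here are assumptions rather than documented behavior: the element name ``nvdsasr``, the input file ``sample.wav``, and the surrounding ``decodebin``/``audioconvert``/``fakesink`` elements. The caps filter only pins the S16LE format stated above; any sample rate or channel layout the model requires is not covered. The command is printed by default and executed only when ``RUN_PIPELINE=1`` is set.

```shell
#!/bin/sh
# Sketch: build a gst-launch-1.0 command that loads the gRPC-based ASR library.
# Property values come from the "Inputs and Outputs" section; everything else
# (element name nvdsasr, input file, neighboring elements) is an assumption.
ASR_LIB="libnvds_riva_asr_grpc.so"
ASR_CTX_FUNC="create_riva_asr_grpc_ctx"
ASR_CONFIG="${ASR_CONFIG:-riva_asr_grpc_conf.yml}"
INPUT_WAV="${INPUT_WAV:-sample.wav}"   # hypothetical input; path must not contain spaces

PIPELINE="filesrc location=${INPUT_WAV} ! decodebin ! audioconvert ! audio/x-raw,format=S16LE ! nvdsasr customlib-name=${ASR_LIB} create-speech-ctx-func=${ASR_CTX_FUNC} config-file=${ASR_CONFIG} ! fakesink"

# Dry run: show the command that would be executed.
echo "gst-launch-1.0 ${PIPELINE}"

# Execute only on request; word splitting of PIPELINE is intentional here.
if [ "${RUN_PIPELINE:-0}" = "1" ]; then
    gst-launch-1.0 ${PIPELINE}
fi
```

The Riva ASR service must already be running and ``LD_LIBRARY_PATH`` must be set before the pipeline is actually executed; the sample application's ``README`` remains the authoritative reference for working pipelines.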