Gst-nvds_text_to_speech (Alpha)

The Gst-nvds_text_to_speech plugin performs speech synthesis on the input text. Currently, it supports only the x86 platform. By default, the plugin loads the DS-Riva Text To Speech library (libnvds_riva_tts.so) to perform speech synthesis.

The plugin provides a mechanism to load a custom low-level TTS library at runtime.

Note

  • The Gst-nvds_text_to_speech plugin is being released as an alpha feature.

  • The DS-Riva Text To Speech library uses the gRPC API to access the Riva TTS service. The Riva TTS service must be started before using this plugin, and gRPC C++ must be installed on the client side.

The plugin accepts text (UTF-8) Gst Buffers from the upstream component and transforms the text into audio Gst Buffers as output.

The DS-Riva TTS library (libnvds_riva_tts.so) generates raw audio data in F32LE format (32-bit float, little endian) at a 22050 Hz sample rate. Library settings are configured through a YAML file, set with the config-file property of the nvds_text_to_speech Gst plugin. The file has multiple parts covering plugin control and Riva TTS service configuration.
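If an application consumes the raw output directly (for example, bytes written by a filesink), the format above fully determines how to interpret it. The following Python sketch, assuming a headerless mono F32LE stream at 22050 Hz, decodes the samples and computes the playback duration. The function names are illustrative and are not part of DeepStream.

```python
import struct
from typing import List

SAMPLE_RATE_HZ = 22050   # DS-Riva TTS output sample rate
BYTES_PER_SAMPLE = 4     # F32LE: one 32-bit float per sample (mono)

def decode_f32le(raw: bytes) -> List[float]:
    """Decode headerless mono F32LE PCM bytes into Python floats."""
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    count = len(raw) // BYTES_PER_SAMPLE
    # '<' selects little-endian byte order, 'f' a 32-bit IEEE 754 float.
    return list(struct.unpack(f"<{count}f", raw))

def duration_seconds(raw: bytes) -> float:
    """Playback duration of a mono F32LE buffer at 22050 Hz."""
    return len(raw) / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)
```

At 4 bytes per sample, one second of output is 22050 * 4 = 88200 bytes.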

As shown in the diagram below, input text is sent to the Riva TTS service for speech synthesis. The final output is available as F32LE PCM audio at 22050 Hz.

Gst-Nvds_text_to_speech

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvds_text_to_speech plugin with DS-Riva TTS implementation.

  • Input

    • Text GStreamer buffers

  • Control parameters

    • customlib-name: Sets a custom TTS library that the plugin loads to perform speech synthesis. By default, the DS-Riva TTS library (libnvds_riva_tts.so) is set.

    • create-speech-ctx-func: Name of the function symbol used to create the TTS speech context. Default: create_text_to_speech_ctx

    • config-file: A YAML configuration file for the plugin and the DS-Riva TTS service requests.

  • Output

    • Raw audio GStreamer buffers containing the synthesized speech
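Because the plugin resolves the context-creation function by name, a custom library set through customlib-name must export a symbol matching create-speech-ctx-func. The following Python sketch, assuming Linux and the standard ctypes module, checks that a shared library exports that symbol before it is wired into a pipeline. It only checks that the symbol exists: the C signature of the context-creation function is not described in this section, so the sketch never calls it. Note that loading a library runs its initialization code.

```python
import ctypes

# Default value of the create-speech-ctx-func property.
DEFAULT_CTX_FUNC = "create_text_to_speech_ctx"

def exports_symbol(lib_path, symbol=DEFAULT_CTX_FUNC):
    """Return True if the shared library at lib_path exports `symbol`.

    lib_path=None inspects the symbols already loaded into this process.
    Raises OSError if the library itself cannot be loaded.
    """
    lib = ctypes.CDLL(lib_path)
    # ctypes looks the symbol up with dlsym(); a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(lib, symbol)
```

For example, exports_symbol("/path/to/libmy_custom_tts.so") (a hypothetical path) returns True only if that library exports create_text_to_speech_ctx; pass a second argument if create-speech-ctx-func is set to a different name.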

Features

The following table summarizes the features of the plugin.

Gst-nvds_text_to_speech plugin features

| Feature | Description | Release |
|---------|-------------|---------|
| TTS template | The plugin provides a Text To Speech base that supports runtime loading of a custom TTS library | DS 6.0 |
| DS-Riva TTS library and context | Default TTS library based on the Riva TTS gRPC service | DS 6.0 |
| Live speech synthesis | Supports real-time speech synthesis using the streaming mode of the Riva TTS service | DS 6.0 |
| Language support | Only English is supported at present | DS 6.0 |
| Audio format | Outputs F32LE linear PCM mono audio at 22050 Hz | DS 6.0 |
| Frame size | Supports a configurable output frame size | DS 6.0 |

DS-Riva TTS YAML File Configuration Specifications

DS-Riva TTS configuration file uses YAML 1.2 file format: https://yaml.org/spec/1.2/spec.html.

  • The configuration file consists of multiple parts. An example is located at /opt/nvidia/deepstream/deepstream/sources/apps/audio_apps/deepstream_asr_tts_app/riva_tts_conf.yml. Each part has a name key, which gives the unique part name, and a detail node, which holds the settings for that part.

  • The name: riva_server part configures the Riva server URI in its detail: node.

  • The name: riva_tts_stream part configures the features of the Riva TTS service in its detail: node.

  • The name: ds_riva_tts_plugin part configures the DS-Riva TTS library settings in its detail: node.

  • Adjacent parts are separated by a line containing ---, as defined by the YAML specification.
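The multi-part layout described above can be illustrated by generating a configuration file from Python. This is a hypothetical helper, assuming only the keys documented in the tables of this section; the values come from those tables' example columns, so compare the result with the shipped riva_tts_conf.yml before relying on it.

```python
def render_tts_config(parts):
    """Render (name, detail) pairs as a multi-part YAML stream.

    Each part becomes one YAML document with a `name` key and a `detail`
    mapping; adjacent parts are separated by a `---` line. Values are
    written verbatim, so strings that need quoting (such as
    "localhost:50051") must already include their quotes.
    """
    docs = []
    for name, detail in parts:
        lines = [f"name: {name}", "detail:"]
        lines += [f"  {key}: {value}" for key, value in detail.items()]
        docs.append("\n".join(lines))
    return "\n---\n".join(docs) + "\n"

# Example values taken from the property tables in this section.
EXAMPLE_PARTS = [
    ("riva_server", {"server_uri": '"localhost:50051"'}),
    ("riva_tts_stream", {"encoding": "LINEAR_PCM",
                         "language_code": "en-US",
                         "voice_name": "ljspeech"}),
    ("ds_riva_tts_plugin", {"output_mode": 1,
                            "framing_mode": 2,
                            "frame_size": 2205}),
]

if __name__ == "__main__":
    print(render_tts_config(EXAMPLE_PARTS), end="")
```

Writing the returned string to a file produces a three-part YAML stream that could then be passed to the plugin through its config-file property.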

Configuration File Properties

The following tables describe the properties that can be set in the DS-Riva TTS configuration file of the Gst-nvds_text_to_speech plugin.

riva_server: Configuration properties for the Riva low-level library

| Property | Meaning | Type and Range | Example / Notes |
|----------|---------|----------------|-----------------|
| name | Unique name | String | name: riva_server |
| detail | Node for Riva server setting details | Node | detail: server_uri: "localhost:50051" |
| server_uri | Part of detail node. Specifies the address of the Riva TTS service | String | server_uri: "localhost:50051" |

riva_tts_stream: Configuration properties for Riva TTS service requests

| Property | Meaning | Type and Range | Example / Notes |
|----------|---------|----------------|-----------------|
| name | Unique name | String | Must be name: riva_tts_stream |
| detail | Node for Riva TTS stream setting details | Node | detail: encoding: LINEAR_PCM |
| encoding | Part of detail node. Specifies the output audio encoding format. Only LINEAR_PCM is supported | String | encoding: LINEAR_PCM |
| language_code | Part of detail node. Specifies the language used for speech synthesis. Currently only en-US is supported | String | language_code: en-US |
| voice_name | Part of detail node. Specifies the voice name used for speech synthesis | String | voice_name: ljspeech |
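Because the stream settings above currently accept only one encoding and one language, a client-side check can catch configuration mistakes before a request reaches the server. A hypothetical Python helper, assuming the detail node has already been parsed into a dict (for example with a YAML library). It checks only keys that are present, and does not validate voice_name, whose valid values depend on the models deployed on the Riva server.

```python
# Constraints documented for the riva_tts_stream part in the table above.
SUPPORTED = {
    "encoding": {"LINEAR_PCM"},
    "language_code": {"en-US"},
}

def check_stream_detail(detail):
    """Return a list of problems found in a riva_tts_stream detail dict.

    Only keys that are present are checked; voice_name is not validated
    because the available voices depend on the Riva deployment.
    """
    problems = []
    for key, allowed in SUPPORTED.items():
        if key in detail and detail[key] not in allowed:
            problems.append(f"unsupported {key}: {detail[key]!r}")
    return problems
```

An empty list means the documented constraints are satisfied; it does not guarantee the Riva server will accept the request.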

ds_riva_tts_plugin: Configuration properties for DS-Riva TTS library settings

| Property | Meaning | Type and Range | Example / Notes |
|----------|---------|----------------|-----------------|
| name | Unique name | String | Must be name: ds_riva_tts_plugin |
| detail | Node for DS-Riva TTS library setting details | Node | detail: output_mode: 0 |
| output_mode | Part of detail node. Specifies the output mode. Mode 0 (default): outputs audio as received from the Riva server; suitable for non-real-time sinks such as filesink. Mode 1: inserts silence in the output when audio from the server is not available; suitable for real-time/live sinks such as autoaudiosink. | Integer: 0 or 1 | output_mode: 1 |
| framing_mode | Part of detail node. Specifies the framing mode. Mode 0 (default): uses the output chunk size as received from the Riva server. Mode 1: splits the audio received from the server into chunks of frame_size samples; the last chunk is not padded if it has fewer than frame_size samples. Mode 2: splits the audio into chunks of frame_size samples and pads the last chunk to frame_size. | Integer: 0, 1, or 2 | framing_mode: 2 |
| frame_size | Part of detail node. Specifies the output frame size in number of samples. Used with framing mode 1 or 2, or with output mode 1. | Integer: 1 to 65535 | frame_size: 2205 |
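The three framing modes can be modelled in a few lines of Python. This is a conceptual sketch, not the library's implementation: it assumes the audio for one synthesis request arrives as a list of server chunks that are concatenated before framing, and that mode 2 pads with zero-valued (silent) samples, which the description above does not specify.

```python
def frame_audio(server_chunks, framing_mode, frame_size=2205):
    """Re-chunk the audio of one synthesis request per framing_mode.

    server_chunks: list of sample lists, in the order received from Riva.
    Mode 0: keep the server's chunking unchanged.
    Mode 1: split into frame_size-sample chunks; a short last chunk stays short.
    Mode 2: like mode 1, but zero-pad the last chunk to frame_size samples.
    """
    if framing_mode == 0:
        return [list(chunk) for chunk in server_chunks]
    if framing_mode not in (1, 2):
        raise ValueError("framing_mode must be 0, 1 or 2")
    if not 1 <= frame_size <= 65535:
        raise ValueError("frame_size must be between 1 and 65535")
    # Assumption: server chunks are concatenated, then cut into fixed frames.
    samples = [s for chunk in server_chunks for s in chunk]
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    if framing_mode == 2 and frames and len(frames[-1]) < frame_size:
        # Assumption: padding uses silence (0.0) samples.
        frames[-1].extend([0.0] * (frame_size - len(frames[-1])))
    return frames
```

With the example frame_size of 2205 samples, each frame at 22050 Hz covers 2205 / 22050 = 0.1 s (100 ms) of audio.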

Riva TTS Service Initiation

Refer to https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts for the procedure to start the Riva TTS service. The DS-Riva TTS library (libnvds_riva_tts.so) works with Riva Speech Skills release 1.4.0-beta or later.
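Before starting a pipeline, it can help to confirm that something is listening at the configured server_uri. A minimal Python sketch, assuming a plain host:port value such as the localhost:50051 example in the riva_server table. A successful TCP connection only shows that the port is open; it does not prove that the Riva TTS service itself is running and healthy.

```python
import socket

def parse_server_uri(uri):
    """Split a "host:port" server_uri into (host, port).

    Bracketed IPv6 literals such as "[::1]:50051" are not handled.
    """
    host, sep, port = uri.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {uri!r}")
    return host, int(port)

def riva_port_open(uri="localhost:50051", timeout=2.0):
    """Return True if a TCP connection to server_uri succeeds."""
    host, port = parse_server_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If riva_port_open() returns False, start the service using the quick start procedure above, or check the server_uri value in the configuration file.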

gRPC C++ Installation

  1. To install the gRPC C++ shared libraries, follow the steps here: https://grpc.io/docs/languages/cpp/quickstart/#install-grpc.

     Add -DBUILD_SHARED_LIBS=ON to the cmake build options. We recommend using 'make -j4' instead of 'make -j'.

  2. Ensure that the LD_LIBRARY_PATH environment variable includes the path to the installed gRPC libraries.

Note

The gRPC C++ libraries are already installed on the DeepStream docker images and the corresponding commands to update the environment variables are added to the ~/.profile file.

  1. After starting a new docker terminal, run the following command to update the LD_LIBRARY_PATH environment variable with the gRPC installation path:

    $ source ~/.profile
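The LD_LIBRARY_PATH requirement above can also be checked programmatically. The following Python sketch tests whether a given directory appears in LD_LIBRARY_PATH. The directory /opt/grpc/lib used in the usage note is hypothetical; substitute the lib directory of your own gRPC install prefix.

```python
import os

def on_ld_library_path(lib_dir, env=None):
    """Return True if lib_dir is listed in LD_LIBRARY_PATH.

    env defaults to os.environ. Entries are compared after
    os.path.normpath, so "/opt/x/lib/" matches "/opt/x/lib".
    Empty entries are ignored.
    """
    env = os.environ if env is None else env
    target = os.path.normpath(lib_dir)
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return any(entry and os.path.normpath(entry) == target
               for entry in entries)
```

For example, on_ld_library_path("/opt/grpc/lib") returns False in a shell where ~/.profile has not been sourced, if that is where gRPC was installed.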
    

Sample Application

A sample application using the plugin is available here: sources/apps/audio_apps/deepstream_asr_tts_app. Follow the README to run the tests.