Virtual Assistant

This Virtual Assistant sample application demonstrates how to use Riva AI Services, specifically ASR, NLP, and TTS, to build a simple but complete conversational AI application. It demonstrates receiving spoken input from the user, interpreting the query via an intent recognition and slot filling approach, computing a response, and speaking it back to the user in a natural voice.

Riva Virtual Assistant Framework

This sample implements a dialog system with a state machine-based dialog state management approach, and using the intent/slot paradigm for interpreting user queries. The provided model demonstrates conversational queries of weather, temperature, and rainfall by geography and time, using a free web service as the fulfillment engine to return real weather data. While narrow in scope, it includes all of the components that make up more sophisticated and complete dialog systems such as those deployed on phones or in-home virtual assistants.
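For illustration, the result of the intent/slot interpretation step for a query such as "What's the weather like in San Francisco tomorrow?" can be thought of as a small structure like the one below. The field and intent names are illustrative only, not the exact schema returned by the Riva NLP service.

    # Illustrative shape of an intent/slot interpretation; names are
    # assumptions, not the exact output of the Riva NLP service.
    interpretation = {
        "query": "What's the weather like in San Francisco tomorrow?",
        "intent": "weather.weather",      # classified intent
        "slots": {
            "location": "San Francisco",  # filled by the slot/NER model
            "time": "tomorrow",
        },
    }

The dialog manager consumes such an interpretation to decide whether it has enough information to fulfill the request or needs to ask a follow-up question.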

This sample could be modified to implement more models and more complex dialog state management. It is also intended to demonstrate how Riva can be integrated into existing virtual assistant and dialog systems to provide state-of-the-art conversational intelligence optimized for NVIDIA’s accelerated computing platform.

The dialog manager is an environment that executes a state machine diagram created using .yaml configuration files. It is integrated with Riva NLP, ASR, and TTS modules.

It is possible, through the design of different state diagrams, to create different types of assistants. As a sample, we provide an implementation of a weather bot.
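As a rough sketch of the idea (not the actual dialog manager code shipped with the sample, which is driven by the .yaml state diagrams), a state machine-based dialog manager reduces to a loop that looks up the current state, runs its handler, and follows the transition selected by the handler's outcome. All state, handler, and slot names below are illustrative.

    # Minimal sketch of a state machine-based dialog manager loop.
    # States, handlers, and transitions are purely illustrative.

    def ask_location(context):
        """If the location slot is missing, ask for it; otherwise move on."""
        if "location" in context:
            return "have_location"
        context["response"] = "For which location?"
        return "need_location"

    def answer_weather(context):
        """Stand-in fulfillment step: fill in the bot's answer."""
        context["response"] = f"Here is the weather for {context['location']}."
        return "done"

    # state name -> (handler, {handler outcome -> next state})
    STATES = {
        "start":   (ask_location,   {"need_location": "start", "have_location": "fulfill"}),
        "fulfill": (answer_weather, {"done": "start"}),
    }

    def step(state, context):
        """Run one transition of the dialog state machine."""
        handler, transitions = STATES[state]
        return transitions[handler(context)]

    # Example: a query whose location slot is already filled.
    ctx = {"location": "Berlin"}
    state = step("start", ctx)   # -> "fulfill"
    state = step(state, ctx)     # -> "start"; ctx["response"] holds the answer

Defining the states and transitions declaratively (as the sample does with .yaml files) keeps the dialog flow easy to inspect and modify without touching the execution engine.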

(Screenshot: Virtual Assistant chat interface)

Video Demo

The following video shows the weather bot in action, gives a high-level overview of the architecture, and ends with a brief code walkthrough.

Running the Demo

Setting up Riva services is a prerequisite, as the various components of the application depend on the availability of those services. The weather bot assumes the availability of the following models at the Riva endpoint: ASR, TTS, and NLP (domain, context, weather, POI, and NER). After you have the Riva services up and running, proceed with running this application.

  1. Download the samples image from NGC.

    docker pull nvcr.io/nvidia/riva/riva-speech-client:1.9.0-beta-samples
    
  2. Run the service within a Docker container.

    docker run  -it --rm -p 8009:8009 nvcr.io/nvidia/riva/riva-speech-client:1.9.0-beta-samples /bin/bash
    cd samples/virtual-assistant
    
  3. Edit config.py with the correct Riva IP, hosting port, and your weatherstack API access key (from https://weatherstack.com/); see the configuration and fulfillment sketch after these steps. Then, start the server.

    python3 main.py
    
  4. Open the browser to https://127.0.0.1:8009/rivaWeather/.
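As a rough guide to step 3, the configuration boils down to pointing the client at the Riva server, choosing the port the web application listens on, and providing the weatherstack access key; the sample's fulfillment engine then queries the weatherstack current-weather endpoint. The snippet below is a sketch of that fulfillment call under those assumptions; the exact variable names used in config.py may differ.

    # Illustrative values of the kind set in config.py (exact names may differ):
    #   - Riva speech server endpoint, e.g. "localhost:50051"
    #   - Port this web application listens on, e.g. 8009
    #   - weatherstack access key obtained from https://weatherstack.com/

    import requests

    WEATHERSTACK_ACCESS_KEY = "<your_access_key>"  # assumption: your key from weatherstack.com

    def get_current_weather(city):
        """Query the weatherstack current-weather endpoint for a city."""
        resp = requests.get(
            "http://api.weatherstack.com/current",
            params={"access_key": WEATHERSTACK_ACCESS_KEY, "query": city},
            timeout=10,
        )
        resp.raise_for_status()
        # The JSON response includes current temperature and weather descriptions.
        return resp.json()

    print(get_current_weather("Berlin"))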

Sample Use Cases

It is possible to ask the bot the following types of questions:

  • What is the weather in Berlin?

  • What is the weather?

    • For which location?

  • What’s the weather like in San Francisco tomorrow?

    • What about in Los Angeles, California?

  • What is the temperature in Milan on Friday?

  • Is it currently cold in San Francisco?

  • Is it going to rain in Mountain View tomorrow?

  • How much rain in Seattle?

  • Will it be sunny next week in Santa Clara?

  • Is it cloudy today?

  • Is it going to snow tomorrow in Detroit?

  • How much snow is there in Tahoe currently?

  • How humid is it right now?

  • What is the humidity in Tahoe?

Limitations

  • The provided samples are not complete chatbots, but are intended as simple examples of how to build basic task-oriented chatbots with Riva. Consequently, the intent classifier and slot filling models have been trained with small amounts of data and are not expected to be highly accurate.

  • The Riva NLP sample supports intents for weather, temperature, rain, humidity, sunny, cloudy, and snowfall checks. It does not support general conversational queries or other domains.

  • Both the Riva NLP and Rasa NLU samples support only one slot, for city. Neither takes into account the day associated with the query.

  • These samples support up to four concurrent users. This restriction comes not from Riva but from the web framework being used (Flask and Flask-SocketIO). Socket connections are used to stream audio to the client (TTS) and from the client (ASR), and more than four concurrent socket connections cannot be sustained.

  • The chatbot application is not optimized for low latency in the case of multiple concurrent users.

  • Erratic behavior has been observed with the chatbot samples in the Firefox browser. The most common issue is the TTS output being picked up as ASR input at certain microphone gain values.

License

For applicable licenses, refer to the License section.