Virtual Assistant

This Virtual Assistant sample application demonstrates how to use Jarvis AI Services, specifically ASR, NLP, and TTS, to build a simple but complete conversational AI application. It demonstrates receiving input via speech from the user, interpreting the query via an intention recognition and slot filling approach, computing a response, and speaking this back to the user in a natural voice.

Jarvis Virtual Assistant Framework

This sample implements a dialog system with state machine-based dialog state management and uses the intent/slot paradigm to interpret user queries. The provided model demonstrates conversational queries about weather, temperature, and rainfall by geography and time, using a free web service as the fulfillment engine to return real weather data. While narrow in scope, it includes all of the components that make up more sophisticated and complete dialog systems, such as those deployed on phones or in-home virtual assistants.
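To make the intent/slot paradigm concrete, here is a minimal Python sketch; it is not the sample's actual code. The `NLUResult` type, the intent label, the slot name `location`, and the `build_weather_query` helper are all hypothetical. The request targets the weatherstack `current` endpoint with its `access_key` and `query` parameters, on the assumption that this is the free web service used for fulfillment. The URL is only constructed, not sent.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class NLUResult:
    """Hypothetical container for the output of an intent/slot model."""
    intent: str                                 # made-up label, e.g. "weather.temperature"
    slots: dict = field(default_factory=dict)   # e.g. {"location": "Milan"}


def build_weather_query(result: NLUResult, access_key: str) -> str:
    """Turn an interpreted query into a fulfillment request URL.

    Assumes the weatherstack "current" endpoint; nothing is sent here.
    """
    location = result.slots.get("location")
    if location is None:
        # A dialog manager would normally ask a follow-up question instead.
        raise ValueError("location slot is missing")
    params = urlencode({"access_key": access_key, "query": location})
    return f"http://api.weatherstack.com/current?{params}"


# "What is the temperature in Milan on Friday?"
result = NLUResult(intent="weather.temperature", slots={"location": "Milan"})
url = build_weather_query(result, access_key="YOUR_KEY")
```

The intent selects which field of the service response to report (temperature, humidity, and so on), while the slots supply the parameters of the request.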

This sample could be modified to implement more models and more complex dialog state management. It is also intended to demonstrate how Jarvis can be integrated into existing virtual assistant and dialog systems to provide state-of-the-art conversational intelligence optimized for NVIDIA’s accelerated computing platform.

The Dialog Manager is an environment that executes a state machine defined in YAML configuration files. It is integrated with the Jarvis NLP, ASR, and TTS modules.

Different types of assistants can be created by designing different state diagrams. As a sample, we provide an implementation of a weather bot.
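The idea of a configuration-driven state machine can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the sample's Dialog Manager: the state names, the `when`/`to` keys, and the `DialogManager` class are hypothetical, and the configuration is written as a plain dictionary of the kind a YAML loader might produce.

```python
# Hypothetical schema: what a loaded YAML state diagram might look like.
STATE_MACHINE = {
    "start": "await_query",
    "states": {
        "await_query": {
            "transitions": [
                {"when": "has_location", "to": "respond"},
                {"when": "missing_location", "to": "ask_location"},
            ]
        },
        "ask_location": {
            "prompt": "For which location?",
            "transitions": [
                {"when": "has_location", "to": "respond"},
                {"when": "missing_location", "to": "ask_location"},
            ],
        },
        "respond": {"transitions": []},
    },
}


class DialogManager:
    """Executes the state diagram described by a configuration dict."""

    def __init__(self, config):
        self.config = config
        self.state = config["start"]
        self.slots = {}

    def step(self, nlu_slots):
        """Merge new slots, then follow the first transition whose condition holds."""
        self.slots.update(nlu_slots)
        event = "has_location" if "location" in self.slots else "missing_location"
        for transition in self.config["states"][self.state]["transitions"]:
            if transition["when"] == event:
                self.state = transition["to"]
                break
        # Return the new state and the prompt to speak, if the state defines one.
        return self.state, self.config["states"][self.state].get("prompt")


dm = DialogManager(STATE_MACHINE)
first = dm.step({})                        # user: "What is the weather?"
second = dm.step({"location": "Berlin"})   # user: "Berlin"
```

Because the transitions live in data rather than code, a different assistant can be described by swapping the configuration while keeping the executor unchanged.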

Virtual Assistant Chat screenshot

Video Demo

The following video shows the weatherbot in action, gives a high-level description of the architecture, and ends with a brief code walkthrough.

Running the Demo

Setting up the Jarvis services is a prerequisite, as the various components of the application depend on the availability of those services. The weatherbot assumes the availability of the following models at the Jarvis endpoint: ASR, TTS, and NLP (domain, context, weather, poi, and NER). After the Jarvis services are up and running, proceed with running this application.

  1. Download the samples image from NGC.

    docker pull nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.2-samples
    
  2. Run the service within a Docker container.

    docker run -it --rm -p 8009:8009 nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.2-samples /bin/bash
    cd samples/jarvis-weather
    
  3. Edit config.py with the correct Jarvis IP address, hosting port, and your weatherstack API access key (from https://weatherstack.com/). Then, start the server.

    python3 main.py
    
  4. Open the browser to https://127.0.0.1:8009/jarvisWeather/.
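Before starting the server, it can help to sanity-check the three values that step 3 asks for. The sketch below is only an illustration: the keys `jarvis_host`, `web_port`, and `weatherstack_key` are hypothetical and do not reflect the actual variable names in config.py, which should be used as shipped.

```python
def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK.

    The key names are placeholders, not the names used in config.py.
    """
    problems = []
    if not settings.get("jarvis_host"):
        problems.append("jarvis_host is empty")
    port = settings.get("web_port")
    if not isinstance(port, int) or not (1 <= port <= 65535):
        problems.append("web_port must be an integer between 1 and 65535")
    if not settings.get("weatherstack_key"):
        problems.append("weatherstack_key is empty")
    return problems


# Example values only; substitute your own Jarvis address and access key.
issues = check_settings(
    {"jarvis_host": "10.0.0.5", "web_port": 8009, "weatherstack_key": "abc123"}
)
```

An empty weatherstack key is an easy mistake to miss, since the web page would still load and only the fulfillment step would fail.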

Sample Use Cases

It is possible to ask the bot the following types of questions:

  • What is the weather in Berlin?

  • What is the weather?

    • For which location?

  • What’s the weather like in San Francisco tomorrow?

    • What about in Los Angeles, California?

  • What is the temperature in Milan on Friday?

  • Is it currently cold in San Francisco?

  • Is it going to rain in Mountain View tomorrow?

  • How much rain in Seattle?

  • Will it be sunny next week in Santa Clara?

  • Is it cloudy today?

  • Is it going to snow tomorrow in Detroit?

  • How much snow is there in Tahoe currently?

  • How humid is it right now?

  • What is the humidity in Tahoe?
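Follow-up questions such as "What about in Los Angeles, California?" carry no explicit intent, so the bot has to reuse context from the previous turn. One way this could be handled is sketched below; the `resolve_turn` helper and the dictionary layout are hypothetical and are not taken from the sample's code.

```python
def resolve_turn(previous: dict, current: dict) -> dict:
    """Fill gaps in the current turn from the previous one.

    The intent is inherited when the new utterance has none, and slots
    from the new utterance override those remembered from earlier.
    """
    intent = current.get("intent") or previous.get("intent")
    slots = {**previous.get("slots", {}), **current.get("slots", {})}
    return {"intent": intent, "slots": slots}


# Turn 1: "What's the weather like in San Francisco tomorrow?"
first_turn = {"intent": "weather", "slots": {"location": "San Francisco"}}
# Turn 2: "What about in Los Angeles, California?" (no intent detected)
follow_up = {"intent": None, "slots": {"location": "Los Angeles, California"}}

resolved = resolve_turn(first_turn, follow_up)
```

Here the resolved turn keeps the `weather` intent from the first question and replaces only the location.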

Limitations

  1. The provided samples are not complete chatbots, but are intended as simple examples of how to build basic task-oriented chatbots with Jarvis. Consequently, the intent classifier and slot filling models have been trained with small amounts of data and are not expected to be highly accurate.

  2. The Jarvis NLP sample supports intents for weather, temperature, rain, humidity, sunny, cloudy and snowfall checks. It does not support general conversational queries or other domains.

  3. Both the Jarvis NLP and Rasa NLU samples support only one slot, for the city. Neither takes into account the day associated with the query.

  4. These samples support up to four concurrent users. This restriction comes not from Jarvis but from the web framework in use (Flask and Flask-SocketIO), which cannot sustain more than four concurrent socket connections for streaming audio to (TTS) and from (ASR) users.

  5. The chatbot application is not optimized for low latency when there are multiple concurrent users.

  6. Some erratic issues have been observed with the chatbot samples on the Firefox browser. The most common issue is the TTS output being taken in as input by ASR for certain microphone gain values.

License

The End User License Agreement is included with the product. Licenses are also available along with the model application zip file. By pulling and using the Jarvis SDK container and downloading models, you accept the terms and conditions of these licenses.