Step #1: Hands-On Lab
Within this lab, you will dive deeper into the weather VA sample, specifically the sections of the code pertaining to Riva streaming ASR and Riva TTS calls. A few lines of code are missing where the actual Riva ASR/TTS service calls are made, and as an exercise you are required to fill in those missing pieces to complete the application. The solutions to the exercises are provided towards the end of the guide.
You will use two important links from the left-hand navigation pane throughout the course of this lab.
Please use Chrome or Firefox when trying the weather VA sample in this lab. The Web Audio APIs used to handle audio in the application work best with these browsers.
For this lab, the Riva server has been set up for you.
Open the VM Console by selecting VM Console on the left-hand navigation pane.
Go to the “lab-2” directory, which contains the code for the virtual assistant, and activate the conda environment that contains all the dependencies required for this application.
cd riva-launchpad/lab-2
source lab2/bin/activate
cd virtual-assistant
Once inside the directory, you will find the exercise code in the current working directory. Edit the configuration file config.py and set the weatherstack API key. The VA uses weatherstack for weather fulfillment: when a weather intent is recognized, real-time weather information is fetched from weatherstack. Open the configuration file:
vim config.py
Sign up for the free tier of weatherstack and get your API access key. Copy your access key into config.py; the snippet will look like the example below. Save and close the file once done.
riva_config = {
    ...
    "WEATHERSTACK_ACCESS_KEY": "<API_ACCESS_KEY>",  # Get your access key at - https://weatherstack.com/
    ...
}
The VA transcribes user utterances using Riva’s streaming recognition API. The proto defining the ASR services and messages can be found here.
An example of using the Riva streaming ASR API can also be found in the Riva Python clients repository.
Let’s go over some of the salient parts of using Riva’s streaming ASR service.
The first input, audio_chunks, to the streaming_response_generator method is an iterable of audio chunks that yields the byte sequences of audio content to be sent to the Riva speech server. The second input, streaming_config, contains the configuration that specifies how to process the request. The subsequent messages sent in the stream must contain only raw bytes of the audio data to be recognized.
# Boilerplate
import riva.client
auth = riva.client.Auth(uri='localhost:50051')  # Channel to Riva Server
riva_asr = riva.client.ASRService(auth)

# Configuration
config = riva.client.RecognitionConfig()
config.sample_rate_hertz = 16000
config.language_code = "en-US"
config.max_alternatives = 1
config.enable_automatic_punctuation = True
config.enable_word_time_offsets = True
config.verbatim_transcripts = False

# Provides information to the recognizer that specifies how to process the request
streaming_config = riva.client.StreamingRecognitionConfig(config=config, interim_results=True)
For a given stream, sequential chunks of audio data are sent in sequential requests. We can leverage a Python generator function that yields the chunks of audio data to compose the input for streaming_response_generator.
"""Generates byte-sequences of audio chunks from the audio buffer"""
def build_request_generator(self):
while not self.closed:
# Use a blocking get() to ensure there's at least one chunk of
# data, and stop iteration if the chunk is None, indicating the
# end of the audio stream.
chunk = self._buff.get()
if chunk is None:
return
data = [chunk]
# Now consume whatever other data's still buffered.
while True:
try:
chunk = self._buff.get(block=False)
if chunk is None:
return
data.append(chunk)
except queue.Empty:
break
yield b''.join(data)
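For context, the generator above consumes from a thread-safe queue (self._buff) that is filled elsewhere in the class. A hypothetical sketch of the producer side follows; the method names audio_callback and stop are illustrative, not the VA's actual API:
# Hypothetical producer side: an audio callback enqueues raw byte
# chunks, and a stop method pushes None to signal end-of-stream.
def audio_callback(self, in_data):
    self._buff.put(in_data)   # raw bytes arriving from the client

def stop(self):
    self.closed = True
    self._buff.put(None)      # unblocks the consumer's blocking get()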
# Next, we call the Riva streaming_response_generator method to generate transcripts
responses = riva_asr.streaming_response_generator(self.build_request_generator(), streaming_config)
This returns a stream of responses, each containing StreamingRecognitionResult objects that include the different alternatives (recognition hypotheses), as well as an is_final boolean that indicates whether the result is interim or final.
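To make this concrete, here is a minimal sketch of consuming the stream; the printing is illustrative (the VA processes the transcripts instead):
for response in responses:
    for result in response.results:
        if not result.alternatives:
            continue
        # The first alternative is the most likely hypothesis
        transcript = result.alternatives[0].transcript
        if result.is_final:
            print("Final:", transcript)
        else:
            print("Interim:", transcript)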
The VA implements ASR support through its ASRPipe class in virtual-assistant/riva_local/asr/asr.py. ASRPipe contains methods to communicate with the Riva ASR service using the StreamingRecognize API, as well as the interface to operate buffers for the audio input and transcription output. In line with the API above, the ASRPipe.main_asr() method sends requests with a generator for the stream of inputs in the audio buffer, queries the Riva ASR service, and calls another method, ASRPipe.listen_print_loop(), to iterate over the output stream of responses.
Exercise 1 - Take a look at the ASRPipe.main_asr() method, and fill in the missing line of code that calls the Riva ASR service.
After any dialog state transition, the response text generated by the virtual assistant is synthesized into audio using Riva’s SynthesizeOnline API. The input SynthesizeSpeechRequest contains the desired text and the configuration, and this API returns a stream of audio bytes in the requested format as it becomes available.
# Boilerplate
import riva.client
auth = riva.client.Auth(uri='localhost:50051') # Channel to Riva Server
riva_tts = riva.client.SpeechSynthesisService(auth)
# Query the Riva TTS service with request related arguments
responses = riva_tts.synthesize_online(
    text = "Hello",
    language_code = "en-US",
    encoding = riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hz = 22050,
    voice_name = "ljspeech"
)
# This returns an iterable (`responses`) that contains the stream of results.
for resp in responses:
    audio_bytes = resp.audio
    # ... parse audio
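As an illustration of what the audio-parsing step might do, here is a minimal sketch (an assumption for this guide, not the VA's actual code) that writes the streamed chunks to a WAV file, assuming mono 16-bit LINEAR_PCM at the 22050 Hz requested above:
import wave

with wave.open("hello.wav", "wb") as wav_file:
    wav_file.setnchannels(1)       # mono
    wav_file.setsampwidth(2)       # 16-bit LINEAR_PCM (2 bytes/sample)
    wav_file.setframerate(22050)   # must match sample_rate_hz in the request
    for resp in responses:
        wav_file.writeframes(resp.audio)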
The VA implements speech synthesis through its TTSPipe class in virtual-assistant/riva_local/tts/tts_stream.py. TTSPipe contains methods to communicate with the Riva TTS service using the SynthesizeOnline API, as well as the interface to access buffers for the desired text and the speech output.
The TTSPipe.get_speech() method composes a request, queries the Riva TTS service, and then loops over the response iterable to process and yield segments of audio bytes.
Exercise 2 - Take a look at the TTSPipe.get_speech() method, and fill in the missing line of code that calls the Riva TTS service.
If you’re curious about the offline speech synthesis API, you can find it in the virtual-assistant/riva_local/tts/tts.py file. The imports in virtual-assistant/riva_local/chatbot/chatbot.py determine which mode gets used.
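For reference, the offline API returns a single response containing the complete audio instead of a stream. A minimal sketch, reusing the riva_tts service object from the earlier snippet:
# Offline (batch) synthesis: one response with the full utterance
resp = riva_tts.synthesize(
    text = "Hello",
    language_code = "en-US",
    encoding = riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hz = 22050,
    voice_name = "ljspeech"
)
audio_bytes = resp.audio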
Once the exercises are completed, start the VA by running the command below.
python3 main.py
Open the Client Application by clicking the link on the left-hand navigation pane. This will start another tab in your browser with the VA application.
Below are a few example questions to ask the VA.
What is the weather in San Francisco?
What is the temperature in Chicago on Friday?
How humid is it right now?
Additional sample questions can be found in the Riva docs here.
The missing code snippets for the exercises in the previous section are shown below:
Exercise 1
responses = self.riva_asr.streaming_response_generator(audio_chunks=self.request_generator, streaming_config=streaming_config)
Exercise 2
responses = self.riva_tts.synthesize_online(
    text = text,
    language_code = self.language_code,
    encoding = riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hz = self.sample_rate,
    voice_name = self.voice_name
)