How do I perform Language Translation using Riva NMT APIs with out-of-the-box models?#

This tutorial walks you through the basics of using Riva Neural Machine Translation (NMT) services, specifically covering how to use Riva NMT APIs with out-of-the-box models. We will also cover how to use Riva’s Speech-to-Speech (S2S) and Speech-to-Text (S2T) APIs.

NVIDIA Riva Overview#

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.
Riva offers a rich set of speech and natural language understanding services such as:

Automatic speech recognition (ASR)
Text-to-Speech synthesis (TTS)
Neural Machine Translation (NMT)
A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we will interact with the Neural Machine Translation (NMT) APIs. We will also cover using Riva’s Speech-to-Speech (S2S) and Speech-to-Text (S2T) services.

For more information about Riva, refer to the Riva developer documentation.

Introduction to Language Translation with Riva NMT#

Riva Neural Machine Translation (NMT) is a framework for machine translation based on neural networks. NMT translates text between language pairs: given text in one language (the source language), it produces the corresponding text in another language (the target language).

Riva NMT EA offers multiple models for machine translation. These models fall into three model architectures:

Megatron models are based on the Megatron-BERT architecture and have 500M parameters. They can translate from any of the supported languages to English and vice versa. For example, the megatronnmt_en_any_500m model can be used to translate from English to any of the supported languages.
Multilingual models support translating from one source language to multiple target languages and vice versa. For example, the mnmt_en_deesfr_transformer24x6 model can be used to translate from English to German, Spanish, and French. Multilingual models have several language codes in their name. Use a multilingual model if you need to support multiple languages or want to optimize resource utilization: a single multilingual model can translate along several language pairs, avoiding the overhead of loading a separate model for each pair. By default, use 24x6 multilingual models. If you need to reduce resource consumption even further and can accept some degradation in translation quality, use a 12x2 multilingual model instead.
Bilingual models translate from one source language to one target language. For example, the en_de_24x6 model can be used to translate from English to German. Bilingual models have a single pair of language codes in their name. Use a bilingual model when you want the best possible performance for a specific language-pair direction; bilingual models run faster than multilingual models.

To learn more about Riva NMT, refer to the Riva NMT EA documentation.
For more information about the NMT model architecture and training, refer to the NeMo NMT documentation.

Language Pairs Supported:#

The following table lists the models for all language pairs supported by the NVIDIA Riva Speech Skills NMT service.
The table also provides the language codes, the model name in the Riva Quick Start Guide's config.sh file, and the corresponding model name to specify in API calls.

| Language Pair | Model name in `config.sh` | Model name specified during API call |
|---|---|---|
| English (`en`) to Simplified Chinese (`zh`) | `rmir_en_zh_24x6` | `en_zh_24x6` |
| Simplified Chinese (`zh`) to English (`en`) | `rmir_zh_en_24x6` | `zh_en_24x6` |
| English (`en`) to Russian (`ru`) | `rmir_en_ru_24x6` | `en_ru_24x6` |
| Russian (`ru`) to English (`en`) | `rmir_ru_en_24x6` | `ru_en_24x6` |
| English (`en`) to German (`de`) | `rmir_en_de_24x6` | `en_de_24x6` |
| German (`de`) to English (`en`) | `rmir_de_en_24x6` | `de_en_24x6` |
| English (`en`) to Spanish (`es`) | `rmir_en_es_24x6` | `en_es_24x6` |
| Spanish (`es`) to English (`en`) | `rmir_es_en_24x6` | `es_en_24x6` |
| English (`en`) to French (`fr`) | `rmir_en_fr_24x6` | `en_fr_24x6` |
| French (`fr`) to English (`en`) | `rmir_fr_en_24x6` | `fr_en_24x6` |
| *Any to English (`en`) | `rmir_megatronnmt_any_en_500m` | `megatronnmt_any_en_500m` |
| *English (`en`) to Any | `rmir_megatronnmt_en_any_500m` | `megatronnmt_en_any_500m` |

* In Megatron models, “any” refers to the following 32 languages: cs, da, de, el, es, fi, fr, hu, it, lt, lv, nl, no, pl, pt, ro, ru, sk, sv, zh, ja, hi, ko, et, sl, bg, uk, hr, ar, vi, tr, id
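The table and the note above can be turned into a small lookup for choosing the model name passed to `translate()`. The sketch below uses a hypothetical helper, `pick_nmt_model` (not part of the Riva client), that prefers a bilingual model when the table lists one for the pair (bilingual models are the faster option) and otherwise falls back to a Megatron model, assuming the pair involves English and one of the 32 languages listed above. It only consults the table; the chosen model must also be deployed on your server.

```python
# Hypothetical helper (not part of riva.client): map a language pair to the
# model name specified during the API call, based on the table above.
BILINGUAL_MODELS = {
    ("en", "zh"): "en_zh_24x6",
    ("zh", "en"): "zh_en_24x6",
    ("en", "ru"): "en_ru_24x6",
    ("ru", "en"): "ru_en_24x6",
    ("en", "de"): "en_de_24x6",
    ("de", "en"): "de_en_24x6",
    ("en", "es"): "en_es_24x6",
    ("es", "en"): "es_en_24x6",
    ("en", "fr"): "en_fr_24x6",
    ("fr", "en"): "fr_en_24x6",
}

# The 32 languages that "any" refers to in the Megatron models.
MEGATRON_LANGUAGES = set(
    "cs da de el es fi fr hu it lt lv nl no pl pt ro ru sk sv zh "
    "ja hi ko et sl bg uk hr ar vi tr id".split()
)


def pick_nmt_model(source_language: str, target_language: str) -> str:
    """Return an API model name for the pair, preferring bilingual models."""
    pair = (source_language, target_language)
    if pair in BILINGUAL_MODELS:
        return BILINGUAL_MODELS[pair]
    if source_language == "en" and target_language in MEGATRON_LANGUAGES:
        return "megatronnmt_en_any_500m"
    if target_language == "en" and source_language in MEGATRON_LANGUAGES:
        return "megatronnmt_any_en_500m"
    raise ValueError(
        f"No out-of-the-box model listed for {source_language} -> {target_language}"
    )


print(pick_nmt_model("en", "fr"))  # en_fr_24x6
print(pick_nmt_model("ja", "en"))  # megatronnmt_any_en_500m
```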

Requirements and setup#

Start the Riva Speech Skills server.
To use the Riva NMT models, we first need to deploy them on the Riva Speech Skills server. Before running this tutorial, follow the instructions in the Riva Quick Start Guide to deploy the out-of-the-box (OOTB) NMT models on the Riva Speech Skills server. For this tutorial, deploy the following models:
- English (en) to French (fr) bilingual model - The model name for this language pair in the Riva Quick Start Guide's config.sh is listed in the table above.
- English (en) to Any Megatron model - In config.sh, uncomment the line that contains rmir_megatronnmt_en_any_500m (in API calls, this model is called megatronnmt_en_any_500m).
- Spanish (es) ASR, Spanish-to-English (es-en) NMT, and English (en) TTS models - These are needed for the Speech-to-Speech (S2S) and Speech-to-Text (S2T) sections later in this tutorial. Instructions for deploying the Spanish (language code es-US) ASR model and the English (en-US) TTS model are in config.sh itself. The model name for the Spanish-to-English language pair is listed in the table above.
Install the Riva Client library.
Follow the installation steps in the Riva Python client documentation to install the Riva Client library (published on PyPI as nvidia-riva-client).
Install additional libraries needed to run this tutorial.

!apt-get update && apt-get install -y python3-dev

# Install PyAudio. portaudio19-dev is a prerequisite for PyAudio.
!apt-get update && apt-get install -y python3-pyaudio portaudio19-dev
!python -m pip install pyaudio
# If you run into errors running apt-get commands through the Jupyter notebook, run these commands directly in your local machine's terminal. You might need sudo access to run them.
# For alternative ways to install PyAudio, refer to the PyAudio documentation: https://people.csail.mit.edu/hubert/pyaudio/

# Install librosa.
!apt-get update && apt-get install -y libsndfile1
# If you run into errors running apt-get commands through the Jupyter notebook, run this command directly in your local machine's terminal. You might need sudo access to run it.
!python -m pip install librosa

# Install nltk.
!python -m pip install nltk
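Optionally, you can confirm that the packages installed above are importable from the notebook's Python environment before continuing. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper name, and it checks top-level import names only (`riva` for the Riva Client library, `pyaudio` for PyAudio).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names for the packages used in this tutorial.
missing = missing_modules(["riva", "nltk", "librosa", "pyaudio"])
print("Missing modules:", missing or "none")
```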

Language Translation with Riva NMT APIs#

Now, let's generate language translations using the Riva APIs with OOTB models.

Import the Riva client libraries#

import riva.client

Create a Riva client and connect to the Riva Speech API server#

The following URI assumes that the Riva Speech API server is deployed locally on the default port (50051). If the server is deployed on a different host or via a Helm chart on Kubernetes, use the appropriate URI.

# `Auth` class wraps a gRPC channel.
auth = riva.client.Auth(uri='localhost:50051')

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)

Make a gRPC request to the Riva Speech API server#

Inference with Bilingual NMT model:#

Now, let's make a gRPC request to the Riva Speech server's bilingual NMT model en_fr_24x6 (rmir_en_fr_24x6 in config.sh) to translate from the source language, English (en), to the target language, French (fr).

eng_text = (
    "Molecular Biology is the field of biology that studies the composition, structure "
    "and interactions of cellular molecules – such as nucleic acids and proteins – that "
    "carry out the biological processes essential for the cell's functions and maintenance."
)
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

To learn more about NeuralMachineTranslationClient, refer to the corresponding docstring.

Now we submit the request to the server.

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
# response.translations is a list of translations; each entry corresponds to the
# entry at the same index in the list of input texts passed to translate().

print("English Text: ", eng_text)
# Fetch the translated text from the 1st entry of response.translations
print("\nTranslated French Text: ", response.translations[0].text)

Now let us make a gRPC request to the Megatron model megatronnmt_en_any_500m to translate the same English text into French. Only the model name changes; the source and target languages stay the same.

model_name = 'megatronnmt_en_any_500m'

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
# response.translations is a list of translations; each entry corresponds to the
# entry at the same index in the list of input texts passed to translate().

print("English Text: ", eng_text)
# Fetch the translated text from the 1st entry of response.translations
print("\nTranslated French Text: ", response.translations[0].text)
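Because `translate()` takes a list of texts and `response.translations` follows the same order, several texts can be translated in one request and matched back to their inputs by index. A minimal sketch, assuming only that each entry of `response.translations` has a `.text` attribute (as used above); `pair_translations` is a hypothetical helper, and the stand-in response is built locally so the snippet runs without a server. The French strings in it are placeholders, not Riva output.

```python
from types import SimpleNamespace


def pair_translations(texts, response):
    """Zip each input text with the translation at the same index of response.translations."""
    translations = [t.text for t in response.translations]
    if len(translations) != len(texts):
        raise ValueError(f"Expected {len(texts)} translations, got {len(translations)}")
    return list(zip(texts, translations))


# Offline stand-in for a server response, for illustration only. With a running server use:
# response = riva_nmt_client.translate(texts, model_name, source_language, target_language)
texts = ["Good morning.", "Thank you."]
fake_response = SimpleNamespace(
    translations=[SimpleNamespace(text="Bonjour."), SimpleNamespace(text="Merci.")]
)
for source, target in pair_translations(texts, fake_response):
    print(f"{source} -> {target}")
```

The length check makes a short or empty response fail loudly instead of silently mismatching inputs and outputs.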

Riva NMT APIs - Handling large input text:#

The Riva NMT API has a maximum input limit of 512 tokens. If the input is longer than 512 tokens, the NMT API doesn't return the complete translation:

eng_text = """
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
"""
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
print("English Text: ", eng_text)
print("Translated French Text: ", response.translations[0].text)

As you can see above, the French translation is cut off: only roughly the first 512 tokens of the English input are translated.

The best way to handle such large inputs is to split the input text into smaller pieces and send them to the NMT API as a list of texts.
There is currently no precise way to find the number of tokens in an input text. On WMT test sets, averaged across multiple sentences and models, 1 token maps to about 1.2 characters. This is only an estimate, and the actual mapping can vary significantly with the model and sentence pair. Based on this estimate, keep each text in the list to at most about 615 characters (512 × 1.2 ≈ 614). Also make sure to respect sentence boundaries when splitting the text.
Let us look at an example of how to translate large text.

import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # Recent NLTK releases load the sentence tokenizer from 'punkt_tab'

from nltk import tokenize

def nmt_large_text_split(input_text, max_chars=615):
    """Split large input text into chunks of at most max_chars characters, respecting sentence boundaries."""

    def nmt_text_split_sentence_splitter(sentence_text, max_chars):
        """Split a sentence longer than max_chars into pieces, respecting word boundaries."""
        if len(sentence_text) <= max_chars:
            return [sentence_text]
        sentence_splits = []
        for word in sentence_text.split():
            # +1 accounts for the space that joins the word to the current piece
            if sentence_splits and len(sentence_splits[-1]) + 1 + len(word) <= max_chars:
                sentence_splits[-1] += " " + word
            else:
                # A single word longer than max_chars is kept as its own (oversized) piece
                sentence_splits.append(word)
        return sentence_splits

    # 1. Split the input text into sentences with NLTK's sentence tokenizer
    sentences = tokenize.sent_tokenize(input_text)

    # 2. Split any sentence longer than max_chars. Build a new list instead of
    #    modifying `sentences` while iterating over it, so no sentence is dropped.
    pieces = [piece for sentence in sentences
              for piece in nmt_text_split_sentence_splitter(sentence, max_chars)]

    # 3. Greedily pack pieces into nmt_input_texts, ensuring no entry exceeds max_chars
    nmt_input_texts = []
    for piece in pieces:
        if nmt_input_texts and len(nmt_input_texts[-1]) + 1 + len(piece) <= max_chars:
            nmt_input_texts[-1] += " " + piece
        else:
            nmt_input_texts.append(piece)
    return nmt_input_texts

eng_text = """
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
"""
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

parts = nmt_large_text_split(eng_text)

response = riva_nmt_client.translate(parts, model_name, source_language, target_language)

print("English Text:\n", eng_text)
print("Translated French Text:\n")
for translation in response.translations:
    print(translation.text)

WARNING: You cannot pass more than 8 texts to the model in a single request. If you pass more than 8 inputs, the response will be empty.
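If splitting produces more than 8 pieces, one way to stay within this limit is to send the pieces in batches of at most 8 and concatenate the results. The sketch below is a hypothetical helper, `translate_in_batches`, that only relies on the `translate(texts, model_name, source_language, target_language)` call used above; it raises an error when a batch comes back short or empty rather than returning a partial translation.

```python
def translate_in_batches(client, texts, model_name, source_language, target_language, batch_size=8):
    """Translate a list of texts, sending at most batch_size texts per translate() call.

    Returns the translated strings in the same order as `texts`.
    """
    translated = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.translate(batch, model_name, source_language, target_language)
        # An empty or short response means this batch was not fully translated
        if len(response.translations) != len(batch):
            raise RuntimeError(
                f"Batch starting at index {start}: got {len(response.translations)} "
                f"translations for {len(batch)} texts"
            )
        translated.extend(t.text for t in response.translations)
    return translated


# Usage with the client and text pieces from the previous example:
# translations = translate_in_batches(riva_nmt_client, parts, model_name, source_language, target_language)
# print("\n".join(translations))
```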

Riva S2T and S2S APIs#

The Riva Speech-to-Text Translation (S2T) service translates speech in a source language into text in a target language. S2T takes an audio stream or audio buffer as input and returns the translated text. Internally, the Riva S2T service is composed of the Riva ASR and NMT pipelines, and it supports streaming mode.

The Riva Speech-to-Speech Translation (S2S) service translates speech in a source language into speech in a target language. S2S takes an audio stream or audio buffer as input and returns synthesized audio in the target language. Internally, the Riva S2S service is composed of the Riva ASR, NMT, and TTS pipelines, and it also supports streaming mode.

Riva ASR provides state-of-the-art OOTB (out-of-the-box) models and pipelines for multiple languages, like English, Spanish, German, Russian and Mandarin. Riva also supports easy customization of the ASR pipeline, in various ways, to meet your specific needs.
Riva TTS provides two state-of-the-art voices (one male and one female) for English. Riva also supports easy customization of TTS in various ways, to meet your specific needs.

In this section, let us look at examples showing how to generate translated speech and text from audio. Make sure that your Riva Speech server is deployed with the Spanish ASR, Spanish-to-English NMT, and English TTS models.

Riva Speech-to-Text Service#

The Riva S2T service supports models for the following language pairs:

Spanish (es) to English (en)
German (de), Spanish (es), French (fr) to English (en)
Simplified Chinese (zh) to English (en)
Russian (ru) to English (en)
German (de) to English (en)
French (fr) to English (en)

Let’s take an example of translating Spanish speech to English text.

Import the Riva client libraries#

Let’s import some of the required libraries, including the Riva Client libraries.

import IPython.display as ipd
import numpy as np

# Riva client import
import riva.client

Create a Riva client and connect to the Riva Speech API server#

auth = riva.client.Auth(uri="localhost:50051")

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)

Load Audio File#

Let’s load up an audio file and create an audio chunk generator to simulate streaming input.

my_wav_file = "./audio_samples/es-US_sample.wav"
output_device = None  # use default device

wav_parameters = riva.client.get_wav_file_parameters(my_wav_file)
audio_chunk_iterator = riva.client.AudioChunkFileIterator(
    my_wav_file, chunk_n_frames=4800)
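The S2T and S2S configs below hardcode `sample_rate_hertz=16000` and `audio_channel_count=1` with `LINEAR_PCM` encoding, so the input file's header should match them. A minimal sketch using Python's standard `wave` module; `check_wav_for_asr` is a hypothetical helper, and the expected 2-byte (16-bit) sample width is an assumption about what `LINEAR_PCM` input means here.

```python
import wave


def check_wav_for_asr(path, expected_rate=16000, expected_channels=1, expected_sample_width=2):
    """Compare a WAV file's header with the audio settings the ASR config assumes.

    Returns a list of human-readable mismatches; an empty list means the header matches.
    """
    with wave.open(str(path), "rb") as wf:
        checks = {
            "sample rate (Hz)": (wf.getframerate(), expected_rate),
            "channels": (wf.getnchannels(), expected_channels),
            "sample width (bytes)": (wf.getsampwidth(), expected_sample_width),
        }
    return [
        f"{name}: file has {actual}, config expects {expected}"
        for name, (actual, expected) in checks.items()
        if actual != expected
    ]
```

For the sample file above, `print(check_wav_for_asr(my_wav_file) or "WAV header matches the ASR config")` reports any mismatch before you send audio to the server.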

Define S2T Config#

The S2T config combines a Riva ASR config and an NMT translation config.

s2t_config = riva.client.StreamingTranslateSpeechToTextConfig(
    asr_config=riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            language_code='es-US',    # Spanish ASR model
            max_alternatives=1,
            profanity_filter=False,
            enable_automatic_punctuation=False,
            verbatim_transcripts=False,
            sample_rate_hertz=16000,
            audio_channel_count=1,
        ),
        interim_results=True,
    ),
    translation_config=riva.client.TranslationConfig(
        source_language_code="es-US",    # Transcript's language is Spanish
        target_language_code='en-US',    # Target language is English
    ),
)

Make Riva S2T gRPC requests to the Riva Speech API server:#

# Create a response object which takes the S2T config and audio iterator as arguments

responses = riva_nmt_client.streaming_s2t_response_generator(
            audio_chunks=audio_chunk_iterator,
            streaming_config=s2t_config)

Print streaming response#

# Generate chunks of audio and submit it to the S2T service to receive translations

riva.client.print_streaming(responses, additional_info='time')

As seen above, we generate a stream of translations, corresponding to the stream of transcripts generated by Riva ASR.

Riva Speech-to-Speech service#

The Riva S2S service supports models for the following language pairs:

Spanish (es) to English (en)
German (de), Spanish (es), French (fr) to English (en)
Simplified Chinese (zh) to English (en)
Russian (ru) to English (en)
German (de) to English (en)
French (fr) to English (en)

Let’s take an example of translating Spanish speech to English speech.

Create a Riva client and connect to the Riva Speech API server#

auth = riva.client.Auth(uri="localhost:50051")

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)

Load Audio File#

Let’s load up an audio file and create an audio chunk generator to simulate streaming input.

my_wav_file = "./audio_samples/es-US_sample.wav"
output_device = None  # use default device

wav_parameters = riva.client.get_wav_file_parameters(my_wav_file)
audio_chunk_iterator = riva.client.AudioChunkFileIterator(
    my_wav_file, chunk_n_frames=4800)

Define S2S Config#

The S2S config combines Riva ASR, NMT, and TTS configs.

s2s_config = riva.client.StreamingTranslateSpeechToSpeechConfig(
    asr_config=riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            language_code='es-US',    # Spanish ASR model
            max_alternatives=1,
            profanity_filter=False,
            enable_automatic_punctuation=False,
            verbatim_transcripts=False,
            sample_rate_hertz=16000,
            audio_channel_count=1,
        ),
        interim_results=True,
    ),
    translation_config=riva.client.TranslationConfig(
        source_language_code="es-US",    # Transcript's language is Spanish
        target_language_code='en-US',    # Target language is English
    ),
    tts_config=riva.client.SynthesizeSpeechConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,    # LINEAR_PCM (enum value 1)
        sample_rate_hz=44100,
        voice_name="English-US.Female-1",    # English Female voice
        language_code="en-US",
    ),
)

Make Riva S2S gRPC requests to the Riva Speech API server:#

# Create a response object which takes the S2S config and audio iterator as arguments

responses = riva_nmt_client.streaming_s2s_response_generator(
            audio_chunks=audio_chunk_iterator,
            streaming_config=s2s_config)

Listen to the streaming response#

# Create an empty array to accumulate the received audio

empty = np.array([])

# Send requests and listen to streaming response from the S2S service
for i, rep in enumerate(responses):
    audio_samples = np.frombuffer(rep.speech.audio, dtype=np.int16) / (2**15)
    print("Chunk: ",i)
    ipd.display(ipd.Audio(audio_samples, rate=44100))
    empty = np.concatenate((empty, audio_samples))

# Full translated synthesized speech
print("Final synthesis:")
ipd.display(ipd.Audio(empty, rate=44100))

As seen above, we synthesize a stream of speech corresponding to the stream of transcripts generated by Riva ASR. Each chunk of speech may be a partial or a complete sentence translation, depending on the sentence boundaries in the audio.
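To keep the translated speech beyond the notebook session, you can write it to a WAV file with Python's standard `wave` module. A minimal sketch; `save_s2s_audio` is a hypothetical helper, and it assumes the S2S output is mono 16-bit PCM at the configured TTS sample rate, which matches how the loop above decodes it (`np.int16`, played back at 44100 Hz).

```python
import wave


def save_s2s_audio(path, pcm_chunks, sample_rate_hz=44100):
    """Write raw 16-bit mono PCM byte chunks to a WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)            # mono (assumed)
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate_hz)
        for chunk in pcm_chunks:
            wf.writeframes(chunk)
```

Because `responses` is a generator that the loop above has already consumed, rebuild 16-bit bytes from the accumulated array instead, for example `save_s2s_audio("s2s_output.wav", [(empty * 2**15).astype(np.int16).tobytes()])`. The output file name is arbitrary.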
