Translation Overview#

NVIDIA Riva translation is a neural network-based framework that translates text between language pairs, that is, from one language (the source language) to the corresponding text in another language (the target language). The models are trained using NVIDIA NeMo, a toolkit for building state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Neural Machine Translation (NMT), and Text-to-Speech (TTS) models.

For more information about the translation model architecture and training, refer to the NeMo Machine Translation documentation.

Language Pairs Supported#

The NVIDIA Riva translation service supports models for these language pairs:

English (en) to Simplified Chinese (zh-CN)
Simplified Chinese (zh-CN) to English (en)
English (en) to Traditional Chinese (zh-TW)
Traditional Chinese (zh-TW) to English (en)
English (en) to Russian (ru)
Russian (ru) to English (en)
English (en) to German (de)
German (de) to English (en)
English (en) to European Spanish (es-ES)
European Spanish (es-ES) to English (en)
English (en) to LATAM Spanish (es-US)
LATAM Spanish (es-US) to English (en)
English (en) to French (fr)
French (fr) to English (en)
English (en) to Danish (da)
Danish (da) to English (en)
English (en) to Greek (el)
Greek (el) to English (en)
English (en) to Finnish (fi)
Finnish (fi) to English (en)
English (en) to Hungarian (hu)
Hungarian (hu) to English (en)
English (en) to Italian (it)
Italian (it) to English (en)
English (en) to Lithuanian (lt)
Lithuanian (lt) to English (en)
English (en) to Latvian (lv)
Latvian (lv) to English (en)
English (en) to Dutch (nl)
Dutch (nl) to English (en)
English (en) to Norwegian (no)
Norwegian (no) to English (en)
English (en) to Polish (pl)
Polish (pl) to English (en)
English (en) to European Portuguese (pt-PT)
European Portuguese (pt-PT) to English (en)
English (en) to Brazilian Portuguese (pt-BR)
Brazilian Portuguese (pt-BR) to English (en)
English (en) to Romanian (ro)
Romanian (ro) to English (en)
English (en) to Slovak (sk)
Slovak (sk) to English (en)
English (en) to Swedish (sv)
Swedish (sv) to English (en)
English (en) to Japanese (ja)
Japanese (ja) to English (en)
English (en) to Hindi (hi)
Hindi (hi) to English (en)
English (en) to Korean (ko)
Korean (ko) to English (en)
English (en) to Estonian (et)
Estonian (et) to English (en)
English (en) to Slovenian (sl)
Slovenian (sl) to English (en)
English (en) to Bulgarian (bg)
Bulgarian (bg) to English (en)
English (en) to Ukrainian (uk)
Ukrainian (uk) to English (en)
English (en) to Croatian (hr)
Croatian (hr) to English (en)
English (en) to Arabic (ar)
Arabic (ar) to English (en)
English (en) to Vietnamese (vi)
Vietnamese (vi) to English (en)
English (en) to Turkish (tr)
Turkish (tr) to English (en)
English (en) to Indonesian (id)
Indonesian (id) to English (en)
English (en) to Czech (cs)
Czech (cs) to English (en)
English (en) to Thai (th)
Thai (th) to English (en)

Translation Features#

Riva translation currently provides an API to translate between language pairs using models trained in NeMo Machine Translation. Only the Megatron Multilingual model architecture is supported.

Megatron Multilingual models enable translation between multiple languages:

en_any: Translates from English to any of the supported languages.
any_en: Translates from any of the supported languages to English.
any_any: Translates between any of the supported languages.

In addition to English, Megatron models support 36 other languages: Danish, German, Greek, European Spanish, LATAM Spanish, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Dutch, Norwegian, Polish, European Portuguese, Brazilian Portuguese, Romanian, Russian, Slovak, Swedish, Simplified Chinese, Traditional Chinese, Japanese, Hindi, Korean, Estonian, Slovenian, Bulgarian, Ukrainian, Croatian, Arabic, Vietnamese, Turkish, Indonesian, Czech, and Thai.
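As an illustration of how the three model categories relate to a language pair, here is a minimal Python sketch. The `SUPPORTED` set and the `model_categories` helper are hypothetical names introduced for this example and are not part of the Riva API. The sketch also assumes that an any_any model covers pairs that include English, which the description above suggests but does not state outright.

```python
from typing import List

# Language codes copied from the supported-pairs list in this document.
SUPPORTED = {
    "en", "zh-CN", "zh-TW", "ru", "de", "es-ES", "es-US", "fr", "da", "el",
    "fi", "hu", "it", "lt", "lv", "nl", "no", "pl", "pt-PT", "pt-BR", "ro",
    "sk", "sv", "ja", "hi", "ko", "et", "sl", "bg", "uk", "hr", "ar", "vi",
    "tr", "id", "cs", "th",
}


def model_categories(source: str, target: str) -> List[str]:
    """Return the model categories that could cover a source/target pair.

    Hypothetical helper for illustration only; not a Riva API call.
    """
    if source not in SUPPORTED or target not in SUPPORTED:
        raise ValueError(f"unsupported language pair: {source} -> {target}")
    if source == target:
        raise ValueError("source and target languages must differ")

    categories = []
    if source == "en":
        categories.append("en_any")   # English to any supported language
    if target == "en":
        categories.append("any_en")   # any supported language to English
    # Assumption: any_any also covers pairs that include English.
    categories.append("any_any")
    return categories


print(model_categories("en", "de"))
print(model_categories("de", "fr"))
```

A pair with English on neither side, such as German to French, is only covered by the any_any category in this sketch.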

Riva translation enables you to batch multiple sentences together to provide a faster translation experience. Using the translation client, you can batch together up to 8 sentences to be translated in a single request. The batch size, which defaults to 8, can be adjusted using the batch_size parameter in the client.
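The batching described above can be sketched on the client side as follows. This is plain Python that only groups sentences. The `batch_sentences` helper is a hypothetical name, and the sketch does not call the Riva client. Each resulting batch would be sent as one translation request.

```python
from typing import Iterator, List


def batch_sentences(sentences: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size sentences.

    Hypothetical helper; the default of 8 mirrors the default described above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


sentences = [f"Sentence {i}." for i in range(1, 21)]
batches = list(batch_sentences(sentences))

# Each batch corresponds to one translation request.
for number, batch in enumerate(batches, start=1):
    print(f"request {number}: {len(batch)} sentences")
```

With 20 input sentences and the default batch size, this produces three requests of 8, 8, and 4 sentences.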

Translation Exclusion#

The latest update enables translation exclusion, which can be performed either by the container or by the model itself. To use the model feature, enclose the words or phrases in “Copy” HTML tags (<cp></cp>). To use the container feature, enclose the words or phrases that should remain unchanged in “Do not translate” HTML tags (<dnt></dnt>); the text is preserved as-is in its source language.

Sample input: “<dnt>NVIDIA Riva</dnt> <cp>Speech-to-Speech Translation (S2S)</cp> service translates audio between language pairs.”
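Tagging text by hand is error-prone when the same phrases recur, so a small preprocessing step can help. The following is a minimal sketch, assuming the input text is not already tagged. The `protect_phrases` helper is a hypothetical name for this example and is not part of Riva.

```python
import re
from typing import Iterable


def protect_phrases(text: str, phrases: Iterable[str], tag: str = "dnt") -> str:
    """Wrap every occurrence of the given phrases in <tag></tag>.

    Hypothetical helper; tag must be "dnt" (container) or "cp" (model).
    """
    if tag not in ("dnt", "cp"):
        raise ValueError("tag must be 'dnt' or 'cp'")
    # Longest phrases first, so a short phrase never splits a longer one.
    ordered = sorted({p for p in phrases if p}, key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


tagged = protect_phrases(
    "NVIDIA Riva translates audio between language pairs.",
    ["NVIDIA Riva"],
)
print(tagged)
# <dnt>NVIDIA Riva</dnt> translates audio between language pairs.
```

Matching here is case-sensitive and literal. Text that already contains tags would need extra handling to avoid nested tags.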

Custom Translation Dictionary#

Riva NMT accepts a custom text dictionary that specifies the desired translation for particular words in the server output. Each entry consists of a word followed by its desired translation, separated by the double-hash ## symbol. Put each word and translation pair on its own line in the input dictionary file. The input dictionary file can be passed in the dnt_phrases field while configuring a request from the client. The same field also supports translation exclusion: a word placed on its own line without the double-hash ## symbol appears untranslated in the output.
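To make the dictionary file format concrete, here is a minimal sketch that parses such a file into a Python mapping. The `parse_dictionary` helper and the sample entries are hypothetical. Trimming whitespace around each word and translation is an assumption of this sketch, not documented Riva behavior.

```python
from typing import Dict, Optional


def parse_dictionary(text: str) -> Dict[str, Optional[str]]:
    """Parse 'word##translation' lines into a mapping.

    A line without '##' maps to None, meaning "leave this word untranslated".
    Hypothetical helper for illustration only.
    """
    entries: Dict[str, Optional[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if "##" in line:
            word, translation = line.split("##", 1)
            # Assumption: surrounding whitespace is not significant.
            entries[word.strip()] = translation.strip()
        else:
            entries[line] = None
    return entries


# Illustrative file contents: two custom translations and one exclusion.
sample = """\
hello##hallo
thank you##danke
NVIDIA Riva
"""

print(parse_dictionary(sample))
```

In this sketch, "NVIDIA Riva" has no ## separator, so it is recorded as an exclusion rather than a custom translation.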

Speech-to-Speech Translation (S2S)#

The NVIDIA Riva Speech-to-Speech Translation (S2S) service translates audio between language pairs, that is, from a source language to a target language. S2S takes an audio stream or audio buffer as input and returns a generated audio file. The Riva S2S service is composed of Riva ASR, NMT, and TTS pipelines internally and supports streaming mode. Bilingual and multilingual models are trained using NVIDIA NeMo, a toolkit for building state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS) models.

S2S Models Supported#

The S2S feature supports the following models for ASR, NMT, and TTS.

Speech-to-Text Translation (S2T)#

The NVIDIA Riva Speech-to-Text Translation (S2T) service transcribes audio to text between given language pairs, that is, from a source language to a target language. S2T takes an audio stream or audio buffer as input and returns a transcription. The Riva S2T service is composed of Riva ASR and NMT pipelines internally and supports streaming mode. Bilingual and multilingual models are trained using NVIDIA NeMo, a toolkit for building state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) models.

S2T Models Supported#

The S2T feature supports the following models for ASR and NMT.

Model Deployment#

Like all Riva models, Riva S2S and S2T require the following deployment steps:

Create .riva files for each model from a .nemo file as outlined in the NeMo section.
Create .rmir files for each Riva Speech AI Skill using riva-build.
Create model directories using riva-deploy.
Deploy the model directory using riva_server.

Models can be customized as shown in ASR Customization, NMT Custom Models, and TTS Custom Models.

Multiple Deployed Models#

The Riva server supports multiple models deployed simultaneously, up to the limit of your GPU's memory. As such, a single server process can host models for a variety of language pairs as outlined above.

For the text translation client, the model name can be provided by using the --model_name parameter of the client request. This value must match the value of the riva-build parameter used to create the model. If a model name is not provided, it will be derived automatically from the provided source and target language pair.

To get models and language pairs available on the server, use the ListSupportedLanguagePairs API.

When receiving requests from the client application, the Riva server selects the deployed models to use based on the protobuf object StreamingTranslateSpeechToSpeechConfig (for S2S) or StreamingTranslateSpeechToTextConfig (for S2T) of the client request. If multiple models can fulfill the client request, one model is selected at random.

Punctuation and Inverse Text Normalization (ITN) with S2S and S2T#

The S2S and S2T services support punctuation and ITN. They can be enabled or disabled with the following parameters in the client options:

--automatic_punctuation, when set to true (the default), enables punctuation. --verbatim_transcripts, when set to false, enables ITN.

BLEU Metric#

The BLEU score evaluates the quality of the Riva pipeline.

The pipeline has a BLEU score of 27 when punctuation and ITN are enabled.

The pipeline has a BLEU score of 21.5 when punctuation and ITN are disabled.
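For readers unfamiliar with the metric, BLEU combines clipped n-gram precisions (usually for n = 1 to 4) with a brevity penalty that penalizes output shorter than the reference. The following is a simplified Python sketch on a 0 to 100 scale, assuming whitespace tokenization, one reference per hypothesis, and no smoothing. It is not the evaluation procedure behind the scores above. Published scores are typically computed with a standardized tool such as sacreBLEU, and the `corpus_bleu` function here is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale (illustrative only)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it
            # appears in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(
    ["the cat sat on the mat"],
    ["the cat sat on the red mat"],
)
print(f"BLEU: {score:.1f}")
```

Because the score depends on exact token matches, differences in punctuation and number formatting between the output and the reference change the result. This is consistent with the gap between the two configurations reported above.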
