Translation Overview#

NVIDIA Riva translation is a neural network-based framework that translates text between language pairs, that is, from one language to another. Given text in one language (the source language), the service produces the corresponding text in another language (the target language). Bilingual and multilingual models are trained using NVIDIA NeMo, a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models.

For more information about the translation model architecture and training, refer to the NeMo Machine Translation documentation.

Language Pairs Supported#

The NVIDIA Riva translation service supports models for these language pairs:

  1. English (en) to German (de), Spanish (es), French (fr)

  2. German (de), Spanish (es), French (fr) to English (en)

  3. English (en) to Simplified Chinese (zh)

  4. Simplified Chinese (zh) to English (en)

  5. English (en) to Russian (ru)

  6. Russian (ru) to English (en)

  7. English (en) to German (de)

  8. German (de) to English (en)

  9. English (en) to Spanish (es)

  10. Spanish (es) to English (en)

  11. English (en) to French (fr)

  12. French (fr) to English (en)
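As a quick reference, the directions above can be expressed as a small lookup table. The structure and helper below are illustrative only, not part of the Riva API:

```python
# Illustrative lookup of the language-pair directions listed above.
# Codes follow the list: en, de, es, fr, zh (Simplified Chinese), ru.
SUPPORTED_PAIRS = {
    ("en", "de"), ("en", "es"), ("en", "fr"),
    ("de", "en"), ("es", "en"), ("fr", "en"),
    ("en", "zh"), ("zh", "en"),
    ("en", "ru"), ("ru", "en"),
}

def is_supported(source: str, target: str) -> bool:
    """Return True if the (source, target) direction appears in the list above."""
    return (source, target) in SUPPORTED_PAIRS
```

Note that every supported direction has English on one side; there is, for example, no direct Chinese-to-German model in this list.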

Translation Features#

Riva translation currently provides an API to translate to and from language pairs using models trained in NeMo Machine Translation. There are two different model architectures supported:

  • Multilingual models like en_deesfr, which translate from English to German, Spanish, and French. Multilingual models have several language codes in their name. By default, use the 24x6 multilingual models. Use a multilingual model when you need to support multiple languages or want to optimize resource utilization: a single multilingual model can translate along multiple language pairs, avoiding the overhead of loading several bilingual models.

  • Bilingual models like en_ru, which translate from English to Russian. Bilingual models have a single pair of language codes in their name. Use a bilingual model when you want the best possible quality for a specific language pair and direction; bilingual models currently produce better translations than the multilingual models.

You can use a 12x2 model instead of a 24x6 multilingual model if you need to reduce resource consumption and latency even further and can accept a slight degradation in translation quality.

You can switch to a bilingual model if the multilingual model's performance is lacking for a specific language direction.

Riva translation enables you to batch multiple sentences together for faster translation. Using the translation client, you can batch up to 8 sentences into a single request. The batch size, which defaults to 8, can be adjusted with the batch_size parameter in the client.
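The client-side batching described above can be sketched as a simple chunking step; the helper below is illustrative and independent of the actual Riva client implementation:

```python
from typing import Iterable, Iterator, List

def batch_sentences(sentences: Iterable[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Group sentences into batches of at most batch_size (default 8),
    mirroring how the translation client groups sentences per request."""
    batch: List[str] = []
    for sentence in sentences:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush any final, partially filled batch
        yield batch

# Example: 10 sentences with the default batch size yield two requests,
# one with 8 sentences and one with the remaining 2.
batches = list(batch_sentences([f"sentence {i}" for i in range(10)]))
```

Each yielded batch would correspond to one translation request sent to the server.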

Multiple Deployed Models#

The Riva server supports multiple translation models deployed simultaneously, up to the limit of your GPU's memory. As such, a single server process can host models for any of the language pairs outlined above.

The model is selected by the model parameter of the client request. This value must match the model name specified with riva-build when the model was created.

If a model name is not provided, the default name riva-nmt is used, which is also the default name used by riva-build. If you deploy a model under a name that is already in use, the existing model is overwritten.

To list the models and language pairs available on the server, use the ListSupportedLanguagePairs API.
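The naming behavior described above can be modeled as a small registry. This is an illustrative sketch of the server-side behavior, not Riva code; the class and method names are assumptions:

```python
from typing import Dict, List, Optional, Tuple

class NMTModelRegistry:
    """Toy model of server-side model naming: a default name of
    "riva-nmt" and overwrite-on-duplicate-name, as described above."""

    DEFAULT_NAME = "riva-nmt"

    def __init__(self) -> None:
        # Maps model name -> list of (source, target) language pairs.
        self._models: Dict[str, List[Tuple[str, str]]] = {}

    def deploy(self, language_pairs: List[Tuple[str, str]],
               name: Optional[str] = None) -> None:
        # A model deployed without a name gets the default name; deploying
        # under an existing name replaces the previously deployed model.
        self._models[name or self.DEFAULT_NAME] = language_pairs

    def list_supported_language_pairs(self) -> Dict[str, List[Tuple[str, str]]]:
        # Analogous in spirit to the ListSupportedLanguagePairs API:
        # report the language pairs of every deployed model.
        return dict(self._models)

registry = NMTModelRegistry()
registry.deploy([("en", "ru")])                      # deployed as "riva-nmt"
registry.deploy([("en", "de")], name="en_de_24x6")   # explicit name
registry.deploy([("ru", "en")])                      # overwrites "riva-nmt"
pairs = registry.list_supported_language_pairs()
```

In this sketch, the second unnamed deployment replaces the first, so the registry ends up with exactly two models: en_de_24x6 and the overwritten riva-nmt.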