Run Your First NVIDIA Speech NIM Microservice for Neural Machine Translation

In this tutorial, you will deploy an NVIDIA Speech NIM microservice for Neural Machine Translation (NMT) and use it to translate text between languages. You will learn how to control translation output using exclusion tags and custom dictionaries, and how to batch requests for throughput.

If you completed the ASR and TTS tutorials, the deployment workflow will be familiar. This tutorial focuses on NMT-specific concepts: language pairs, translation exclusion, custom dictionaries, and handling morphologically complex languages.

What You Learn

By completing this tutorial, you:

  • Deploy an NMT NIM using the same container workflow you learned in previous tutorials.

  • Translate text between supported language pairs using the gRPC client.

  • Use <dnt> exclusion tags to protect terms (brand names, product names) from translation.

  • Create custom translation dictionaries to force or block specific word translations.

  • Understand batch inference for translating multiple inputs efficiently.

  • Know how to adjust --max-len-variation for morphologically complex languages like Arabic and Turkish.

What You Need

Key Concepts

| Concept | Description |
| --- | --- |
| Language pairs | NMT models translate between specific source and target languages identified by language codes (for example, en-US for English, de-DE for German, fr-FR for French). Not every model supports every pair. Check the support matrix for supported languages. |
| Translation exclusion | Some terms should not be translated, such as brand names, product names, or technical terms. NMT supports <dnt> (do not translate) tags that protect enclosed text from modification. |
| Custom dictionaries | You can provide a dictionary file that forces specific translations or blocks translation of specific words. This is useful for domain-specific terminology. |

Step 1: Deploy the NMT NIM Microservice

This tutorial deploys the NVIDIA Riva Translate 1.6b model, which supports translation across multiple language pairs.

Tip

For other available NMT models and languages, refer to Supported Models.

export CONTAINER_ID=riva-translate-1_6b

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

Unlike ASR and TTS, this deployment does not set NIM_TAGS_SELECTOR because this container has a single model. The container downloads the model from NGC on first run, which can take up to 30 minutes.

Step 2: Check Service Readiness

Open a new terminal and verify the service is ready.

curl -X 'GET' 'http://localhost:9000/v1/health/ready'

Expected response:

{"status":"ready"}
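Because the first run can spend a long time downloading the model, a script may want to poll the readiness endpoint instead of checking once. The following is a minimal sketch, assuming the health URL from this step and the `{"status":"ready"}` response shown above. The function names `probe_ready` and `wait_until_ready` are hypothetical helpers and not part of any NVIDIA client. The defaults (60 attempts, 30 seconds apart) are an assumption chosen to cover roughly the 30-minute download window.

```python
import json
import time
import urllib.error
import urllib.request

READY_URL = "http://localhost:9000/v1/health/ready"  # endpoint from this step


def probe_ready(url=READY_URL, timeout=5.0):
    """Return True if the endpoint answers with a JSON body whose status is "ready"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or non-JSON body: not ready yet.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ready"


def wait_until_ready(probe=probe_ready, attempts=60, delay_seconds=30.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts are used up."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next check
    return False
```

Calling `wait_until_ready()` with no arguments blocks until the service reports ready or the attempts run out. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.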

Step 3: Basic Translation

Translate a sentence from English to German.

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "This will become German words" \
  --source-language-code en-US \
  --target-language-code de-DE

Expected output:

## Das werden deutsche Wörter

You specify the source and target language codes, and the NIM handles the translation. The ## prefix in the output is from the NVIDIA Riva client script formatting.

For a full list of supported language codes, refer to the support matrix.
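If you want to call the client script from Python rather than from the shell, one option is a thin wrapper around the same command. This is only a sketch: it reuses the script path, server address, and flags shown in this step, and the helper names (`build_nmt_command`, `strip_client_prefix`, `translate`) are hypothetical. It assumes the script prints each translation on its own line with the `## ` prefix described above.

```python
import subprocess

NMT_SCRIPT = "python-clients/scripts/nmt/nmt.py"  # script path used in this tutorial
SERVER = "0.0.0.0:50051"                          # gRPC address used in this tutorial


def build_nmt_command(text, source_code, target_code, server=SERVER):
    """Return the argv list for the basic translation command (hypothetical helper)."""
    return [
        "python3", NMT_SCRIPT,
        "--server", server,
        "--text", text,
        "--source-language-code", source_code,
        "--target-language-code", target_code,
    ]


def strip_client_prefix(line):
    """Drop the leading '## ' that the client script adds to its output."""
    return line[3:] if line.startswith("## ") else line


def translate(text, source_code, target_code):
    """Run the client script and return its non-empty output lines, prefix removed.

    Requires the NIM from Step 1 to be running and the client scripts to be present.
    """
    completed = subprocess.run(
        build_nmt_command(text, source_code, target_code),
        capture_output=True, text=True, check=True,
    )
    return [strip_client_prefix(line) for line in completed.stdout.splitlines() if line.strip()]
```

Passing the arguments as a list avoids shell quoting problems when the input text contains quotes or other special characters.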

Step 4: Translation Exclusion

In real applications, certain terms should remain untranslated: brand names, product names, technical terms. The <dnt> (do not translate) tag tells the NMT model to pass enclosed text through unchanged.

Without exclusion, all words are translated:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "Riva translate model translates audio between language pairs." \
  --source-language-code en-US \
  --target-language-code fr-FR

Expected output:

Le modèle de traduction Riva traduit l'audio entre les paires de langues.

With <dnt> exclusion tags around “Riva translate”, the term is protected:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "<dnt>Riva translate</dnt> model translates audio between language pairs." \
  --source-language-code en-US \
  --target-language-code fr-FR

Expected output:

Le modèle Riva translate traduit l'audio entre les paires de langues.

In the second output, “Riva translate” appears unchanged in the French text. The rest of the sentence is translated normally. This is essential for any application that handles branded or technical content.
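When the protected terms come from a list (for example, a product catalog), the tags can be added programmatically before the text is sent. The sketch below shows one way to do that under stated assumptions: `protect_terms` is a hypothetical helper, matching is case-sensitive and limited to whole words, and spans that are already tagged are left alone. How the service treats nested or malformed tags is not covered here.

```python
import re


def protect_terms(text, terms):
    """Wrap each whole-word occurrence of a term in <dnt>...</dnt> tags."""
    # Longest terms first so "Riva translate" is preferred over "Riva".
    ordered = sorted({term for term in terms if term}, key=len, reverse=True)
    if not ordered:
        return text
    alternatives = "|".join(re.escape(term) for term in ordered)
    # First alternative: an existing tagged span, which is kept unchanged.
    # Second alternative: a term that is not embedded in a longer word.
    pattern = re.compile(
        r"<dnt>.*?</dnt>|(?<!\w)(" + alternatives + r")(?!\w)",
        re.DOTALL,
    )

    def wrap(match):
        term = match.group(1)
        return match.group(0) if term is None else f"<dnt>{term}</dnt>"

    return pattern.sub(wrap, text)


tagged = protect_terms(
    "Riva translate model translates audio between language pairs.",
    ["Riva translate"],
)
print(tagged)
```

The printed string is the same tagged input used in the second command of this step, so it can be passed directly as the --text value.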

Step 5: Custom Translation Dictionary

For more granular control, you can provide a dictionary file that maps source words to specific target translations, or blocks translation of specific words entirely.

The dictionary is a text file with one entry per line.

  • source_word##target_word — Forces a specific translation.

  • word (no ##) — Leaves the word untranslated.

First, look at the default translation:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "bad morning everyone" \
  --source-language-code en-US \
  --target-language-code it-IT

Expected output:

brutto mattino tutti

Now create a custom dictionary that maps “bad” to “good” and leaves “everyone” untranslated:

echo bad##good > custom_dict.txt
echo everyone >> custom_dict.txt
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "bad morning everyone" \
  --source-language-code en-US \
  --target-language-code it-IT \
  --dnt-phrases-file custom_dict.txt

Expected output:

good mattina everyone

In the output, “bad” was replaced with “good” (not translated to Italian), “morning” was translated normally to “mattina”, and “everyone” was passed through unchanged. Custom dictionaries give you precise control over translation behavior for domain-specific terms.
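For larger glossaries, writing the dictionary file by hand with echo becomes error-prone. The following sketch generates the file from Python data using the two entry forms described in this step (`source##target` to force a translation, a bare word to leave it untranslated). The helper names are hypothetical, and the validation rules (no empty entries, no embedded newlines, no `##` inside a word) are assumptions intended to keep each entry on one unambiguous line.

```python
from pathlib import Path

SEPARATOR = "##"  # separator between source and target, as described in this step


def _check(word):
    """Reject entries that would produce an ambiguous or multi-line entry."""
    if not word or "\n" in word or SEPARATOR in word:
        raise ValueError(f"invalid dictionary entry: {word!r}")
    return word


def dictionary_lines(forced=None, untranslated=()):
    """Build dictionary entries: forced translations first, then untranslated words."""
    lines = []
    for source_word, target_word in (forced or {}).items():
        lines.append(f"{_check(source_word)}{SEPARATOR}{_check(target_word)}")
    for word in untranslated:
        lines.append(_check(word))
    return lines


def write_dictionary(path, forced=None, untranslated=()):
    """Write one entry per line to path and return the entries written."""
    lines = dictionary_lines(forced, untranslated)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

For example, `write_dictionary("custom_dict.txt", {"bad": "good"}, ["everyone"])` produces the same two-line file as the echo commands above, which can then be passed with --dnt-phrases-file.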

Step 6: Batched Inference

When translating many inputs, batch them for better throughput. Create a text file with one input per line, pass it with --text-file, and set the --batch-size flag.

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text-file input_text.txt \
  --source-language-code en \
  --target-language-code de --batch-size 8

With --batch-size 8, the NIM processes up to 8 inputs in parallel on the GPU, which is typically faster than sending them one at a time. Use batching whenever you have multiple texts to translate.
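The input file has to contain exactly one input per line, so text with embedded line breaks needs to be flattened first. The sketch below prepares such a file and illustrates how a list of inputs divides into groups of at most 8. Both helpers are hypothetical. The client script is presumed to do its own grouping when --batch-size is set, so `chunk` is shown only to make the batch arithmetic concrete. Note that `write_input_file` also collapses runs of whitespace, which is an assumption about what is acceptable for your data.

```python
from pathlib import Path


def write_input_file(path, texts):
    """Write one input per line, flattening internal whitespace and skipping blanks."""
    lines = [" ".join(text.split()) for text in texts]
    lines = [line for line in lines if line]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


def chunk(items, batch_size=8):
    """Split items into consecutive groups of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[start:start + batch_size] for start in range(0, len(items), batch_size)]


# 20 inputs with a batch size of 8 give groups of 8, 8, and 4.
group_sizes = [len(group) for group in chunk([f"sentence {n}" for n in range(20)], 8)]
print(group_sizes)
```

Calling `write_input_file("input_text.txt", texts)` produces a file that can be passed to the command above with --text-file.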

Step 7: Morphologically Complex Languages

Translations into some morphologically complex target languages (for example, Arabic, Turkish, and Finnish) can be longer than the source text. If the output is truncated, increase --max-len-variation (default: 20, range: 0-256).

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "Despite numerous challenges faced by the international community in coordinating an effective response to climate change, several countries have committed to achieving net-zero emissions by 2050." \
  --source-language-code en-US \
  --target-language-code ar-AR \
  --max-len-variation 150

Lower values can truncate output. Higher values allow longer translations but can increase latency. Adjust based on your language pair and text length. This parameter is typically not needed for European languages.
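If the value is set from configuration or user input, it can help to validate it against the documented range before building the command. The following is a small sketch, assuming only the default (20) and range (0-256) stated in this step. `max_len_variation_args` is a hypothetical helper that returns the extra command-line arguments, or nothing when the default should apply.

```python
DEFAULT_MAX_LEN_VARIATION = 20  # default stated in this step
MIN_MAX_LEN_VARIATION = 0       # lower bound stated in this step
MAX_MAX_LEN_VARIATION = 256     # upper bound stated in this step


def max_len_variation_args(value=None):
    """Return the CLI arguments for --max-len-variation, or [] to keep the default."""
    if value is None:
        return []
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max-len-variation must be an integer")
    if not MIN_MAX_LEN_VARIATION <= value <= MAX_MAX_LEN_VARIATION:
        raise ValueError(
            f"max-len-variation must be between {MIN_MAX_LEN_VARIATION} "
            f"and {MAX_MAX_LEN_VARIATION}, got {value}"
        )
    return ["--max-len-variation", str(value)]


print(max_len_variation_args(150))
```

The returned list can be appended to an argument list for the client script, so an out-of-range value fails early in your own code instead of at request time.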


What You Learned

In this tutorial, you:

  • Deployed an NMT NIM using the same container workflow as ASR and TTS, reinforcing the pattern that all Speech NIMs share.

  • Translated between language pairs using language codes, and learned where to find supported pairs.

  • Protected terms from translation using <dnt> exclusion tags for brand names and technical terms.

  • Created custom dictionaries to force specific translations or block translation of specific words, giving you fine-grained control over output.

  • Used batched inference to translate multiple inputs efficiently on the GPU.

  • Handled morphologically complex languages by adjusting --max-len-variation to prevent output truncation.


Next Steps