
Run Your First NVIDIA Speech NIM Microservice for Neural Machine Translation

In this tutorial, you will deploy an NVIDIA Speech NIM microservice for Neural Machine Translation (NMT) and use it to translate text between languages. You will learn how to control translation output using exclusion tags and custom dictionaries, and how to batch requests for higher throughput.

If you completed the ASR and TTS tutorials, the deployment workflow will be familiar. This tutorial focuses on NMT-specific concepts: language pairs, translation exclusion, custom dictionaries, and handling morphologically complex languages.

What You Learn

By completing this tutorial, you:

Deploy an NMT NIM using the same container workflow you learned in previous tutorials.
Translate text between supported language pairs using the gRPC client.
Use <dnt> exclusion tags to protect terms (brand names, product names) from translation.
Create custom translation dictionaries to force or block specific word translations.
Understand batch inference for translating multiple inputs efficiently.
Know how to adjust --max-len-variation for morphologically complex languages like Arabic and Turkish.

What You Need

A Linux system with a supported NVIDIA GPU (refer to the support matrix).
Completed setup: prerequisites and NGC access.
A terminal with Docker available and NGC_API_KEY exported.
Installed the NVIDIA Riva Python client.
Approximately 20-30 minutes (includes model download time on first run).

Key Concepts

| Concept | Description |
| --- | --- |
| Language pairs | NMT models translate between specific source and target languages identified by language codes (for example, `en-US` for English, `de-DE` for German, `fr-FR` for French). The client also accepts bare language codes, such as `en`, `de`, and `fr`. Both forms refer to the same language, and `--list-models` returns the bare form. Not every model supports every pair. Check the support matrix for supported languages. |
| Translation exclusion | Some terms should not be translated, such as brand names, product names, or technical terms. NMT supports `<dnt>` (do not translate) tags that protect enclosed text from modification. |
| Custom dictionaries | You can provide a dictionary file that forces specific translations or blocks translation of specific words. This is useful for domain-specific terminology. |
| `--max-len-variation` | Caps the output token count relative to the source: the decoder can generate at most `source_token_count + max-len-variation` tokens before it stops. Accepts an integer in the range `[0, 256]`. Defaults to `20`, which is enough for European languages where source and target token counts are similar. Increase it for target languages that tokenize into more pieces than the source, such as Arabic, Turkish, or Finnish. |
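
The relationship between the two language-code forms can be sketched in a few lines of Python. This is only an illustration of the "Language pairs" row: the helper names `bare_language_code` and `same_language` are hypothetical and not part of the Riva client, and the sketch assumes the region suffix can be ignored for the languages named in this tutorial.

```python
def bare_language_code(code: str) -> str:
    """Return the bare language part of a code such as 'en-US' or 'en'."""
    # Keep only the text before the first hyphen and lowercase it.
    return code.strip().split("-", 1)[0].lower()


def same_language(code_a: str, code_b: str) -> bool:
    """True when two codes name the same language, ignoring any region suffix."""
    return bare_language_code(code_a) == bare_language_code(code_b)


print(bare_language_code("en-US"))   # en
print(same_language("de-DE", "de"))  # True
```

A helper like this can be handy when comparing a code you pass on the command line with the bare codes that `--list-models` reports.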

Step 1: Deploy the NMT NIM Microservice

This tutorial deploys the NVIDIA Riva Translate 1.6b model, which supports translation across multiple language pairs.

Tip

For other available NMT models and languages, refer to Supported Models.

export CONTAINER_ID=riva-translate-1_6b

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

To run the container headlessly (detached), replace -it --rm with -d --restart=unless-stopped:

docker run -d --restart=unless-stopped --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

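Follow startup progress with docker logs -f $CONTAINER_ID. To stop and remove the detached container, run docker stop $CONTAINER_ID && docker rm $CONTAINER_ID. A container started with --rm is removed automatically when it stops, so docker stop alone is enough in that case.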

Unlike ASR and TTS, this deployment does not set NIM_TAGS_SELECTOR because this container has a single model. The container downloads the model from NGC on first run, which can take up to 30 minutes.

Step 2: Check Service Readiness

Open a new terminal and verify the service is ready.

curl -X 'GET' 'http://localhost:9000/v1/health/ready'

Expected response:

{"status":"ready"}
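
Because the first start can take up to 30 minutes, it can be convenient to poll the readiness endpoint instead of retrying curl by hand. The following is a minimal sketch using only the Python standard library. It assumes the endpoint URL and the {"status":"ready"} response shown above; the function names, the 10-second interval, and the 1800-second limit are illustrative choices, not part of the NIM.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

# Readiness endpoint from this tutorial (HTTP port 9000).
READY_URL = "http://localhost:9000/v1/health/ready"


def fetch_status(url: str = READY_URL, timeout: float = 5.0) -> str:
    """Return the "status" field of the readiness response, or "" on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
        return body.get("status", "") if isinstance(body, dict) else ""
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or non-JSON body: not ready yet.
        return ""


def wait_until_ready(
    fetch: Callable[[], str] = fetch_status,
    max_wait_s: float = 1800.0,   # 30 minutes, matching the first-run download note
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until fetch() returns "ready" or roughly max_wait_s of sleeping has passed."""
    waited = 0.0  # counts sleep time only, not time spent inside fetch()
    while True:
        if fetch() == "ready":
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling wait_until_ready() with no arguments polls the local service and returns True once it reports ready. The fetch and sleep parameters exist so the loop can be exercised without a running container.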

Step 3: Basic Translation

Translate a sentence from English to German.

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "This will become German words" \
  --source-language-code en-US \
  --target-language-code de-DE

Expected output:

## Das werden deutsche Wörter

You specify the source and target language codes, and the NIM handles the translation. The ## prefix in the output is from the NVIDIA Riva client script formatting.

For a full list of supported language codes, refer to the support matrix.
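
If you drive the client script from your own tooling, you can assemble the command as an argument list rather than a shell string, which avoids quoting problems in the input text. The sketch below uses only the script path and flags that appear in this tutorial; the function name `build_nmt_command` is hypothetical, and the sketch does not run the command itself.

```python
import shlex
from typing import List, Optional

# Script path as used in the commands in this tutorial.
NMT_SCRIPT = "python-clients/scripts/nmt/nmt.py"


def build_nmt_command(
    text: str,
    source: str,
    target: str,
    server: str = "0.0.0.0:50051",
    dnt_phrases_file: Optional[str] = None,
    max_len_variation: Optional[int] = None,
) -> List[str]:
    """Build the argument list for one nmt.py translation call."""
    cmd = [
        "python3", NMT_SCRIPT,
        "--server", server,
        "--text", text,
        "--source-language-code", source,
        "--target-language-code", target,
    ]
    # Optional flags are appended only when a value was supplied.
    if dnt_phrases_file is not None:
        cmd += ["--dnt-phrases-file", dnt_phrases_file]
    if max_len_variation is not None:
        cmd += ["--max-len-variation", str(max_len_variation)]
    return cmd


cmd = build_nmt_command("This will become German words", "en-US", "de-DE")
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains python-clients.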

Step 4: Translation Exclusion

In real applications, certain terms should remain untranslated: brand names, product names, technical terms. The <dnt> (do not translate) tag tells the NMT model to pass enclosed text through unchanged.

Without exclusion, all words are translated:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "Riva translate model translates audio between language pairs." \
  --source-language-code en-US \
  --target-language-code fr-FR

Le modèle de traduction Riva traduit l'audio entre les paires de langues.

With <dnt> exclusion tags around “Riva translate”, the term is protected:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "<dnt>Riva translate</dnt> model translates audio between language pairs." \
  --source-language-code en-US \
  --target-language-code fr-FR

Le modèle Riva translate traduit l'audio entre les paires de langues.

In the second output, “Riva translate” appears unchanged in the French text. The rest of the sentence is translated normally. This is essential for any application that handles branded or technical content.
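
When the protected terms come from a list, such as a product glossary, you can add the tags programmatically before sending the text. The following is a minimal sketch, not part of the Riva client: `protect_terms` is a hypothetical helper that does case-sensitive, whole-word matching and does not check for tags that are already present, so apply it once to untagged text.

```python
import re
from typing import Iterable


def protect_terms(text: str, terms: Iterable[str]) -> str:
    """Wrap each whole-word occurrence of the given terms in <dnt>...</dnt> tags."""
    # Longest terms first so "Riva translate" wins over a shorter "Riva".
    ordered = sorted({t for t in terms if t}, key=len, reverse=True)
    if not ordered:
        return text
    alternation = "|".join(re.escape(t) for t in ordered)
    # The lookarounds prevent matches inside longer words (for example "Rival").
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    return pattern.sub(lambda m: f"<dnt>{m.group(0)}</dnt>", text)


tagged = protect_terms(
    "Riva translate model translates audio between language pairs.",
    ["Riva translate"],
)
print(tagged)
```

The result is the same tagged sentence used in the second command above, ready to pass as the --text value.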

Step 5: Custom Translation Dictionary

For more granular control, you can provide a dictionary file that maps source words to specific target translations, or blocks translation of specific words entirely.

The dictionary is a text file with one entry per line.

source_word##target_word — Forces a specific translation.
word (no ##) — Leaves the word untranslated.

For the full file syntax (entry rules, comments, casing behavior) and additional examples, refer to Using Custom Translation Dictionaries.

First, check the default translation:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "bad morning everyone" \
  --source-language-code en-US \
  --target-language-code it-IT

brutto mattino tutti

Now create a custom dictionary that maps “bad” to “good” and leaves “everyone” untranslated:

echo bad##good > custom_dict.txt
echo everyone >> custom_dict.txt
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "bad morning everyone" \
  --source-language-code en-US \
  --target-language-code it-IT \
  --dnt-phrases-file custom_dict.txt

good mattina everyone

In the output, “bad” was replaced with “good” (not translated to Italian), “morning” was translated normally to “mattina”, and “everyone” was passed through unchanged. Custom dictionaries give you precise control over translation behavior for domain-specific terms.
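
For larger glossaries, generating the dictionary file from code is less error-prone than chaining echo commands. The sketch below writes the two entry types described in this step (source_word##target_word, and a bare word). The helper `write_custom_dictionary` is hypothetical, and it covers only those two forms, not the comment or casing rules mentioned in the full syntax reference.

```python
from pathlib import Path
from typing import Iterable, Mapping


def write_custom_dictionary(
    path: str,
    forced: Mapping[str, str],
    untranslated: Iterable[str] = (),
) -> str:
    """Write a dictionary file, one entry per line, and return its content."""
    lines = []
    for source_word, target_word in forced.items():
        # "##" is the separator, so it cannot appear inside either side.
        if "##" in source_word or "##" in target_word:
            raise ValueError(f"entry contains the '##' separator: {source_word!r}")
        lines.append(f"{source_word}##{target_word}")  # forces a specific translation
    for word in untranslated:
        lines.append(word)  # no "##": the word is left untranslated
    content = "".join(line + "\n" for line in lines)
    Path(path).write_text(content, encoding="utf-8")
    return content
```

For example, write_custom_dictionary("custom_dict.txt", {"bad": "good"}, ["everyone"]) produces the same two-line file as the echo commands above.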

Step 6: Batched Inference

When translating many inputs, batch them for better throughput. Create a text file with one input per line and use the --batch-size flag.

First, create input_text.txt with a few sentences to translate:

cat > input_text.txt <<'EOF'
Deploy containers on Kubernetes.
Monitor GPU utilization with nvidia-smi.
Scale inference with Triton Inference Server.
Optimize models with TensorRT.
EOF

Then translate the batch:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text-file input_text.txt \
  --source-language-code en \
  --target-language-code de --batch-size 8

With --batch-size 8, the NIM processes up to 8 inputs in parallel on the GPU, which typically gives higher throughput than sending them one at a time. Use batching whenever you have multiple texts to translate.
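
To make the effect of --batch-size concrete, the following sketch groups input lines into batches of at most batch_size items. It only illustrates the grouping and is an assumption about how a client might chunk a file; the internals of nmt.py are not shown in this tutorial, and the name `batches` is hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(lines: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(lines), batch_size):
        yield list(lines[start:start + batch_size])


sentences = [
    "Deploy containers on Kubernetes.",
    "Monitor GPU utilization with nvidia-smi.",
    "Scale inference with Triton Inference Server.",
    "Optimize models with TensorRT.",
]
for group in batches(sentences, 8):
    print(len(group), "inputs in this request")  # 4 inputs in this request
```

With four input lines and a batch size of 8, everything fits in a single group. Ten lines with a batch size of 4 would yield groups of 4, 4, and 2.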

Handling Long Inputs

The NMT model accepts up to approximately 508 tokens per input (the encoder’s positional-embedding bound). For longer content, split the source text on sentence boundaries with one sentence per line and pass it using --text-file. The client sends each line as a separate batch item, and the model translates each line independently.

cat > long_input.txt <<'EOF'
Despite numerous challenges, the team encountered hardware failures and supply-chain disruptions.
Meaningful progress was still attainable through careful planning and persistent effort.
The final report documented every milestone and setback.
EOF

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text-file long_input.txt \
  --source-language-code en \
  --target-language-code de --batch-size 4

The same long input passed as a single --text "<long-input>" string, even with embedded newlines, is treated as one item by the API and is truncated if it exceeds the per-input token limit. Whenever the full text might exceed approximately 508 tokens, split it into a list or file so that each segment stays under the limit.
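
Splitting on sentence boundaries can be approximated with a regular expression. The sketch below is deliberately naive: it splits after ., !, or ? followed by whitespace, so abbreviations such as "e.g." are split too, and it does not count tokens, because token counts depend on the model's tokenizer. The name `split_sentences` is hypothetical.

```python
import re
from typing import List

# Split at whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the sentences in text, one list item per sentence."""
    parts = _SENTENCE_END.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


paragraph = (
    "Despite numerous challenges, the team encountered hardware failures "
    "and supply-chain disruptions. Meaningful progress was still attainable "
    "through careful planning and persistent effort. The final report "
    "documented every milestone and setback."
)
for sentence in split_sentences(paragraph):
    print(sentence)
```

Writing "\n".join(split_sentences(paragraph)) to a file gives the one-sentence-per-line layout that --text-file expects. For production text, a language-aware sentence segmenter is a safer choice than this regular expression.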

Step 7: Morphologically Complex Languages

For some target languages, such as Arabic, Turkish, and Finnish, the translation tokenizes into more tokens than the source text. The decoder stops when it has generated source_token_count + max-len-variation tokens, so the output is truncated if the budget is too small for the target language. Increase --max-len-variation to give the decoder more headroom.

Valid range: integer in [0, 256]. Default: 20.

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
  --text "Despite numerous challenges faced by the international community in coordinating an effective response to climate change, several countries have committed to achieving net-zero emissions by 2050." \
  --source-language-code en-US \
  --target-language-code ar-AR \
  --max-len-variation 150

Lower values can truncate output. Higher values allow longer translations but can increase latency. Adjust based on your language pair and text length. This parameter is typically not needed for European languages.
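
The budget rule is simple arithmetic, and a short sketch can help when choosing a value. The function name `max_output_tokens` and the source token counts in the example are hypothetical; real token counts depend on the model's tokenizer, which this sketch does not replicate. The default of 20 and the range [0, 256] come from this tutorial.

```python
DEFAULT_MAX_LEN_VARIATION = 20       # default stated in this tutorial
MAX_LEN_VARIATION_RANGE = (0, 256)   # valid range stated in this tutorial


def max_output_tokens(
    source_token_count: int,
    max_len_variation: int = DEFAULT_MAX_LEN_VARIATION,
) -> int:
    """Upper bound on generated tokens: source_token_count + max_len_variation."""
    low, high = MAX_LEN_VARIATION_RANGE
    if not low <= max_len_variation <= high:
        raise ValueError(f"max_len_variation must be in [{low}, {high}]")
    if source_token_count < 0:
        raise ValueError("source_token_count must be non-negative")
    return source_token_count + max_len_variation


# Hypothetical 40-token source sentence.
print(max_output_tokens(40))        # 60
print(max_output_tokens(40, 150))   # 190
```

If a translation appears cut off, compare its length against this bound: output that ends near source_token_count + max-len-variation tokens suggests the budget, not the model, ended the sentence.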

What You Learned

In this tutorial, you did the following:

Deployed an NMT NIM using the same container workflow as ASR and TTS, reinforcing the pattern that all Speech NIMs share.
Translated between language pairs using language codes, and learned where to find supported pairs.
Protected terms from translation using <dnt> exclusion tags for brand names and technical terms.
Created custom dictionaries to force specific translations or block translation of specific words, giving you fine-grained control over output.
Used batched inference to translate multiple inputs efficiently on the GPU.
Handled morphologically complex languages by adjusting --max-len-variation to prevent output truncation.

Next Steps

Composing Pipelines: Learn how to chain ASR, NMT, and TTS NIMs together to build a speech-to-speech translation pipeline.
NMT Developer Guide: Explore NMT capabilities in depth.
Deploy NMT with Helm: Kubernetes deployment for production.