Using Custom Translation Dictionaries#
Control how the NMT NIM translates domain-specific terms by using custom dictionaries and <dnt> (do not translate) exclusion tags. This is essential for applications that handle branded content, medical terminology, legal terms, or product catalogs.
Prerequisites#
An NMT NIM container deployed and ready. Refer to Deploy and Run NMT NIM.
The NVIDIA Riva Python client installed.
Custom Dictionary Files#
For repeatable control across many requests, create a dictionary file that maps source terms to specific translations or marks terms as untranslatable.
Dictionary File Format#
Create a plain text file with one entry per line.
Force a specific translation:
source_word##target_word
Block translation (leave unchanged):
word (no ##)
Example dictionary file:
NVIDIA##NVIDIA
Kubernetes##Kubernetes
inference##Inferenz
microservice##Mikrodienst
In this example, “NVIDIA” and “Kubernetes” pass through unchanged, while “inference” translates to “Inferenz” and “microservice” translates to “Mikrodienst” regardless of what the model would produce.
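The format rules above can be sketched as a small parser. This is an illustration of the documented file format only, not the client's own loader; the function name parse_dictionary_text is hypothetical, and the handling of surrounding whitespace is an assumption.

```python
from typing import Dict


def parse_dictionary_text(text: str) -> Dict[str, str]:
    """Parse dictionary-file text into a {source: target} mapping.

    Hypothetical helper: an entry without "##" maps to an empty string,
    meaning "leave this term unchanged".
    """
    phrases: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no entry
        # Split on the first "##" only; a line without it has no target.
        source, separator, target = line.partition("##")
        phrases[source.strip()] = target.strip() if separator else ""
    return phrases


sample = """
NVIDIA##NVIDIA
Kubernetes
inference##Inferenz
microservice##Mikrodienst
"""
phrases = parse_dictionary_text(sample)
print(phrases)
```

Here "Kubernetes" is listed without ## to show the block-translation form; it ends up with an empty target, while the other three entries keep their explicit targets.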
Using a Dictionary File#
Pass the dictionary file with the --dnt-phrases-file flag.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "Deploy the NVIDIA microservice for inference on Kubernetes." \
--source-language-code en-US \
--target-language-code de-DE \
--dnt-phrases-file custom_dict.txt
The dictionary rules apply to every input in the request, including batched inputs.
When to Use Dictionary Files#
Enterprise applications with a fixed glossary of product names, acronyms, and domain terms.
Batch translation pipelines where the same terminology rules apply to all inputs.
Scenarios requiring both forced translations and untranslatable terms.
Real-World Examples#
Medical Translation#
Protect drug names and force standardized medical term translations.
Aspirin##Aspirin
COVID-19##COVID-19
MRI##MRT
CT scan##CT-Untersuchung
blood pressure##Blutdruck
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "The patient received Aspirin after the MRI showed no issues." \
--source-language-code en-US \
--target-language-code de-DE \
--dnt-phrases-file medical_dict.txt
Product Catalog Translation#
Protect product names and SKUs while translating descriptions.
NIM-PRO-100
NIM-LITE-50
TensorRT
Triton
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "NIM-PRO-100 accelerates inference using TensorRT and Triton." \
--source-language-code en-US \
--target-language-code ja \
--dnt-phrases-file product_dict.txt
All four terms pass through unchanged while the sentence structure and verbs are translated to Japanese.
Legal Document Translation#
Force consistent legal terminology across documents.
Force Majeure##Force Majeure
indemnification##Schadloshaltung
liability##Haftung
warranty##Gewährleistung
Using Dictionaries with Batch Translation#
Dictionary rules apply to all inputs in a batch. Create a text file with one input per line and combine it with the dictionary file.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text-file product_descriptions.txt \
--source-language-code en-US \
--target-language-code fr \
--batch-size 8 \
--dnt-phrases-file product_dict.txt
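A rough Python equivalent of this batch workflow might look like the sketch below. It assumes a client object whose translate() method accepts the keyword arguments used in the Python client example in this guide; the helper names chunked and translate_in_batches are hypothetical, and whether the command-line script batches in exactly this way is an assumption.

```python
from typing import Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(
    client,
    texts: List[str],
    phrases: Dict[str, str],
    source_language: str,
    target_language: str,
    batch_size: int = 8,
) -> List[str]:
    """Translate `texts` batch by batch, sending the same dictionary each time.

    `client` is assumed to behave like the NMT client shown elsewhere
    in this guide; any object with a compatible translate() works.
    """
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        response = client.translate(
            texts=batch,
            model="",
            source_language=source_language,
            target_language=target_language,
            dnt_phrases_dict=phrases,
        )
        results.extend(item.text for item in response.translations)
    return results
```

Passing the client in as an argument keeps the batching logic independent of the connection setup, so it can be exercised with a stub before pointing it at a running NIM.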
Using Dictionaries Programmatically#
When calling the Python client API directly, pass dictionary entries as the dnt_phrases_dict parameter. This parameter accepts a Python dictionary where each key is the source term and the value is the target translation. Use an empty string as the value to leave the term untranslated.
import riva.client

# Connect to the NMT NIM gRPC endpoint.
auth = riva.client.Auth(uri="localhost:50051")
nmt_client = riva.client.NeuralMachineTranslationClient(auth)

# Keys are source terms; values are the translations to force.
response = nmt_client.translate(
    texts=["Deploy the microservice for inference."],
    model="",
    source_language="en-US",
    target_language="de-DE",
    dnt_phrases_dict={
        "microservice": "Mikrodienst",
        "inference": "Inferenz",
    },
)
print(response.translations[0].text)
The dictionary file format maps to the dnt_phrases_dict parameter as follows:
| Dictionary File Entry | Python Dict Entry |
|---|---|
| source##target | {"source": "target"} |
| word | {"word": ""} |
For more programmatic usage patterns, refer to Translate with Python.
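Going the other way, a glossary kept as a Python dictionary can be written out in the file format so that the same terms serve both the command-line flag and the API. The sketch below follows the mapping described in this section; to_dictionary_lines is a hypothetical helper, and rejecting keys that contain ## or line breaks is a precaution assumed here rather than a documented rule.

```python
from typing import Dict, List


def to_dictionary_lines(phrases: Dict[str, str]) -> List[str]:
    """Render a {source: target} mapping as dictionary-file lines.

    An empty target produces a bare "word" line (leave unchanged);
    anything else produces "source##target".
    """
    lines: List[str] = []
    for source, target in phrases.items():
        if "##" in source or "\n" in source or "\n" in target:
            raise ValueError(f"cannot represent entry: {source!r}")
        lines.append(f"{source}##{target}" if target else source)
    return lines


glossary = {"TensorRT": "", "inference": "Inferenz"}
file_text = "\n".join(to_dictionary_lines(glossary)) + "\n"
print(file_text)
```

The resulting text can be saved to a file and passed with --dnt-phrases-file.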
Best Practices#
Keep dictionary files small and focused. Large dictionaries with hundreds of entries can affect latency. Group entries by domain (medical, legal, product) and use the relevant file per request.
Test translations with and without the dictionary to verify that forced translations produce grammatically correct output in context.
Use source##source to block translation when you want a term preserved exactly. This is equivalent to listing the word without ##, but makes intent explicit.
One entry per line. Blank lines and leading/trailing whitespace are ignored, but avoid inline comments or special characters outside the source##target format.
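One way to automate the testing advice above is to check that every dictionary rule that applies to an input shows up in the output. The sketch below uses plain case-sensitive substring matching, which is an assumption and can miss cases where the target language inflects the term. The helper name missing_forced_terms is hypothetical, and the German sentence is a hand-written placeholder, not output from the NIM.

```python
from typing import Dict, List


def missing_forced_terms(
    source_text: str, translated_text: str, phrases: Dict[str, str]
) -> List[str]:
    """Return expected target strings that are absent from the translation."""
    missing: List[str] = []
    for source, target in phrases.items():
        if source not in source_text:
            continue  # this rule does not apply to the input
        expected = target or source  # empty target means "leave unchanged"
        if expected not in translated_text:
            missing.append(expected)
    return missing


rules = {"microservice": "Mikrodienst", "inference": "Inferenz", "NVIDIA": ""}
source = "Deploy the NVIDIA microservice for inference."
# Hand-written placeholder standing in for a real translation result.
candidate = "Stellen Sie den NVIDIA Mikrodienst für Inferenz bereit."
print(missing_forced_terms(source, candidate, rules))
```

An empty result means every applicable term was found. Grammatical fit in context still needs a human review.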