Deploy and Run the NMT NIM Microservice
Deploy the Riva Translate 1.6b model as an NMT NIM container and run text translation between 36 languages.
For model details, GPU requirements, and supported languages, refer to the NMT support matrix.
Prerequisites
Completed prerequisites and NGC access setup.
Installed the NVIDIA Riva Python client.
NGC_API_KEY exported in your terminal.
Deploy the NIM Container
The NMT NIM has a single model, so NIM_TAGS_SELECTOR is not required for Docker deployment. The container automatically selects the best model profile for your GPU.
export CONTAINER_ID=riva-translate-1_6b
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
On first startup, the container downloads the model from NGC, which can take up to 30 minutes depending on network speed. If a pre-built TensorRT engine is available for the target GPU, the container downloads it. Otherwise, the container generates an optimized engine from the RMIR model on the fly, which increases startup time further.
Tip
Mount a local cache directory to avoid repeated downloads. Refer to Model Caching.
Verify Readiness
Wait for the container to finish model setup, then check the health endpoint.
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
Expected response:
{"status":"ready"}
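If you script the deployment, you can poll the same endpoint until it reports ready instead of checking it by hand. The following is a minimal sketch that uses only the Python standard library. It assumes the endpoint returns HTTP 200 with the JSON body shown above once the model is loaded. The function names, the polling interval, and the overall timeout are illustrative choices, not part of the NIM.

```python
import json
import time
import urllib.request

# Readiness URL from this guide (HTTP port 9000 mapped by docker run).
READY_URL = "http://localhost:9000/v1/health/ready"


def body_says_ready(raw: bytes) -> bool:
    """Return True if the payload is a JSON object with status == "ready"."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return False
    return isinstance(payload, dict) and payload.get("status") == "ready"


def is_ready(url: str = READY_URL, timeout: float = 5.0) -> bool:
    """Probe the readiness endpoint once; any network or HTTP error means 'not ready'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and body_says_ready(resp.read())
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses.
        return False


def wait_until_ready(url: str = READY_URL,
                     total_timeout: float = 1800.0,
                     interval: float = 10.0) -> bool:
    """Poll until the NIM is ready or roughly total_timeout seconds have passed."""
    deadline = time.monotonic() + total_timeout
    while True:
        if is_ready(url):
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready(total_timeout=1800, interval=15) waits for roughly 30 minutes at most, which matches the worst-case first-start download time mentioned earlier.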
Run Translation Inference
Basic Translation (gRPC)
Translate text from English to German using the Riva Python client.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "This will become German words" \
--source-language-code en-US \
--target-language-code de-DE
Expected output:
## Das werden deutsche Wörter
The Riva client script adds the ## prefix when it prints each translation. The prefix is not part of the translated text.
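When you call the client script from another program, you usually want the translation without the ## marker. The following is a minimal sketch. It assumes that nmt.py prints each translation on its own line beginning with "## " and that any other stdout lines can be ignored. The helper names (strip_prefix, parse_translations, run_nmt) are hypothetical.

```python
import subprocess
from typing import List

NMT_SCRIPT = "python-clients/scripts/nmt/nmt.py"  # path used throughout this guide
PREFIX = "## "  # marker the client script prints before each translation (assumed format)


def strip_prefix(line: str) -> str:
    """Remove the client's '## ' marker from one output line, if present."""
    return line[len(PREFIX):] if line.startswith(PREFIX) else line


def parse_translations(stdout: str) -> List[str]:
    """Keep only marked lines; any other output (for example, logs) is ignored."""
    return [strip_prefix(line) for line in stdout.splitlines() if line.startswith(PREFIX)]


def run_nmt(options: List[str]) -> List[str]:
    """Run the client script with the given options and return bare translations."""
    result = subprocess.run(["python3", NMT_SCRIPT, *options],
                            capture_output=True, text=True, check=True)
    return parse_translations(result.stdout)
```

For the English-to-German example, run_nmt(["--server", "0.0.0.0:50051", "--text", "This will become German words", "--source-language-code", "en-US", "--target-language-code", "de-DE"]) should return a one-element list with the German sentence, provided that the script's output matches the expected output shown above.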
Batch Translation
Translate multiple inputs from a text file (one input per line).
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text-file input_text.txt \
--source-language-code en \
--target-language-code de \
--batch-size 8
With --batch-size 8, the NIM translates up to 8 inputs in parallel on the GPU.
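To see how --batch-size groups the lines of input_text.txt, the following sketch splits a list of inputs into consecutive batches. It assumes that the client sends consecutive lines in groups of --batch-size, and it does not contact the NIM. Dropping blank lines is a choice made for this sketch. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List


def read_inputs(path: str) -> List[str]:
    """Read one input per line; this sketch drops blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def make_batches(inputs: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size inputs, preserving file order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]
```

With 20 non-blank lines and a batch size of 8, make_batches yields groups of 8, 8, and 4 inputs, in file order.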
Morphologically Complex Languages
For target languages that produce longer output (Arabic, Turkish, Finnish), increase --max-len-variation to prevent truncation.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "Despite numerous challenges, several countries committed to net-zero by 2050." \
--source-language-code en-US \
--target-language-code ar-AR \
--max-len-variation 150
The default is 20, and the valid range is 0-256. Higher values allow longer translations but can increase latency.
Client Parameters Reference
--text and --text-file are mutually exclusive. Use one or the other.
| Parameter | Description | Default |
|---|---|---|
| --server | gRPC server address and port. | |
| --text | Text to translate. Mutually exclusive with --text-file. | None |
| --text-file | Path to a file with one input per line. Mutually exclusive with --text. | None |
| --source-language-code | Source language code (for example, en-US). | |
| --target-language-code | Target language code (for example, de-DE). | |
| --batch-size | Number of inputs to translate in parallel when using --text-file. | |
| --max-len-variation | Maximum token count difference between source and output (0-256). | 20 |
| | Path to a custom dictionary file. Refer to Custom Dictionaries. | None |
| | Model name to use for translation. | |
| --list-models | List available models and supported language pairs, then exit. | None |
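If you translate into several target languages from a script, a small helper can enforce the rules in this table before it invokes the client. This sketch relies only on the flags and ranges listed above. It rejects calls that pass both or neither of text and text_file, it checks the 0-256 range for --max-len-variation, and it omits optional flags so that the script's own defaults apply. build_nmt_args is a hypothetical name.

```python
from typing import List, Optional


def build_nmt_args(source: str, target: str, *,
                   text: Optional[str] = None,
                   text_file: Optional[str] = None,
                   server: str = "0.0.0.0:50051",
                   batch_size: Optional[int] = None,
                   max_len_variation: Optional[int] = None) -> List[str]:
    """Return nmt.py options; flags left as None fall back to the script's defaults."""
    # --text and --text-file are mutually exclusive; this sketch also requires one of them.
    if (text is None) == (text_file is None):
        raise ValueError("pass exactly one of text= or text_file=")
    args = ["--server", server]
    if text is not None:
        args += ["--text", text]
    else:
        args += ["--text-file", text_file]
    args += ["--source-language-code", source, "--target-language-code", target]
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        args += ["--batch-size", str(batch_size)]
    if max_len_variation is not None:
        if not 0 <= max_len_variation <= 256:
            raise ValueError("max_len_variation must be in the range 0-256")
        args += ["--max-len-variation", str(max_len_variation)]
    return args
```

For example, to run one translation per language, pass ["python3", "python-clients/scripts/nmt/nmt.py", *build_nmt_args("en-US", target, text=sentence)] to subprocess.run for each target in ("de-DE", "ar-AR"). For Arabic, add max_len_variation=150 as in the earlier example.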
For the full list of supported language codes, refer to the NMT support matrix.
List Available Models
To list the models and language pairs available on the running NMT NIM:
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --list-models
Helm Deployment
For Kubernetes deployment, create a custom-values.yaml file:
image:
  repository: nvcr.io/nim/nvidia/riva-translate-1_6b
  pullPolicy: IfNotPresent
  tag: latest
nim:
  ngcAPISecret: ngc-api
imagePullSecrets:
  - name: ngc-secret
envVars:
  NIM_TAGS_SELECTOR: name=riva-translate-1_6b
For complete Helm instructions, refer to Deploying with Helm.
Next Steps
Custom Dictionaries: Force specific translations or block translation of domain terms.
Translate with Python: Call the NMT gRPC API programmatically.
NMT Troubleshooting: Common issues and solutions.