Deploy and Run the NMT NIM Microservice
Deploy the Riva Translate 1.6b model as an NMT NIM container and run text translation between 36 languages.
For model details, GPU requirements, and supported languages, refer to the NMT support matrix.
Prerequisites
Completed prerequisites and NGC access setup.
Installed the NVIDIA Riva Python client.
NGC_API_KEY exported in your terminal.
Deploy the NIM Container
For the container image, refer to the NGC catalog.
The NMT NIM has a single model, so NIM_TAGS_SELECTOR is not required for Docker deployment. The container automatically selects the best model profile for your GPU.
export CONTAINER_ID=riva-translate-1_6b
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
For headless deployments (for example, on a remote server or in CI), replace -it --rm with -d --restart=unless-stopped:
docker run -d --restart=unless-stopped --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Follow startup progress with docker logs -f $CONTAINER_ID. To stop and remove the detached container, run docker stop $CONTAINER_ID && docker rm $CONTAINER_ID.
On first startup, the container downloads the model from NGC, which can take up to 30 minutes depending on network speed. If a pre-built TensorRT engine is available for the target GPU, the container downloads it. Otherwise, the container builds an optimized engine from the RMIR model on the fly, which lengthens startup further.
Tip
Mount a local cache directory to avoid repeated downloads. See Model Caching.
Verify Readiness
Wait for the container to finish model setup, then check the health endpoint.
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
Expected response:
{"status":"ready"}
Run Translation Inference
Basic Translation (gRPC)
Translate text from English to German using the Riva Python client.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "This will become German words" \
--source-language-code en-US \
--target-language-code de-DE
Expected output:
## Das werden deutsche Wörter
The ## prefix is formatting from the Riva client script.
Batch Translation
Translate multiple inputs from a text file (one input per line). First, create input_text.txt:
cat > input_text.txt <<'EOF'
Deploy containers on Kubernetes.
Monitor GPU utilization with nvidia-smi.
Scale inference with Triton Inference Server.
Optimize models with TensorRT.
EOF
Then translate the batch:
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text-file input_text.txt \
--source-language-code en \
--target-language-code de \
--batch-size 8
With --batch-size 8, the client sends the file's lines to the NIM in groups of up to 8, and the NIM translates each group in parallel on the GPU.
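Conceptually, batch mode reads the file line by line and sends the lines in groups of at most --batch-size. The sketch below illustrates that grouping with hypothetical helpers (read_inputs, batches). It is not the nmt.py source, and dropping blank lines is an assumption about how you might prepare input, not documented client behavior.

```python
from pathlib import Path
from typing import Iterator, List


def read_inputs(path: str) -> List[str]:
    """Read one input per line, stripping whitespace and dropping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def batches(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    for group in batches(read_inputs("input_text.txt"), batch_size=8):
        print(len(group), "inputs in this request")
```

For the four-line input_text.txt above, a batch size of 8 yields a single group of four, while a batch size of 3 would yield a group of three followed by a group of one.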
Handling Long Inputs
The NMT model accepts up to approximately 508 tokens per input (the encoder’s positional-embedding bound). For longer content, split the source into sentences: one sentence per line in a --text-file input, or one sentence per element of the repeated string field texts in a gRPC request. Sending a single string longer than this limit can result in truncated output or an INVALID_ARGUMENT error.
# Sentence-per-line file works for arbitrary total length
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text-file long_paragraph_split.txt \
--source-language-code en --target-language-code de \
--batch-size 4
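To produce a sentence-per-line file such as long_paragraph_split.txt, a naive splitter is often enough for clean prose. The sketch below is an illustration under clear assumptions: it breaks after ., !, or ? followed by whitespace, it will mis-split abbreviations such as "e.g.", and it does not count model tokens, so a single very long sentence can still exceed the per-input limit. The helper names are hypothetical.

```python
import re
from typing import List

# Split point: whitespace that follows a sentence terminator (., !, ?).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split a paragraph into sentences, dropping empty fragments."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def write_one_per_line(sentences: List[str], path: str) -> None:
    """Write sentences to path, one per line, in the --text-file format."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            # Collapse internal newlines/tabs so each sentence stays on one line.
            f.write(" ".join(sentence.split()) + "\n")


if __name__ == "__main__":
    paragraph = (
        "Deploy containers on Kubernetes. Monitor GPU utilization with "
        "nvidia-smi! Can you scale inference with Triton Inference Server?"
    )
    write_one_per_line(split_sentences(paragraph), "long_paragraph_split.txt")
```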
Morphologically Complex Languages
--max-len-variation controls the decoder’s output-length budget. The decoder stops after it emits source_token_count + max-len-variation tokens. For target languages that tokenize into more pieces than the source, such as Arabic, Turkish, and Finnish, the default budget can be too small and the output is silently truncated. Increase the value to give the decoder more headroom.
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 \
--text "Despite numerous challenges, several countries committed to net-zero by 2050." \
--source-language-code en-US \
--target-language-code ar-AR \
--max-len-variation 150
Valid range: integer in [0, 256]. Default: 20. Higher values allow longer translations but can increase latency.
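The length rule above is simple arithmetic: the decoder's output budget is the source token count plus --max-len-variation. This sketch encodes that rule together with the documented range and default. The function name is hypothetical, and real token counts come from the model's tokenizer, which this code does not model.

```python
DEFAULT_MAX_LEN_VARIATION = 20      # documented default
MAX_LEN_VARIATION_RANGE = (0, 256)  # documented valid range, inclusive


def output_token_budget(source_token_count: int,
                        max_len_variation: int = DEFAULT_MAX_LEN_VARIATION) -> int:
    """Tokens the decoder may emit before stopping: source count + variation."""
    lo, hi = MAX_LEN_VARIATION_RANGE
    if not lo <= max_len_variation <= hi:
        raise ValueError(f"max_len_variation must be in [{lo}, {hi}]")
    if source_token_count < 0:
        raise ValueError("source_token_count must be non-negative")
    return source_token_count + max_len_variation
```

As a hypothetical worked example: a 30-token English source gets a default budget of 30 + 20 = 50 tokens, so an Arabic translation that needs 60 tokens is cut off at 50. With --max-len-variation 150, the budget becomes 30 + 150 = 180 tokens.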
Translation with HTTP REST
In addition to gRPC, the NMT NIM exposes an HTTP REST endpoint for text-to-text translation on the port set by NIM_HTTP_API_PORT (default 9000). This endpoint is useful for clients that cannot use gRPC, for quick curl-based testing, and for parity with the TTS NIM HTTP API.
POST /v1/text/translations
Request body (application/json):
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text to translate. Must be non-empty. |
| source_language | string | Yes | Source language code (for example, en-US). |
| target_language | string | Yes | Target language code (for example, de-DE). |
| | string | No | NMT model name. Defaults to the deployed model. |
| | object | No | Do-not-translate phrases as a … |
| | string | No | Decoder length-variation budget, integer in [0, 256]. |
Example:
curl -X POST http://localhost:9000/v1/text/translations \
-H 'Content-Type: application/json' \
-d '{
"text": "This will become German words.",
"source_language": "en-US",
"target_language": "de-DE"
}'
Response:
{
"translation": "Das werden deutsche Wörter.",
"language": "de-DE"
}
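The same request can be sent from Python without extra dependencies. This minimal sketch uses urllib.request and only the three required fields shown in the curl example. The helper names build_request, parse_response, and translate are hypothetical, and error handling is left to the caller (HTTP 4xx/5xx responses raise urllib.error.HTTPError).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000"  # matches NIM_HTTP_API_PORT=9000 above


def build_request(text: str, source_language: str, target_language: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request using only the required fields from the example."""
    if not text:
        raise ValueError("text must be non-empty")
    body = json.dumps({
        "text": text,
        "source_language": source_language,
        "target_language": target_language,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/text/translations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> str:
    """Return the "translation" field from a JSON response body."""
    return json.loads(raw)["translation"]


def translate(text: str, source_language: str, target_language: str) -> str:
    """Send one translation request and return the translated text."""
    req = build_request(text, source_language, target_language)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_response(resp.read())


if __name__ == "__main__":
    print(translate("This will become German words.", "en-US", "de-DE"))
```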
GET /v1/text/translations/languages
List supported source and target languages per deployed model. Pass an optional ?model=<name> query parameter to filter to a specific model.
curl http://localhost:9000/v1/text/translations/languages
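To pass the optional model filter from code, URL-encode the query string rather than concatenating it by hand. The sketch below builds the URL with urllib.parse.urlencode and pretty-prints whatever JSON comes back. The response schema is not described on this page, so the code makes no assumptions about its fields. The helper name languages_url is hypothetical, and any model name you pass (riva-translate-1_6b in the tests is only an illustrative guess) must match a deployed model.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def languages_url(base_url: str = "http://localhost:9000",
                  model: Optional[str] = None) -> str:
    """Build the languages URL, adding ?model=<name> only when a model is given."""
    url = f"{base_url}/v1/text/translations/languages"
    if model:
        url += "?" + urllib.parse.urlencode({"model": model})
    return url


if __name__ == "__main__":
    with urllib.request.urlopen(languages_url(), timeout=10) as resp:
        # Pretty-print the raw JSON without interpreting its structure.
        print(json.dumps(json.loads(resp.read()), indent=2, ensure_ascii=False))
```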
Client Parameters Reference
--text and --text-file are mutually exclusive. Use one or the other.
| Parameter | Description | Default |
|---|---|---|
| --server | gRPC server address and port. | |
| --text | Text to translate. Mutually exclusive with --text-file. | — |
| --text-file | Path to a file with one input per line. Mutually exclusive with --text. | — |
| --source-language-code | Source language code (for example, en-US or en). | |
| --target-language-code | Target language code (for example, de-DE or de). | |
| --batch-size | Number of inputs to translate in parallel when using --text-file. | |
| --max-len-variation | Maximum token count difference between source and output (0-256). | 20 |
| | Path to a custom dictionary file. Refer to Custom Dictionaries. | — |
| | Model name to use for translation. | |
| --list-models | List available models and supported language pairs, then exit. | — |
For the full list of supported language codes, refer to the NMT support matrix.
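The mutual-exclusivity rule for --text and --text-file, and the documented range and default for --max-len-variation, map naturally onto argparse. This is a hypothetical parser covering only flags named on this page; it is not the nmt.py implementation. Flags whose names are not given here (custom dictionary, model name) are omitted, and flags whose defaults are not given here have none.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented client flags."""
    parser = argparse.ArgumentParser(description="NMT client flag sketch")
    parser.add_argument("--server", help="gRPC server address and port")
    # argparse rejects a command line that supplies both --text and --text-file.
    inputs = parser.add_mutually_exclusive_group()
    inputs.add_argument("--text", help="text to translate")
    inputs.add_argument("--text-file", help="file with one input per line")
    parser.add_argument("--source-language-code")
    parser.add_argument("--target-language-code")
    parser.add_argument("--batch-size", type=int)
    # Documented valid range [0, 256] and default 20.
    parser.add_argument("--max-len-variation", type=int, default=20,
                        choices=range(0, 257), metavar="[0-256]")
    parser.add_argument("--list-models", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Running it with both --text and --text-file makes argparse exit with an "not allowed with argument" error, which is the behavior the rule above describes.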
List Available Models
To list the models and language pairs available on the running NMT NIM:
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --list-models
Helm Deployment
For Kubernetes deployment, create a custom-values.yaml file:
image:
repository: nvcr.io/nim/nvidia/riva-translate-1_6b
pullPolicy: IfNotPresent
tag: latest
nim:
ngcAPISecret: ngc-api
imagePullSecrets:
- name: ngc-secret
envVars:
NIM_TAGS_SELECTOR: name=riva-translate-1_6b
For complete Helm instructions, refer to Deploying with Helm.
Next Steps
Custom Dictionaries: Force specific translations or block translation of domain terms.
Translate with Python: Call the NMT gRPC API programmatically.
NMT Troubleshooting: Common issues and solutions.