Optimization for NVIDIA NIM for Image OCR (NeMo Retriever OCR)#
Use this documentation to learn about optimization for NVIDIA NIM for Image OCR (NeMo Retriever OCR).
NVIDIA NIM for Image OCR (NeMo Retriever OCR), referred to on this page as Image OCR NIM (NeMo Retriever OCR), automatically applies model- and hardware-specific optimizations intended to improve model performance.
The NIM uses the TensorRT backend for Triton Inference Server for optimized inference of common models across a number of NVIDIA GPUs. If an optimized engine does not exist for a SKU being used, a GPU-agnostic backend, such as ONNX or PyTorch, is used instead.
The NIM includes multiple optimization profiles, each tailored to the floating-point precision types that a given SKU supports.
For supported GPUs, nemotron-ocr-v2 uses FP16 TensorRT optimized profiles. If no compatible TensorRT profile is available, the NIM falls back to the FP16 PyTorch profile.
For OCR v2, TensorRT provides the best throughput and latency on supported GPUs. The PyTorch profile can provide better multilingual OCR accuracy on NVIDIA A10G and NVIDIA L4 GPUs, with lower throughput and higher latency than TensorRT. For details, refer to Use PyTorch on A10G and L4 GPUs.
Automatic Profile Selection#
Image OCR NIM (NeMo Retriever OCR) is designed to automatically select the most suitable profile from the list of compatible profiles based on the detected hardware. Each profile consists of different parameters that influence the selection process. The automatic selection process considers the following factors:
Compatibility check: Automatic selection excludes the profiles that are not runnable with the detected configuration based on the number of GPUs and GPU model.
Backend: The backend is TensorRT, ONNX, or PyTorch. Automatic selection prefers the optimized TensorRT profiles.
Precision: Automatic selection prefers lower precision profiles. For example, profiles with FP8 are selected before FP16 when available.
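The preference order described above can be pictured as a ranking problem. The following is a minimal sketch, not the NIM's actual selector: the function name pick_profile, the three-column input format, and the choice to rank backend ahead of precision are all assumptions made for illustration. It also assumes that the compatibility check has already removed profiles that cannot run.

```shell
# Hypothetical illustration of the preference order; not the NIM's real selector.
# Reads already-compatible candidates on stdin, one per line:
#   <profile-id> <model_type> <precision>
# Assumed ordering: TensorRT before other backends, then lower precision first.
pick_profile() {
  awk '
    function backend_rank(b)   { return (b == "tensorrt") ? 0 : 1 }
    function precision_rank(p) { return (p == "fp8") ? 0 : (p == "fp16") ? 1 : 2 }
    { print backend_rank($2), precision_rank($3), $1 }
  ' | sort -k1,1n -k2,2n | head -n 1 | awk '{ print $3 }'
}

# Example:
#   printf '%s\n' "p1 pytorch fp16" "p2 tensorrt fp16" | pick_profile
```

With only FP16 candidates, as is the case for nemotron-ocr-v2, the backend preference alone decides the result.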
The model profile is logged at startup. For example, in the log you should see something similar to the following.
INFO 2025-10-14 20:21:09.034 nim_sdk.py:299] Using the profile selected by the profile selector: <profile-id>
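To confirm which profile a running container picked, you can filter its logs for that message. The sketch below assumes the log line keeps the wording shown above; the function name selected_profile_from_log and the container name in the usage comment are hypothetical.

```shell
# Print the profile ID from each "profile selector" log line read on stdin.
# Assumes the message text matches the example log line shown above.
selected_profile_from_log() {
  sed -n 's/.*Using the profile selected by the profile selector: *//p'
}

# Example use against a running container (the container name is a placeholder):
#   docker logs my-ocr-nim 2>&1 | selected_profile_from_log
```

If the filter prints nothing, the message wording might differ in your release, so check the startup log directly.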
Override Profile Selection#
To override automatic profile selection, set a specific profile ID in the docker run command.
First, use the following command to list the available profiles.
export IMG_NAME="nvcr.io/nim/nvidia/nemotron-ocr-v2:1.4.0"
docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles
Note
To run list-model-profiles on a host with no available GPUs, include -e NIM_CPU_ONLY=1 in the docker run command. $IMG_NAME is the URI of the Docker container image for the NIM. For more information, see Get Started With NVIDIA NIM for Image OCR (NeMo Retriever OCR).
You should see output similar to the following.
MODEL PROFILES
- Compatible with system and runnable:
- <profile-id> (precision: fp16, backend: triton, model_type: pytorch)
- <profile-id> (precision: fp16, backend: triton, compute_capability: 8.9, model_type: tensorrt)
- <profile-id> (precision: fp16, backend: triton, compute_capability: 12.0, model_type: tensorrt)
...
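If you script deployments, a small filter can pull profile IDs out of this listing instead of copying them by hand. This is a sketch under the assumption that each profile line has the shape shown above ("- <profile-id> (key: value, ...)") and that profile IDs contain no spaces; the function name profile_ids_by_model_type is hypothetical.

```shell
# Print the ID of every listed profile whose description has the given
# model_type. Reads list-model-profiles output on stdin.
# Assumes profile lines look like: "- <profile-id> (key: value, ...)".
profile_ids_by_model_type() {
  awk -v want="$1" '
    # Require "," or ")" after the value so that only a whole value matches.
    $1 == "-" && $0 ~ ("model_type: " want "[,)]") { print $2 }
  '
}

# Example use:
#   docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles \
#     | profile_ids_by_model_type pytorch
```

Review the printed IDs before you use one, because more than one profile can share a model_type.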
Next, add -e NIM_MODEL_PROFILE="${PROFILE_ID}" to the docker run command to use the specific profile that you want.
For example, run the following code.
export PROFILE_ID="<profile-id>"
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --runtime=nvidia --gpus=all \
--shm-size=16GB \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE="${PROFILE_ID}" \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Use PyTorch on A10G and L4 GPUs#
In release 1.4.0, the multilingual OCR v2 TensorRT profiles for NVIDIA A10G and NVIDIA L4 GPUs can produce lower accuracy than the PyTorch profile. Other supported TensorRT profiles are not affected.
By default, the NIM selects an available TensorRT profile because TensorRT typically provides higher throughput and lower latency. To prioritize multilingual OCR accuracy over performance, set the NIM_MODEL_PROFILE environment variable to a PyTorch profile. To return to TensorRT, remove NIM_MODEL_PROFILE and let automatic profile selection choose the default TensorRT profile, or set NIM_MODEL_PROFILE to an available TensorRT profile.
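A deployment script can check whether the host GPU is one of the two affected models before it decides to override the profile. The following sketch assumes that the GPU name reported by nvidia-smi contains "A10G" or "L4" as a separate token; the function name is_a10g_or_l4 is hypothetical.

```shell
# Return success (0) when the GPU name is an A10G or an L4.
# Matches whole tokens only, so "L40S" does not count as "L4".
is_a10g_or_l4() {
  printf '%s\n' "$1" | grep -Eq '(^|[^[:alnum:]])(A10G|L4)([^[:alnum:]]|$)'
}

# Example use (requires nvidia-smi on the host):
#   gpu_name=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
#   if is_a10g_or_l4 "$gpu_name"; then
#     echo "Consider setting NIM_MODEL_PROFILE to a PyTorch profile."
#   fi
```

The check only reports the GPU model. Whether to trade throughput for multilingual accuracy remains a decision for your workload.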
To set the NIM_MODEL_PROFILE environment variable to a PyTorch profile, use the following procedure.
1. Run list-model-profiles with the container image that you plan to deploy.
export IMG_NAME="nvcr.io/nim/nvidia/nemotron-ocr-v2:1.4.0"
docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles
Note
To run list-model-profiles on a host with no available GPUs, include -e NIM_CPU_ONLY=1 in the docker run command. Profile IDs are release-specific. Verify the profile IDs with list-model-profiles for the exact container image that you deploy.
2. Copy the profile ID whose output includes model_type: pytorch.
3. When you run the docker command to start the container, set NIM_MODEL_PROFILE to the profile ID that you copied in the previous step.
export IMG_NAME="nvcr.io/nim/nvidia/nemotron-ocr-v2:1.4.0"
export MANUAL_PROFILE_ID="the PyTorch profile ID that you copied in the previous step"
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --runtime=nvidia --gpus=all \
--shm-size=16GB \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE="${MANUAL_PROFILE_ID}" \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Note
For Helm deployments, add NIM_MODEL_PROFILE to envVars.
envVars:
  NIM_MODEL_PROFILE: "the PyTorch profile ID that you copied in the previous step"
To return to automatic TensorRT selection, remove the NIM_MODEL_PROFILE environment variable from the Docker command or Helm values file and redeploy the NIM.
To set the NIM_MODEL_PROFILE environment variable to a specific TensorRT profile, use the following procedure.
1. Run list-model-profiles with the container image that you plan to deploy.
export IMG_NAME="nvcr.io/nim/nvidia/nemotron-ocr-v2:1.4.0"
docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles
Note
To run list-model-profiles on a host with no available GPUs, include -e NIM_CPU_ONLY=1 in the docker run command. Profile IDs are release-specific. Verify the profile IDs with list-model-profiles for the exact container image that you deploy.
2. Copy the profile ID whose output includes model_type: tensorrt.
3. When you run the docker command to start the container, set NIM_MODEL_PROFILE to the profile ID that you copied in the previous step.
export IMG_NAME="nvcr.io/nim/nvidia/nemotron-ocr-v2:1.4.0"
export MANUAL_PROFILE_ID="the TensorRT profile ID that you copied in the previous step"
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --runtime=nvidia --gpus=all \
--shm-size=16GB \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE="${MANUAL_PROFILE_ID}" \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Note
For Helm deployments, add NIM_MODEL_PROFILE to envVars.
envVars:
  NIM_MODEL_PROFILE: "the TensorRT profile ID that you copied in the previous step"