Get Started With NVIDIA NIM for Image OCR (NeMo Retriever OCR)#
This documentation helps you get started with NVIDIA NIM for Image OCR (NeMo Retriever OCR).
Prerequisites#
Before you can get started, you need the following:
Verify that you have supported hardware and software. For details, refer to the support matrix.
If you are running on an RTX AI PC or Workstation, install WSL2. For instructions, refer to NIM on WSL2 documentation.
Create an account on NVIDIA NGC and generate an API key to access the NIM container images and model assets on NGC. For instructions, refer to the NGC Authentication section that follows.
Note
Deploying on Kubernetes is not supported for WSL.
NGC Authentication#
Generate your API key#
To access the NIM container images and model assets on NGC, you must generate a personal API key. To create your key, go to https://org.ngc.nvidia.com/setup/api-keys.
When you create your key, for Services Included, select the following:
NGC Catalog
Private Registry (if you are an Early Access participant)
You can include more services if you are going to use this key for other purposes. For more information, refer to the NGC User Guide.
Export the API key#
To conveniently use your API key in the commands in the following sections, you can export your key as an environment variable named NGC_API_KEY.
For example, run the following code in your terminal.
export NGC_API_KEY=<your API key value>
Run one of the following commands to make your key available when you start a new terminal session.
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Note
Other, more secure options include saving your API key value in a file (retrieve it with `cat $NGC_API_KEY_FILE`), or saving your key in a password manager.
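As a sketch of the file-based approach (the file path and permissions here are illustrative, not a NIM requirement):

```shell
# Store the key in a file with restricted permissions (path is an example)
NGC_API_KEY_FILE="${TMPDIR:-/tmp}/ngc_api_key"
printf '%s' "<your API key value>" > "$NGC_API_KEY_FILE"
chmod 600 "$NGC_API_KEY_FILE"   # readable by your user only

# Load the key into the environment only when you need it
export NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"
```

This keeps the key value out of your shell startup files and shell history.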
Docker Login to NGC#
Before you can pull the NIM container image from NGC, authenticate to NGC by using the following command.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use `$oauthtoken` as the username and the value of `NGC_API_KEY` as the password. The `$oauthtoken` username is a special name that indicates that you will authenticate with an API key and not a username and password.
Accept the License Terms#
Some NIM models require that you accept the license terms on NGC before you can pull the container image and model assets. To accept the license terms, browse to the model or container page on the NGC Catalog, read the terms, and then click Accept Terms.
For example, accept the license terms for the nemotron-ocr-v1 container on the NGC Catalog.
Launch the NIM#
The following command launches a Docker container for the nemotron-ocr-v1 model. For Docker versions >= 19.03, the --runtime=nvidia option has the same effect as the --gpus all option.
# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/nemotron-ocr-v1
export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)
# Choose a NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/$CONTAINER_NAME:1.3.0"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# Start the NIM
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
| Flags | Description |
|---|---|
| `--rm` | Delete the container after it stops (see Docker docs). |
| `--name=$CONTAINER_NAME` | Give a name to the NIM container for bookkeeping (here `$CONTAINER_NAME`). |
| `--runtime=nvidia` | Ensure NVIDIA drivers are accessible in the container. |
| `--gpus all` | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
| `--shm-size=16GB` | Allocate host memory for multi-GPU communication. Not required for single-GPU models or GPUs with NVLink enabled. |
| `-e NGC_API_KEY` | Provide the container with the token necessary to download adequate models and resources from NGC. See above. |
| `-v "$LOCAL_NIM_CACHE:/opt/nim/.cache"` | Mount a cache directory from your system (`$LOCAL_NIM_CACHE`) inside the container (`/opt/nim/.cache`) so that downloaded models are reused across runs. |
| `-u $(id -u)` | Use the same user as your system user inside the NIM container to avoid permission mismatches when downloading models in your local cache directory. |
| `-p 8000:8000` | Forward the port where the NIM server is published inside the container to access from the host system. The left-hand side of the mapping (`8000:8000`) is the host port. |
| `$IMG_NAME` | Name and version of the NIM container from NGC. The NIM server automatically starts if no argument is provided after this. |
If you have an issue with permission mismatches when downloading models to your local cache directory, add the `-u $(id -u)` option to the `docker run` call to run under your current identity.
If you are running on a host with different types of GPUs, specify GPUs of the same type using the `--gpus` argument to `docker run`. For example, `--gpus '"device=0,2"'`. The device IDs of 0 and 2 are examples only; replace them with the appropriate values for your system. You can find device IDs by running `nvidia-smi`. For more information, see GPU Enumeration.
GPU clusters with GPUs in Multi-Instance GPU (MIG) mode are currently not supported.
Verify that the Service is Ready#
After you launch the NIM, it might take a few seconds for the service to be ready to accept requests. To verify that the service is ready, run the following code.
curl -X 'GET' 'http://localhost:8000/v1/health/ready'
If the service is ready, you should see a response similar to the following.
{
  "ready": true
}
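In scripts, you may want to block until the service reports ready rather than checking once. The following is a minimal polling sketch; the endpoint URL, retry count, and sleep interval are assumptions based on the defaults above, not NIM-mandated values.

```shell
# Poll the readiness endpoint until it returns success or the retry budget is spent
wait_for_nim() {
  local url="${1:-http://localhost:8000/v1/health/ready}"
  local retries="${2:-30}"
  local i=0
  while [ "$i" -lt "$retries" ]; do
    # -s silences progress output; -f makes curl fail on HTTP errors
    if curl -sf "$url" > /dev/null; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Usage: `wait_for_nim && echo "service is up"`.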
Run Inference#
After the service is ready, use code similar to the following to run inference. For more information, refer to API Reference for NVIDIA NIM for Image OCR (NeMo Retriever OCR).
API_ENDPOINT="http://localhost:8000"
# Create JSON payload with base64 encoded image
IMAGE_SOURCE="https://assets.ngc.nvidia.com/products/api-catalog/nemo-retriever/image-ocr/example-1.png"
# IMAGE_SOURCE="path/to/your/image.jpg" # Uncomment to use a local file instead
# Encode the image to base64 (handles both URLs and local files)
if [[ $IMAGE_SOURCE == http* ]]; then
# Handle URL
BASE64_IMAGE=$(curl -s ${IMAGE_SOURCE} | base64 -w 0)
else
# Handle local file
BASE64_IMAGE=$(base64 -w 0 ${IMAGE_SOURCE})
fi
# Construct the full JSON payload
JSON_PAYLOAD='{
"input": [{
"type": "image_url",
"url": "data:image/jpeg;base64,'${BASE64_IMAGE}'"
}],
"merge_levels": ["word"]
}'
# Send POST request to inference endpoint
echo "${JSON_PAYLOAD}" | \
curl -X POST "${API_ENDPOINT}/v1/infer" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d @-
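If the request fails, confirm that the encoding step actually produced output before debugging the service. The `check_payload` function below is a hypothetical helper, not part of the NIM:

```shell
# Hypothetical helper: confirm the base64 string is non-empty before POSTing
check_payload() {
  if [ -z "$1" ]; then
    echo "image encoding failed; check IMAGE_SOURCE" >&2
    return 1
  fi
  echo "payload size: ${#1} characters"
}

# Usage: check_payload "$BASE64_IMAGE"
```

An empty `BASE64_IMAGE` usually means the download or the local file path failed silently.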
Deploy on Multiple GPUs#
The NIM deploys a single model across all GPUs that you specify and that are visible inside the Docker container. If you do not specify a number of GPUs, the NIM defaults to one. When you use multiple GPUs, Triton distributes inference requests across them to keep utilization balanced.
Use the docker run --gpus command-line argument to specify the number of GPUs that are available for deployment.
Example using all GPUs:
docker run --gpus all ...
Example using two GPUs:
docker run --gpus 2 ...
Example using specific GPUs:
docker run --gpus '"device=1,2"' ...
Deploy Alongside Other NIMs on the Same GPU#
You can deploy Image OCR NIM (NeMo Retriever OCR) alongside another NIM (for example, llama3-8b-instruct) on the same GPU (for example, A100 80GB, A100 40GB, or H100 80GB).
For more information about deployment, see Launch LLM NIMs from NGC, NIM Operator, GPU Operator with MIG, and Time-Slicing GPUs in Kubernetes.
Use the docker run --gpus command-line argument to specify the same GPU as shown in the following code.
docker run --gpus '"device=1"' ... $IMG_NAME
docker run --gpus '"device=1"' ... $LLM_IMG_NAME
Download NIM Models to Cache#
If model assets must be pre-fetched, such as in an air-gapped system, you can download the assets to the NIM cache without starting the server.
To download the assets, first run `list-model-profiles` to determine the desired profile, and then run `download-to-cache` with that profile, as shown in the following example.
For details, see Optimization.
# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/nemotron-ocr-v1
export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)
# Choose a NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/$CONTAINER_NAME:1.3.0"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# List NIM model profiles and select the most appropriate one for your use case
docker run -it --rm --name=$CONTAINER_NAME \
-e NIM_CPU_ONLY=1 \
-u $(id -u) \
$IMG_NAME list-model-profiles
export NIM_MODEL_PROFILE=<selected profile>
# Start the NIM container with a command to download the model to the cache
docker run -it --rm --name=$CONTAINER_NAME \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY \
-e NIM_CPU_ONLY=1 \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
$IMG_NAME download-to-cache --profiles $NIM_MODEL_PROFILE
# Start the NIM container in an air-gapped environment and serve the model
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus=all \
--shm-size=16GB \
--network=none \
-v $LOCAL_NIM_CACHE:/mnt/nim-cache:ro \
-u $(id -u) \
-e NIM_CACHE_PATH=/mnt/nim-cache \
-e NGC_API_KEY \
-p 8000:8000 \
$IMG_NAME
By default, the `download-to-cache` command downloads the most appropriate model assets for the detected GPU. To override this behavior and download a specific model, set the `NIM_MODEL_PROFILE` environment variable when launching the container. Use the `list-model-profiles` command available within the NIM container to list all profiles. See Optimization for more details.
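After `download-to-cache` completes, you can confirm that assets landed in the cache directory before moving the container to the air-gapped host. This is a simple sketch using the variables set above:

```shell
# Report how much space the cache uses and list its top-level contents
LOCAL_NIM_CACHE="${LOCAL_NIM_CACHE:-$HOME/.cache/nim}"
mkdir -p "$LOCAL_NIM_CACHE"
du -sh "$LOCAL_NIM_CACHE"
ls -la "$LOCAL_NIM_CACHE"
```

An empty listing after a download run suggests the volume mount or the `-u $(id -u)` option was missing.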
Stop the Container#
To stop the Docker container, run the following code.
docker stop $CONTAINER_NAME
To remove the Docker container, run the following code. If you included the --rm flag when you started the container, you don’t need this step.
docker rm $CONTAINER_NAME
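The two steps can be combined into an idempotent cleanup helper. This is a sketch, not part of the NIM tooling; it is safe to run whether or not the container exists:

```shell
# Stop and remove the NIM container, ignoring errors if it is already gone
cleanup_nim() {
  local name="${1:-$CONTAINER_NAME}"
  docker stop "$name" 2>/dev/null || true
  docker rm "$name" 2>/dev/null || true
  echo "cleaned up $name"
}

# Usage: cleanup_nim "$CONTAINER_NAME"
```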