Configure NVIDIA NIM for Image OCR (NeMo Retriever OCR v1)
NVIDIA NIM for Image OCR (NeMo Retriever OCR v1) runs in Docker containers. Each NIM is its own Docker container, and there are several ways to configure it.
Use this documentation to learn how to configure NVIDIA NIM for Image OCR (NeMo Retriever OCR v1). The remainder of this documentation describes each way to configure the NIM container.
GPU Selection
The NIM container is GPU-accelerated and uses the NVIDIA Container Toolkit to access GPUs on the host.
If the host has one or more GPUs of the same model, you can pass the --gpus all command-line argument to the docker run command.
If the host has a mix of GPU models, such as an A6000 and a GeForce display GPU, run the container on the compute-capable GPUs only.
Expose specific GPUs to the container by using either of the following methods:
Specify the --gpus argument, such as --gpus="device=1".
Set the NVIDIA_VISIBLE_DEVICES environment variable, such as -e NVIDIA_VISIBLE_DEVICES=1.
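The two methods above can be sketched as a small Python helper that assembles the matching docker run arguments. This is illustrative only: the function name gpu_args is hypothetical. The quoting of multi-device lists reflects the fact that Docker parses the --gpus value as CSV. The environment-variable form is interpreted by the NVIDIA Container Toolkit, so it assumes the container runs with the NVIDIA runtime.

```python
from typing import Optional, Sequence


def gpu_args(devices: Optional[Sequence[int]] = None, via_env: bool = False) -> list[str]:
    """Return docker run arguments that expose GPUs to the NIM container.

    devices=None exposes every GPU (--gpus all); otherwise only the listed
    device IDs, as reported by `nvidia-smi -L`, are exposed.
    """
    if devices is None:
        return ["--gpus", "all"]
    if not devices:
        raise ValueError("devices must contain at least one GPU ID")
    ids = ",".join(str(d) for d in devices)
    if via_env:
        # Read by the NVIDIA Container Toolkit; assumes the NVIDIA runtime is in use.
        return ["-e", f"NVIDIA_VISIBLE_DEVICES={ids}"]
    if len(devices) > 1:
        # Docker parses --gpus as CSV, so a comma-separated device list needs
        # literal double quotes around it (shell form: --gpus '"device=0,2"').
        return ["--gpus", f'"device={ids}"']
    return ["--gpus", f"device={ids}"]
```

For example, gpu_args([1]) returns ["--gpus", "device=1"]. That is the same argument vector the shell produces from --gpus="device=1" after it strips the quotes. You could then pass it to subprocess.run together with docker, run, and your image name.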
Run the nvidia-smi -L command to list the device IDs to specify in the argument or environment variable:
GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
Refer to GPU Enumeration in the NVIDIA Container Toolkit documentation for more information.
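To script the device choice, you can parse the nvidia-smi -L output. This sketch assumes the line format shown in the sample output above. The helper names are hypothetical. The rule of excluding GPUs whose name contains "GeForce" is only a heuristic for spotting display GPUs; it is not an NVIDIA-defined test for compute capability.

```python
import re
from typing import NamedTuple


class GpuInfo(NamedTuple):
    index: int
    name: str
    uuid: str


# Matches lines such as "GPU 0: Tesla H100 (UUID: GPU-...)".
_LINE_RE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)$")


def parse_nvidia_smi_list(output: str) -> list[GpuInfo]:
    """Parse `nvidia-smi -L` output into (index, name, uuid) records."""
    gpus = []
    for line in output.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:  # skip lines that are not top-level GPU entries
            gpus.append(GpuInfo(int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def compute_gpu_ids(gpus: list[GpuInfo], exclude: tuple[str, ...] = ("GeForce",)) -> list[int]:
    """Hypothetical heuristic: keep GPUs whose name has none of the excluded substrings."""
    return [g.index for g in gpus if not any(s in g.name for s in exclude)]
```

With the sample output above, compute_gpu_ids(parse_nvidia_smi_list(output)) keeps only device 0. You would then pass that device to --gpus or NVIDIA_VISIBLE_DEVICES.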
PID Limit
In certain deployment or container runtime environments, default process and thread limits (PID limits) can interfere with NIM startup. These limits can be set by Docker, Podman, Kubernetes, or the operating system.
If the PID limit is too low, you might see symptoms such as:
NIM starts partially but fails to reach the ready state, and then stalls.
NIM starts partially but fails to reach the ready state, and then crashes.
NIM serves a small number of requests, and then fails.
To verify that PID limits are impacting the NIM container, you can remove or adjust the PID limit at the container, node, and operating system level. Removing the PID limit and then checking for success is a useful diagnostic step.
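To see the limit the container actually has, you can read it from the cgroup filesystem inside the running container. This sketch assumes a Linux cgroup v2 layout (/sys/fs/cgroup/pids.max), with a fallback to the cgroup v1 pids controller (/sys/fs/cgroup/pids/pids.max). The file holds either a number or the string max, which means unlimited. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Typical locations of the PID limit inside a container (assumption:
# cgroup v2 unified hierarchy first, then the cgroup v1 "pids" controller).
PIDS_MAX_PATHS = ("/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max")


def parse_pids_max(text: str) -> Optional[int]:
    """Return the PID limit as an int, or None when the cgroup reports "max" (unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


def current_pids_limit(paths: tuple[str, ...] = PIDS_MAX_PATHS) -> Optional[int]:
    """Read the first pids.max file that exists; raise if none is found."""
    for path in paths:
        candidate = Path(path)
        if candidate.is_file():
            return parse_pids_max(candidate.read_text())
    raise FileNotFoundError("no pids.max file found; is the pids cgroup controller enabled?")
```

A small number here, such as a few hundred, is a hint that the PID limit is the cause of the symptoms listed above. None means the container has no PID limit.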
To lift the PID limit in a docker run command, set --pids-limit=-1, which means unlimited. For details, see docker container run.
To lift the PID limit in a podman run command, set --pids-limit=-1, which means unlimited. For details, see Podman pids-limit.
To increase the PID limit in Kubernetes, set PodPidsLimit on the kubelet on each node. For details, see the documentation for your Kubernetes distribution.
To increase the PID limit at the operating system level, see your OS-specific documentation.
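The docker and podman settings above can be applied from a launcher script, as in the following minimal sketch. The function name, the --rm flag, and the image argument are placeholders for illustration; substitute your own NIM image and options.

```python
from typing import Sequence


def run_without_pid_limit(engine: str, image: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Build a docker or podman run command that removes the container PID limit.

    Both CLIs accept --pids-limit=-1 to mean "unlimited".
    """
    if engine not in ("docker", "podman"):
        raise ValueError(f"unsupported container engine: {engine!r}")
    return [engine, "run", "--rm", "--pids-limit=-1", *extra_args, image]
```

Pass the returned list to subprocess.run(cmd, check=True) to start the container. If startup then succeeds, the original PID limit was too low.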
Environment Variables
Note
The following NIMs do not support NIM_SERVED_MODEL_NAME:
nemoretriever-graphic-elements-v1
nemoretriever-page-elements-v2
nemoretriever-table-structure-v1
PaddleOCR
nemoretriever-ocr-v1
The following table identifies the environment variables that are used in the container.
Set environment variables with the -e command-line argument to the docker run command.
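As a sketch, the following Python helper turns a mapping of variables into -e arguments. It also rejects NIM_SERVED_MODEL_NAME for the NIMs listed in the note above. The helper name is hypothetical, and the variable names you pass in must come from the table. A value of None emits -e NAME with no value. Docker then copies NAME from the host environment, which keeps a secret value such as an API key out of the command line.

```python
from typing import Mapping, Optional

# NIMs listed in the note above as not supporting NIM_SERVED_MODEL_NAME.
NO_SERVED_MODEL_NAME = {
    "nemoretriever-graphic-elements-v1",
    "nemoretriever-page-elements-v2",
    "nemoretriever-table-structure-v1",
    "PaddleOCR",
    "nemoretriever-ocr-v1",
}


def env_args(nim: str, env: Mapping[str, Optional[str]]) -> list[str]:
    """Convert environment variables into docker run -e arguments, in mapping order."""
    if "NIM_SERVED_MODEL_NAME" in env and nim in NO_SERVED_MODEL_NAME:
        raise ValueError(f"{nim} does not support NIM_SERVED_MODEL_NAME")
    args: list[str] = []
    for name, value in env.items():
        # "-e NAME" (no value) tells Docker to copy NAME from the host environment.
        args += ["-e", name if value is None else f"{name}={value}"]
    return args
```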
| Name | Description | Default Value |
|---|---|---|
| | Set this variable to the value of your personal NGC API key. | None |
| | Specifies the fully qualified path, in the container, for downloaded models. | |
| | Specifies the network port number, in the container, for gRPC access to the microservice. | |
| | Specifies the network port number, in the container, for HTTP access to the microservice. Refer to Publishing ports in the Docker documentation for more information about host and container network ports. | |
| | Specifies the number of worker threads to start for HTTP requests. | |
| | Specifies the network port number, in the container, for NVIDIA Triton Inference Server. | |
| | When set to | |
| | Specifies the logging level. The microservice supports the following values: DEBUG, INFO, WARNING, ERROR, and CRITICAL. | |
| | When set to | |
| | Set to | |
| | Specifies the fully qualified path, in the container, for the model manifest YAML file. | |
| | Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile. | None |
| | Specifies the model names used in the API. Specify multiple names in a comma-separated list. If you specify multiple names, the server responds to any of the names. The name in the model field of a response is the first name in this list. By default, the model is inferred from the | None |
| | If set to a non-empty string, the | None |
| | For the NVIDIA Triton Inference Server, specifies the size, in bytes, of the CUDA memory pool for all GPUs visible to the container. By default, Image Retriever NIMs automatically set the CUDA memory pool size based on the maximum input data size for the loaded TensorRT engine. However, you might want to increase the CUDA memory pool size when you enable dynamic batching or run highly concurrent workloads. A typical error message that indicates that you should increase the CUDA memory pool is | |
| | For the NVIDIA Triton Inference Server, sets the maximum queue delay, which gives other requests time to join the dynamic batch. For more information, refer to the Triton User Guide. | |
| | Specifies the gRPC port number for NVIDIA Triton Inference Server. | |
| | When set to | |
| | Sets the maximum queue size for the underlying Triton instance. For more information, refer to the Triton User Guide. If the maximum queue size is exceeded, Triton returns an InferenceServerException for new requests. | None |
| | Specifies the maximum batch size that the underlying Triton instance can process. The value must be less than or equal to the maximum batch size that was used to compile the engine. By default, the NIM uses the maximum possible batch size for a given model and GPU. To decrease the memory footprint of the server, choose a smaller maximum batch size. If the model uses the | None |
| | Controls which TensorRT engine profiles are loaded when the NIM’s Triton server starts. Specify | |
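Several rows in the table describe constraints that you can check before you start the container. This sketch encodes three of them:

- The documented logging levels.
- The rule that the maximum batch size must not exceed the maximum used to compile the engine, with the largest possible batch size as the default.
- Converting a size in GiB into the byte count that the CUDA memory pool setting expects.

The function names and the lower bound of 1 for the batch size are assumptions. You must supply the engine's compiled maximum yourself.

```python
from typing import Optional

# Values accepted for the logging-level variable, per the table above.
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def check_log_level(level: str) -> str:
    """Return level unchanged if it is one of the documented values."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported log level {level!r}; expected one of {LOG_LEVELS}")
    return level


def effective_max_batch_size(requested: Optional[int], engine_max: int) -> int:
    """Apply the documented rule: requested <= engine maximum; default is the maximum."""
    if requested is None:
        return engine_max
    if not 1 <= requested <= engine_max:  # lower bound of 1 is an assumption
        raise ValueError(f"max batch size must be between 1 and {engine_max}, got {requested}")
    return requested


def memory_pool_bytes(gib: float) -> int:
    """Convert GiB into the byte count that the CUDA memory pool setting expects."""
    return int(gib * 1024 ** 3)
```

For example, memory_pool_bytes(2) gives 2147483648. Passing a smaller value to effective_max_batch_size lets you trade throughput for a smaller memory footprint, as the table describes.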
Volumes
The following table identifies the paths that are used in the container. Use this information to plan the local paths to bind mount into the container.
| Container Path | Description | Example |
|---|---|---|
| | Specifies the path, relative to the root of the container, for downloaded models. The typical use for this path is to bind mount a directory on the host with this path inside the container. For example, to use If you do not specify a bind or volume mount, as shown in the The | |
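A sketch of building the bind-mount argument for the model directory follows. container_path is a placeholder for the container path listed in the table above, and the helper name is hypothetical. The host path is made absolute because Docker treats a bare name, such as nim-cache, as a named volume rather than a host directory.

```python
import os


def volume_args(host_dir: str, container_path: str) -> list[str]:
    """Return docker run arguments that bind mount host_dir at container_path."""
    if not os.path.isabs(container_path):
        raise ValueError("container_path must be an absolute path inside the container")
    # Expand "~" and make the host path absolute so Docker creates a bind mount
    # instead of interpreting the source as a named volume.
    host = os.path.abspath(os.path.expanduser(host_dir))
    return ["-v", f"{host}:{container_path}"]
```

Reusing the same host directory across container restarts lets the NIM find previously downloaded models instead of fetching them again.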