Configure NVIDIA NIM for Image OCR#

NVIDIA NIM for Image OCR runs in Docker containers. Each NIM is its own Docker container, and there are several ways to configure it. Use this documentation to learn how to configure a NIM container.

GPU Selection#

The NIM container is GPU-accelerated and uses NVIDIA Container Toolkit for access to GPUs on the host.

You can specify the --gpus all command-line argument to the docker run command if the host has one or more of the same GPU model. If the host has a combination of GPUs, such as an A6000 and a GeForce display GPU, run the container on compute-capable GPUs only.

Expose specific GPUs to the container by using either of the following methods:

  • Specify the --gpus argument, such as --gpus="device=1".

  • Set the NVIDIA_VISIBLE_DEVICES environment variable, such as -e NVIDIA_VISIBLE_DEVICES=1.

Run the nvidia-smi -L command to list the device IDs to specify in the argument or environment variable:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)

Refer to GPU Enumeration in the NVIDIA Container Toolkit documentation for more information.
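The selection step above can be sketched in shell. This is a minimal illustration, not part of the NIM tooling: the function name compute_gpu_ids is hypothetical, and filtering on the string "GeForce" is an assumed heuristic for spotting display GPUs that might not fit every host.

```shell
#!/usr/bin/env bash
# Read `nvidia-smi -L` style lines on stdin and print the comma-separated
# indices of GPUs whose name does not contain "GeForce".
# Assumption: display GPUs can be recognized by "GeForce" in the name.
compute_gpu_ids() {
  awk -F'[ :]' '
    /^GPU [0-9]+:/ && !/GeForce/ {
      ids = ids (ids != "" ? "," : "") $2   # $2 is the device index
    }
    END { print ids }
  '
}

# Usage on a GPU host (not executed here):
#   docker run ... -e NVIDIA_VISIBLE_DEVICES="$(nvidia-smi -L | compute_gpu_ids)" ...
```

The sketch passes the result through NVIDIA_VISIBLE_DEVICES rather than --gpus because a comma-separated device list needs no extra quoting in that form.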

Shared Memory Flag#

Tokenization uses Triton’s Python backend capabilities, which scale with the number of CPU cores available. You might need to increase the shared memory available to the microservice container.

The following example provides 1 GB of shared memory:

docker run ... --shm-size=1g ...
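To confirm how much shared memory a container actually received, one option is to inspect the file system mounted at /dev/shm. The following is a small sketch under that assumption; the helper name shm_size_kib is hypothetical.

```shell
#!/usr/bin/env bash
# Print the total size, in KiB, of the file system mounted at the given path.
# Defaults to /dev/shm, the usual shared memory mount on Linux.
shm_size_kib() {
  # -P forces one output line per file system; column 2 is the total size.
  df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print $2 }'
}

# Usage inside a running container (container name is a placeholder):
#   docker exec <container> df -Pk /dev/shm
```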

PID Limit#

In certain deployment or container runtime environments, default process and thread limits (PID limits) can interfere with NIM startup. These limits are set by Docker, Podman, Kubernetes, or the operating system.

If the PID limit is too low, you might see symptoms such as:

  • NIM starts up partially, but fails to reach ready state, and then stalls.

  • NIM starts up partially, but fails to reach ready state, and then crashes.

  • NIM serves a small number of requests, and then fails.

To verify that PID limits are impacting the NIM container, you can remove or adjust the PID limit at the container, node, and operating system level. Removing the PID limit and then checking for success is a useful diagnostic step.

  • To remove the PID limit in a docker run command, set --pids-limit=-1. For details, see docker container run.

  • To remove the PID limit in a podman run command, set --pids-limit=-1. For details, see Podman pids-limit.

  • To increase the PID limit in Kubernetes, set the PodPidsLimit on the kubelet on each node. For details, see your Kubernetes distribution-specific documentation.

  • To increase the PID limit at the operating system level, see your OS-specific documentation.
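Before changing any limit, it can help to see what limit is in effect. The sketch below reads the cgroup PID controller files; it assumes a cgroup v2 layout where pids.max and pids.current sit directly under /sys/fs/cgroup (on cgroup v1 the directory is typically /sys/fs/cgroup/pids). The function name pid_limit_report is hypothetical.

```shell
#!/usr/bin/env bash
# Report the PID limit and current PID count for a cgroup directory.
# $1: cgroup directory (assumed default: /sys/fs/cgroup, cgroup v2).
pid_limit_report() {
  local dir="${1:-/sys/fs/cgroup}"
  local max current
  max=$(cat "$dir/pids.max" 2>/dev/null) || {
    echo "pids.max not found in $dir"
    return 1
  }
  current=$(cat "$dir/pids.current" 2>/dev/null || echo "unknown")
  if [ "$max" = "max" ]; then
    # The literal value "max" means that no limit is enforced.
    echo "no PID limit (current: $current)"
  else
    echo "PID limit: $max (current: $current)"
  fi
}

# Usage inside the container:
#   pid_limit_report
```

A current count that sits close to the limit while the NIM stalls or crashes suggests that the PID limit is the cause.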

Optimization Mode#

The NVIDIA NIM for Image OCR can run in modes optimized for VRAM usage or performance when using a TensorRT model profile. You control the optimization mode by setting the NIM_TRITON_OPTIMIZATION_MODE environment variable to one of: default, perf_opt, or vram_opt.

  • default: The NIM loads one TensorRT engine profile that spans the full range of supported batch sizes. In this mode, the NIM has relatively low VRAM usage, but latency is higher and throughput is lower for small batch sizes, such as 1, 2, 3, and 4.

  • perf_opt: The NIM loads all TensorRT engine profiles except for the default profile. This mode covers the full range of supported batch sizes. In this mode, the NIM has improved latency and throughput for small batch sizes, such as 1, 2, 3, and 4. However, VRAM usage is higher because multiple profiles are loaded.

  • vram_opt: The NIM loads only the first and smallest TensorRT engine profile. This mode has the smallest VRAM usage, but constrains the batch size to 1. This has the same effect as setting NIM_TRITON_MAX_BATCH_SIZE to 1 and NIM_TRITON_OPTIMIZATION_MODE to perf_opt.

When you specify both NIM_TRITON_OPTIMIZATION_MODE and NIM_TRITON_MAX_BATCH_SIZE, the following occurs:

  • default: Higher NIM_TRITON_MAX_BATCH_SIZE results in higher VRAM usage.

  • perf_opt: Profiles larger than NIM_TRITON_MAX_BATCH_SIZE are not used.

  • vram_opt: NIM_TRITON_MAX_BATCH_SIZE is ignored.
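The interaction between the two variables can be sketched as a small helper that builds the -e arguments for docker run. The function name optimization_args is hypothetical. Dropping the batch size in vram_opt mode is a choice made for this sketch, on the grounds that the NIM ignores the value in that mode.

```shell
#!/usr/bin/env bash
# Print the -e arguments for an optimization mode and an optional batch size.
# $1: default, perf_opt, or vram_opt. $2: optional maximum batch size.
optimization_args() {
  local mode="${1:-default}" max_batch="${2:-}"
  case "$mode" in
    default|perf_opt|vram_opt) ;;
    *) echo "unknown optimization mode: $mode" >&2; return 1 ;;
  esac
  printf -- '-e NIM_TRITON_OPTIMIZATION_MODE=%s' "$mode"
  # vram_opt ignores NIM_TRITON_MAX_BATCH_SIZE, so the sketch omits it.
  if [ -n "$max_batch" ] && [ "$mode" != "vram_opt" ]; then
    printf -- ' -e NIM_TRITON_MAX_BATCH_SIZE=%s' "$max_batch"
  fi
  printf '\n'
}

# Usage (not executed here):
#   docker run ... $(optimization_args perf_opt 4) ...
```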

Environment Variables#

Note

PaddleOCR NIM does not support NIM_SERVED_MODEL_NAME.

The following table identifies the environment variables that are used in the container. Set environment variables with the -e command-line argument to the docker run command.

| Name | Description | Default Value |
|---|---|---|
| NGC_API_KEY | Set this variable to the value of your personal NGC API key. | None |
| NIM_CACHE_PATH | Specifies the fully qualified path, in the container, for downloaded models. | /opt/nim/.cache |
| NIM_GRPC_API_PORT | Specifies the network port number, in the container, for gRPC access to the microservice. | 50051 |
| NIM_HTTP_API_PORT | Specifies the network port number, in the container, for HTTP access to the microservice. Refer to Publishing ports in the Docker documentation for more information about host and container network ports. | 8000 |
| NIM_HTTP_MAX_WORKERS | Specifies the number of worker threads to start for HTTP requests. | 1 |
| NIM_HTTP_TRITON_PORT | Specifies the network port number, in the container, for NVIDIA Triton Inference Server. | 8080 |
| NIM_IGNORE_MODEL_DOWNLOAD_FAIL | When set to true and the microservice fails to download a model from NGC, the microservice continues to run rather than exit. This environment variable can be useful in an air-gapped environment. | false |
| NIM_LOGGING_JSONL | When set to true, the microservice creates log records in the JSONL format. | false |
| NIM_LOG_LEVEL | Specifies the logging level. The microservice supports the following values: DEBUG, INFO, WARNING, ERROR, and CRITICAL. | INFO |
| NIM_MANIFEST_ALLOW_UNSAFE | Set to 1 to enable selection of a model profile that is not included in the original model_manifest.yaml or a profile that is not detected to be compatible with the deployed hardware. | 0 |
| NIM_MANIFEST_PATH | Specifies the fully qualified path, in the container, for the model manifest YAML file. | /opt/nim/etc/default/model_manifest.yaml |
| NIM_MODEL_PROFILE | Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile. | None |
| NIM_SERVED_MODEL_NAME | Specifies the model names used in the API. Specify multiple names in a comma-separated list. If you specify multiple names, the server responds to any of the names. The name in the model field of a response is the first name in this list. By default, the model is inferred from the model_manifest.yaml. | None |
| NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS | For the NVIDIA Triton Inference Server, sets the maximum time that a request waits in the queue so that other requests can join the dynamic batch. For more information, refer to the Triton User Guide. | 100 (microseconds) |
| NIM_TRITON_LOG_VERBOSE | When set to 1, the container starts NVIDIA Triton Inference Server with verbose logging. | 0 |
| NIM_TRITON_MAX_QUEUE_SIZE | Sets the maximum queue size for the underlying Triton instance. For more information, refer to the Triton User Guide. Triton returns an InferenceServerException on new requests if you exceed the maximum queue size. | None |
| NIM_TRITON_MAX_BATCH_SIZE | Specifies the batch size for the underlying Triton instance. The value must be less than or equal to the maximum batch size that was used to compile the engine. If the model uses the tensorrt backend, the value must exactly match a batch size in one of the engine’s profiles. | None |
| NIM_TRITON_OPTIMIZATION_MODE | Controls which TensorRT engine profiles are loaded when the NIM’s Triton server starts. Specify default to load one profile that spans the full range of supported batch sizes. Specify perf_opt to load all profiles except for the default profile. Specify vram_opt to load only the first (smallest) profile. | default |
| NIM_TRITON_GRPC_PORT | Specifies the gRPC port number for NVIDIA Triton Inference Server. | 8001 |
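When several variables are set at once, an env file can be easier to manage than repeated -e arguments. The following sketch writes a few values from the table to a file and shows, in a comment, how it might be passed with Docker's --env-file option. The file name nim.env and the chosen values are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Write a sample env file. The file name and values are illustrative only.
env_file="${TMPDIR:-/tmp}/nim.env"
cat > "$env_file" <<'EOF'
NIM_HTTP_API_PORT=8000
NIM_LOG_LEVEL=DEBUG
NIM_LOGGING_JSONL=true
EOF

# Usage (not executed here). "-e NGC_API_KEY" without a value forwards the
# variable from the host environment, which keeps the key out of the file:
#   docker run ... --env-file "$env_file" -e NGC_API_KEY ...
```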

Volumes#

The following table identifies the paths that are used in the container. Use this information to plan the local paths to bind mount into the container.

| Container Path | Description | Example |
|---|---|---|
| /opt/nim/.cache or NIM_CACHE_PATH | Specifies the path, relative to the root of the container, for downloaded models. The typical use for this path is to bind mount a directory on the host with this path inside the container. For example, to use ~/.cache/nim on the host, run mkdir -p ~/.cache/nim before you start the container. When you start the container, specify the -v ~/.cache/nim:/opt/nim/.cache -u $(id -u) arguments to the docker run command. If you do not specify a bind or volume mount, as shown in the -v argument in the preceding command, the container downloads the model each time you start the container. The -u $(id -u) argument runs the container with your user ID to avoid file system permission issues and errors. | -v ~/.cache/nim:/opt/nim/.cache -u $(id -u) |
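The cache setup described above can be sketched as a short script that creates the host directory and assembles the mount arguments. The NIM_HOST_CACHE variable is a hypothetical override introduced for this sketch; the default matches the host path used in the text.

```shell
#!/usr/bin/env bash
# NIM_HOST_CACHE is a hypothetical override; the default is the path from the text.
cache_dir="${NIM_HOST_CACHE:-$HOME/.cache/nim}"

# Create the host directory before the container starts.
mkdir -p "$cache_dir"

# Bind mount the cache and run the container with the current user ID.
volume_args="-v ${cache_dir}:/opt/nim/.cache -u $(id -u)"
echo "$volume_args"

# Usage (not executed here):
#   docker run ... $volume_args ...
```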