Environment Variables for NVIDIA NIM for Image OCR (NeMo Retriever OCR v1)

Use this documentation to learn about the environment variables for NVIDIA NIM for Image OCR (NeMo Retriever OCR v1).

Environment Variables

Note

The following NIMs do not support NIM_SERVED_MODEL_NAME:

  • nemoretriever-graphic-elements-v1

  • nemoretriever-page-elements-v2

  • nemoretriever-table-structure-v1

  • PaddleOCR

  • nemoretriever-ocr-v1

The following reference describes the environment variables that are used in the container. Each entry lists the variable name, a description, and the default value. Set environment variables with the -e command-line argument to the docker run command, as shown in the following example.

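For example, a typical launch command might look like the following. The image path and tag are illustrative; substitute the exact image name and version for this NIM from NGC, and replace the placeholder API key with your own.

  export NGC_API_KEY=<your-ngc-api-key>
  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest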

NGC_API_KEY

Set this variable to the value of your personal NGC API key.

Default: None

NIM_CACHE_PATH

Specifies the fully qualified path, in the container, for downloaded models.

Default: /opt/nim/.cache
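
For example, to persist downloaded models across container restarts, you can mount a host directory at the cache path. This is a minimal sketch; the host directory and image path are illustrative.

  # create a host directory to reuse as the model cache
  mkdir -p ~/.cache/nim
  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_CACHE_PATH=/opt/nim/.cache \
    -v ~/.cache/nim:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest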

NIM_GRPC_API_PORT

Specifies the network port number, in the container, for gRPC access to the microservice.

Default: 50051

NIM_HTTP_API_PORT

Specifies the network port number, in the container, for HTTP access to the microservice.

Refer to Publishing ports in the Docker documentation for more information about host and container network ports.

Default: 8000
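
For example, to serve HTTP on a different container port and publish it on the same host port, you might run the following; the port value and image path are illustrative.

  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_HTTP_API_PORT=9000 \
    -p 9000:9000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest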

NIM_HTTP_MAX_WORKERS

Specifies the number of worker threads to start for HTTP requests.

Default: 1

NIM_HTTP_TRITON_PORT

Specifies the network port number, in the container, for NVIDIA Triton Inference Server.

Default: 8080

NIM_IGNORE_MODEL_DOWNLOAD_FAIL

When set to true and the microservice fails to download a model from NGC, the microservice continues to run rather than exit. This environment variable can be useful in an air-gapped environment.

Default: false
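
For example, in an air-gapped environment you might start the microservice from a previously populated model cache and allow it to continue when downloads fail. This is a sketch; it assumes the mounted host directory already contains the downloaded models, and the image path is illustrative.

  # the cache directory must already contain the models
  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_IGNORE_MODEL_DOWNLOAD_FAIL=true \
    -v ~/.cache/nim:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest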

NIM_LOG_LEVEL

Specifies the logging level. The microservice supports the following values: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

Default: INFO

NIM_LOGGING_JSONL

When set to true, the microservice creates log records in the JSONL format.

Default: false
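
For example, to produce verbose, machine-parseable logs while troubleshooting, you can combine the two logging variables; the image path is illustrative.

  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_LOG_LEVEL=DEBUG \
    -e NIM_LOGGING_JSONL=true \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest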

NIM_MANIFEST_ALLOW_UNSAFE

Set to 1 to enable selection of a model profile that is not included in the original model_manifest.yaml or a profile that is not detected to be compatible with the deployed hardware.

Default: 0

NIM_MANIFEST_PATH

Specifies the fully qualified path, in the container, for the model manifest YAML file.

Default: /opt/nim/etc/default/model_manifest.yaml

NIM_MODEL_PROFILE

Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile.

Default: None
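
For example, to pin the container to a specific profile instead of relying on automatic selection, pass the profile ID in NIM_MODEL_PROFILE. Add NIM_MANIFEST_ALLOW_UNSAFE=1 only if the profile is not in the manifest or is not detected as compatible with your hardware. The profile ID below is a placeholder, and the image path is illustrative.

  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<profile-id> \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest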

NIM_SERVED_MODEL_NAME

Specifies the model names used in the API. Specify multiple names in a comma-separated list. If you specify multiple names, the server responds to any of the names. The name in the model field of a response is the first name in this list. By default, the model is inferred from the model_manifest.yaml.

Default: None

NIM_REPOSITORY_OVERRIDE

If set to a non-empty string, the NIM_REPOSITORY_OVERRIDE value replaces the hard-coded location of the repository and the protocol for access to the repository. The structure of the value for this environment variable is as follows: <repository type>://<repository location>. Only the protocols ngc://, s3://, and https:// are supported, and only the first component of the URI is replaced. For example:
- If the URI in the manifest is ngc://org/meta/llama3-8b-instruct:hf?file=config.json and NIM_REPOSITORY_OVERRIDE=ngc://myrepo.ai/, the domain name for the API endpoint is set to myrepo.ai.
- If NIM_REPOSITORY_OVERRIDE=s3://mybucket/, the result of the replacement will be s3://mybucket/nim%2Fmeta%2Fllama3-8b-instruct%3Ahf%3Ffile%3Dconfig.json.
- If NIM_REPOSITORY_OVERRIDE=https://mymodel.ai/some_path, the result of the replacement will be https://mymodel.ai/some_path/nim%2Fmeta%2Fllama3-8b-instruct%3Ahf%3Ffile%3Dconfig.json.

This repository override feature supports basic authentication mechanisms:
- https assumes authorization using the Authorization header and the credential value in NIM_HTTPS_CREDENTIAL.
- ngc requires a credential in the NGC_API_KEY environment variable.
- s3 requires the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (if using temporary credentials) AWS_SESSION_TOKEN.

Default: None
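
For example, to download models from a private S3 bucket rather than NGC, set the override together with the AWS credential variables listed above; include AWS_SESSION_TOKEN only if you use temporary credentials. The bucket name, credentials, and image path are placeholders.

  docker run -it --rm --gpus all \
    -e NIM_REPOSITORY_OVERRIDE=s3://mybucket/ \
    -e AWS_ACCESS_KEY_ID=<access-key-id> \
    -e AWS_SECRET_ACCESS_KEY=<secret-access-key> \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest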

NIM_TRITON_CUDA_MEMORY_POOL_MB

For the NVIDIA Triton Inference Server, specify the byte size for the CUDA memory pool for all GPUs visible to the container.

By default, Image Retriever NIMs automatically set the CUDA memory pool based on the maximum input data size for the loaded TensorRT engine. You might need to increase the pool size when you enable dynamic batching or run highly concurrent workloads; a typical error message that indicates the pool is too small is RuntimeError: CUDA error: invalid argument.

Note that NIM_TRITON_EXTRA_ARGS overrides the arguments that the NIM automatically passes to the underlying tritonserver command. If you specify NIM_TRITON_EXTRA_ARGS, you must therefore include the options that are normally configured automatically. For an up-to-date reference of these arguments, run the NIM without NIM_TRITON_EXTRA_ARGS, inspect the arguments of the tritonserver process inside the container, and then run the NIM with NIM_TRITON_EXTRA_ARGS, including those arguments along with your additional ones.
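
For example, if a highly concurrent workload fails with RuntimeError: CUDA error: invalid argument, you might set the pool size explicitly; the value below is illustrative rather than a recommendation, and the image path is a placeholder.

  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_TRITON_CUDA_MEMORY_POOL_MB=2048 \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest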

NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS

For the NVIDIA Triton Inference Server, sets the maximum queue delay time, which allows other requests to join the dynamic batch. For more information, refer to the Triton User Guide.

Default: 100 microseconds

NIM_TRITON_GRPC_PORT

Specifies the gRPC port number, in the container, for NVIDIA Triton Inference Server.

Default: 8001

NIM_TRITON_LOG_VERBOSE

When set to 1, the container starts NVIDIA Triton Inference Server with verbose logging.

Default: 0

NIM_TRITON_MAX_QUEUE_SIZE

Sets the maximum queue size for the underlying Triton instance. For more information, refer to the Triton User Guide. Triton returns an InferenceServerException for new requests when the queue exceeds the maximum size.

Default: None

NIM_TRITON_MAX_BATCH_SIZE

Specify the maximum batch size that the underlying Triton instance can process. The value must be less than or equal to the maximum batch size that was used to compile the engine. By default, the NIM uses the maximum possible batch size for a given model and GPU. To decrease the memory footprint of the server, choose a smaller maximum batch size. If the model uses the tensorrt backend, the value must exactly match a batch size in one of the engine's profiles; only discrete values are supported. For valid values and their estimated memory footprint, refer to the support matrix page.

Default: None

NIM_TRITON_OPTIMIZATION_MODE

Controls which TensorRT engine profiles are loaded when the NIM’s Triton server starts. Specify default to load one profile that spans the full range of supported batch sizes. Specify perf_opt to load all profiles except for the default profile. Specify vram_opt to load only the first (smallest) profile.

Default: default
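
For example, to reduce GPU memory usage you might load only the smallest profile and cap the batch size; both values are illustrative, and valid batch sizes depend on the engine profiles listed in the support matrix. The image path is a placeholder.

  docker run -it --rm --gpus all \
    -e NGC_API_KEY \
    -e NIM_TRITON_OPTIMIZATION_MODE=vram_opt \
    -e NIM_TRITON_MAX_BATCH_SIZE=8 \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest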