Configuration
NeMo Text Retriever NIM uses Docker containers under the hood. Each NIM is its own Docker container, and there are several ways to configure it. The remainder of this section describes the ways to configure a NIM container.
The NIM container is GPU-accelerated and uses NVIDIA Container Toolkit for access to GPUs on the host.
You can specify the `--gpus all` command-line argument to the `docker run` command if the host has one or more of the same GPU model.
If the host has a combination of GPUs, such as an A6000 and a GeForce display GPU, run the container on compute-capable GPUs only.
Expose specific GPUs to the container by using either of the following methods:
- Specify the `--gpus` argument, such as `--gpus="device=1"`.
- Set the `NVIDIA_VISIBLE_DEVICES` environment variable, such as `-e NVIDIA_VISIBLE_DEVICES=1`.

Run the `nvidia-smi -L` command to list the device IDs to specify in the argument or environment variable:
```
GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
```
Refer to GPU Enumeration in the NVIDIA Container Toolkit documentation for more information.
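For example, to run the container on only GPU 0 (the H100) from the `nvidia-smi -L` listing above, either method works. This is a sketch; the image name is a placeholder, not the actual NIM image:

```shell
# Select GPU 0 with the --gpus argument.
# The inner quotes keep "device=0" intact when the shell parses the value.
docker run --rm --gpus '"device=0"' nvcr.io/nim/example-image:latest

# Equivalent selection with the NVIDIA_VISIBLE_DEVICES environment variable.
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvcr.io/nim/example-image:latest
```

You can also pass the GPU UUID from the `nvidia-smi -L` output instead of the index, which is unambiguous when GPUs are re-enumerated across reboots.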
Tokenization uses Triton's Python backend, which scales with the number of CPU cores available. You might need to increase the shared memory available to the microservice container.
Example providing `1g` of shared memory:

```shell
docker run ... --shm-size=1g ...
```
The following table identifies the environment variables that are used in the container. Set environment variables with the `-e` command-line argument to the `docker run` command.
| Name | Description | Default Value |
|---|---|---|
| `NGC_API_KEY` | Set this variable to the value of your personal NGC API key. | None |
| `NIM_CACHE_PATH` | Specifies the fully qualified path, in the container, for downloaded models. | `/opt/nim/.cache` |
| `NIM_GRPC_API_PORT` | Specifies the network port number, in the container, for gRPC access to the microservice. | `50051` |
| `NIM_HTTP_API_PORT` | Specifies the network port number, in the container, for HTTP access to the microservice. Refer to Publishing ports in the Docker documentation for more information about host and container network ports. | `8000` |
| `NIM_HTTP_MAX_WORKERS` | Specifies the number of worker threads to start for HTTP requests. | `1` |
| `NIM_HTTP_TRITON_PORT` | Specifies the network port number, in the container, for NVIDIA Triton Inference Server. | `8080` |
| `NIM_IGNORE_MODEL_DOWNLOAD_FAIL` | When set to `true` and the microservice fails to download a model from NGC, the microservice continues to run rather than exit. This environment variable can be useful in an air-gapped environment. | `false` |
| `NIM_LOGGING_JSONL` | When set to `true`, the microservice creates log records in the JSONL format. | `false` |
| `NIM_LOG_LEVEL` | Specifies the logging level. The microservice supports the following values: `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. | `INFO` |
| `NIM_MANIFEST_PATH` | Specifies the fully qualified path, in the container, for the model manifest YAML file. | `/opt/nim/etc/default/model_manifest.yaml` |
| `NIM_MODEL_PROFILE` | Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile. | None |
| `NIM_TRITON_LOG_VERBOSE` | When set to `1`, the container starts NVIDIA Triton Inference Server with verbose logging. | `0` |
| `NIM_TRITON_REQUEST_TIMEOUT` | Specifies the timeout, in microseconds, for NVIDIA Triton Inference Server. The default value, `0`, indicates no timeout. | `0` |
| `NIM_TRITON_GRPC_PORT` | Specifies the gRPC port number for NVIDIA Triton Inference Server. | `8001` |
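A launch command that combines several of these environment variables might look like the following sketch. The image name is a placeholder, and the API key value must be your own:

```shell
# Export your personal NGC API key first (value is a placeholder).
export NGC_API_KEY="your-ngc-api-key"

# Pass environment variables with -e; publish the HTTP port with -p.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=8000 \
  -e NIM_LOG_LEVEL=DEBUG \
  -e NIM_LOGGING_JSONL=true \
  -p 8000:8000 \
  nvcr.io/nim/example-image:latest  # placeholder image name
```

Using `-e NGC_API_KEY` without a value forwards the variable from your shell environment, which keeps the key out of your shell history and process listings.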
The following table identifies the paths that are used in the container. Use this information to plan the local paths to bind mount into the container.
| Container Path | Description | Example |
|---|---|---|
| `/opt/nim/.cache` or `NIM_CACHE_PATH` | Specifies the path, relative to the root of the container, for downloaded models. The typical use for this path is to bind mount a directory on the host with this path inside the container. For example, to use `~/.cache/nim` on the host, run `mkdir -p ~/.cache/nim` before you start the container. When you start the container, specify the `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)` arguments to the `docker run` command. If you do not specify a bind or volume mount, as shown in the `-v` argument in the preceding command, the container downloads the model each time you start the container. The `-u $(id -u)` argument runs the container with your user ID to avoid file system permission issues and errors. | `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)` |
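Putting the cache mount together with the options above, a typical launch might look like the following sketch. The image name is a placeholder:

```shell
# Create the host-side cache directory once, before the first run.
mkdir -p ~/.cache/nim

# Bind mount the cache so models persist across container restarts,
# and run as your user ID to avoid permission errors in the cache.
docker run --rm --gpus all \
  -v ~/.cache/nim:/opt/nim/.cache \
  -u $(id -u) \
  -e NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/example-image:latest  # placeholder image name
```

On the second and later runs, the container finds the model in `/opt/nim/.cache` and skips the download.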