Configure NeMo Retriever Text Reranking NIM#
NeMo Text Retriever NIMs use Docker containers under the hood. Each NIM runs in its own Docker container, and you can configure that container in several ways. The remainder of this section describes them.
Use this documentation to learn how to configure NeMo Retriever Text Reranking NIM.
GPU Selection#
The NIM container is GPU-accelerated and uses NVIDIA Container Toolkit for access to GPUs on the host.
You can specify the --gpus all command-line argument to the docker run command if the host has one or more of the same GPU model.
If the host has a combination of GPUs, such as an A6000 and a GeForce display GPU, run the container on compute-capable GPUs only.
Expose specific GPUs to the container by using either of the following methods:
Specify the --gpus argument, such as --gpus="device=1".
Set the NVIDIA_VISIBLE_DEVICES environment variable, such as -e NVIDIA_VISIBLE_DEVICES=1.
Run the nvidia-smi -L command to list the device IDs to specify in the argument or environment variable:
GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
Refer to GPU Enumeration in the NVIDIA Container Toolkit documentation for more information.
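As an illustration of the selection step above, the following sketch filters nvidia-smi -L style output and prints the matching environment-variable argument. The function name select_compute_gpus is hypothetical, and treating any device whose name contains "GeForce" as a display GPU is an assumption made for this example; adjust the filter to match your hardware.

```shell
# Sketch only: pick compute GPUs from `nvidia-smi -L`-style output.
# Assumption: a device name containing "GeForce" marks a display GPU to skip.

select_compute_gpus() {
  # Reads lines such as "GPU 0: <name> (UUID: ...)" on stdin and prints
  # the comma-separated indices of the devices that remain after filtering.
  grep -v 'GeForce' | sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p' | paste -sd, -
}

# Sample input copied from the listing above; on a real host, pipe the
# command output instead: nvidia-smi -L | select_compute_gpus
sample='GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)'

devices="$(printf '%s\n' "$sample" | select_compute_gpus)"

# Print the argument to add to the docker run command.
printf '%s\n' "-e NVIDIA_VISIBLE_DEVICES=${devices}"
```

The sketch only prints the argument; it does not start a container, so you can check the selected device IDs before you use them.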
PID Limit#
In certain deployment or container runtime environments, default process and thread limits (PID limits) can interfere with NIM startup. These limits are set by Docker, Podman, Kubernetes, or the operating system.
If the PID limit is too low, you might see symptoms such as:
NIM starts up partially, but fails to reach ready state, and then stalls.
NIM starts up partially, but fails to reach ready state, and then crashes.
NIM serves a small number of requests, and then fails.
To verify that PID limits are impacting the NIM container, you can remove or adjust the PID limit at the container, node, and operating system level. Removing the PID limit and then checking for success is a useful diagnostic step.
To increase the PID limit in a docker run command, set --pids-limit=-1. For details, see docker container run.
To increase the PID limit in a podman run command, set --pids-limit=-1. For details, see Podman pids-limit.
To increase the PID limit in Kubernetes, set the PodPidsLimit on the kubelet on each node. For details, see your Kubernetes distribution-specific documentation.
To increase the PID limit at the operating system level, see your OS-specific documentation.
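Before you remove a limit, it can help to see how close the container is to it. The following sketch assumes a cgroup v2 host, where the pids controller exposes pids.max and pids.current files; the function name pid_headroom and the /sys/fs/cgroup paths are assumptions for illustration and can differ in your environment.

```shell
# Sketch only: report how many more processes or threads fit under the limit.
# Assumption: cgroup v2, where pids.max holds a number or the word "max".

pid_headroom() {
  # $1: contents of pids.max ("max" means no limit is set)
  # $2: contents of pids.current
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    echo $(( $1 - $2 ))
  fi
}

# Run inside the NIM container. The guard skips the check when the files
# are not readable, for example on a cgroup v1 host.
if [ -r /sys/fs/cgroup/pids.max ] && [ -r /sys/fs/cgroup/pids.current ]; then
  pid_headroom "$(cat /sys/fs/cgroup/pids.max)" "$(cat /sys/fs/cgroup/pids.current)"
fi
```

A small or shrinking headroom value while the NIM starts up suggests that the PID limit is the cause of the symptoms listed earlier.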
Memory Footprint#
You can configure the memory footprint of the NeMo Retriever Text Reranking NIM by adjusting the model’s maximum allowed batch size and sequence length.
For ONNX model profiles, memory is allocated dynamically according to the requests, so the maximum batch size and sequence length cap memory usage. You can specify a value for maximum batch size and sequence length from 1 up to the maximum supported limit for a given model and GPU.
For TensorRT model profiles, memory is allocated statically based on the optimized static inference graph which has a defined maximum input shape. You must specify a value from a discrete set of options. Refer to the support matrix for the valid values and corresponding approximate memory footprint.
By default, the NIM uses the largest possible value (given the model and GPU constraints) for both the maximum batch size and the maximum sequence length. If you specify only one of these parameters, the NIM uses the largest possible value for the unspecified parameter. For example, if you only specify a limit for maximum batch size, the NIM uses the largest possible sequence length.
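The defaulting rule above can be sketched as a small function. This is a model of the described behavior for ONNX profiles, not the NIM's own code: the function name resolve_limit and the maximum values 64 and 512 are hypothetical, and the real limits depend on the model and GPU.

```shell
# Sketch only: model the "unspecified means largest possible" rule.

resolve_limit() {
  # $1: requested value (empty when the parameter is not specified)
  # $2: largest value supported for the model and GPU
  if [ -z "$1" ]; then
    echo "$2"                      # unspecified: use the largest value
  elif [ "$1" -ge 1 ] && [ "$1" -le "$2" ]; then
    echo "$1"                      # within 1..max: use the request
  else
    echo "error: value must be between 1 and $2" >&2
    return 1
  fi
}

MAX_BATCH_SUPPORTED=64    # hypothetical limit
MAX_SEQ_SUPPORTED=512     # hypothetical limit

# Only the batch size is specified, so the sequence length takes its maximum.
batch="$(resolve_limit "8" "$MAX_BATCH_SUPPORTED")"
seq_len="$(resolve_limit "" "$MAX_SEQ_SUPPORTED")"
printf 'batch=%s seq_len=%s\n' "$batch" "$seq_len"
# prints "batch=8 seq_len=512"
```

TensorRT profiles differ in that the value must come from a discrete set of options, so a range check like this one does not apply to them.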
Volumes#
The following table identifies the paths that are used in the container. Use this information to plan the local paths to bind mount into the container.
| Container Path | Description | Example |
|---|---|---|
| | Specifies the path, relative to the root of the container, for downloaded models. The typical use for this path is to bind mount a directory on the host with this path inside the container. For example, to use [...] If you do not specify a bind or volume mount, as shown in the [...] The [...] | |