Configure Your NIM#

NVIDIA NIM for VLMs runs in Docker containers. Each NIM has its own container and several configuration options. This page is a reference for configuring a NIM container.

GPU Selection#

Passing --gpus all to docker run is acceptable in homogeneous environments with one or more identical GPUs.

In heterogeneous environments with a mix of GPUs (for example, an A6000 plus a GeForce display GPU), workloads should run only on the compute-capable GPUs. Expose specific GPUs inside the container using either:

  • The --gpus flag (for example, --gpus='"device=1"')

  • The environment variable CUDA_VISIBLE_DEVICES (for example, -e CUDA_VISIBLE_DEVICES=1)

The device IDs to pass as input are listed in the output of nvidia-smi -L:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-91b9d1b8-7f6a-4c20-9fd1-f83e21a4b117)
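For example, to pin the container to GPU 1 from the listing above, either option can be used. The image name below is a placeholder, not a real NIM image; substitute the image you are deploying:

```shell
# Expose only device 1 via the --gpus flag
# (nvcr.io/nim/nvidia/vlm-example is a placeholder image name).
docker run --rm --gpus='"device=1"' nvcr.io/nim/nvidia/vlm-example

# Equivalent, using the CUDA_VISIBLE_DEVICES environment variable
docker run --rm --gpus all -e CUDA_VISIBLE_DEVICES=1 nvcr.io/nim/nvidia/vlm-example
```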

Refer to the NVIDIA Container Toolkit documentation for more instructions.

How Many GPUs Do I Need?#

Each profile name encodes its TP (tensor parallelism) and PP (pipeline parallelism) degrees, which can be read directly from the name (for example, tensorrt_llm-trtllm_buildable-bf16-tp8-pp2 has TP=8 and PP=2).

In most cases, you need TP * PP GPUs to run a specific profile.

For example, the profile tensorrt_llm-trtllm_buildable-bf16-tp8-pp2 needs 8 * 2 = 16 GPUs, which could be two nodes with eight GPUs each or a single node with 16 GPUs.
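The GPU count can be computed from a profile name with a small shell sketch, assuming the name embeds -tpN and -ppM segments as in the example above:

```shell
# Extract TP and PP from a profile name and multiply them
# to get the number of GPUs the profile requires.
profile="tensorrt_llm-trtllm_buildable-bf16-tp8-pp2"
tp=$(echo "$profile" | grep -oE 'tp[0-9]+' | tr -d 'tp')
pp=$(echo "$profile" | grep -oE 'pp[0-9]+' | tr -d 'p')
echo $((tp * pp))   # 16
```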

Shared Memory Flag#

Pass --shm-size=32GB to docker run. This flag is not required for single-GPU models or for GPUs connected with NVLink.

Environment Variables#

Environment variables can be passed into a NIM container with the -e option of docker run.
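As a sketch, multiple -e options can be combined in a single invocation. NIM_CACHE_PATH and CUDA_VISIBLE_DEVICES are the variables discussed on this page; the image name is a placeholder:

```shell
# Pass environment variables into the NIM container
# (nvcr.io/nim/nvidia/vlm-example is a placeholder image name).
docker run --rm \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e NIM_CACHE_PATH=/opt/nim/.cache \
  nvcr.io/nim/nvidia/vlm-example
```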

Volumes#

Below are the paths inside the container to which local paths can be mounted.

Container path: /opt/nim/.cache (or NIM_CACHE_PATH if set)

Required?: Not required, but if this volume is not mounted, the container downloads the model from scratch each time it starts.

Notes: This is the directory inside the container into which models are downloaded. The container must be able to access this directory; one way to ensure this is to add the option -u $(id -u) to the docker run command. For example, to use ~/.cache/nim as the host machine directory for caching models, first run mkdir -p ~/.cache/nim before running the docker run ... command.

Docker argument example: -v ~/.cache/nim:/opt/nim/.cache -u $(id -u)
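Putting the options above together, a complete invocation might look like the following sketch (the image name is a placeholder for your actual NIM image):

```shell
# Create the host cache directory first so the container user can write to it.
mkdir -p ~/.cache/nim

# Run the NIM pinned to GPU 0, with shared memory and the model cache mounted
# (nvcr.io/nim/nvidia/vlm-example is a placeholder image name).
docker run --rm \
  --gpus='"device=0"' \
  --shm-size=32GB \
  -u $(id -u) \
  -v ~/.cache/nim:/opt/nim/.cache \
  nvcr.io/nim/nvidia/vlm-example
```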