Configure Your NIM#
NVIDIA NIM for VLMs runs in Docker containers. Each NIM has its own container and several configuration options. This page is a reference for configuring a NIM container.
GPU Selection#
Passing --gpus all to docker run is acceptable in homogeneous environments that contain one or more GPUs of the same model.
In heterogeneous environments with a mix of GPUs (for example, an A6000 plus a GeForce display GPU), workloads should run only on the compute-capable GPUs. Expose specific GPUs inside the container using either of the following:
- The `--gpus` flag (for example, `--gpus='"device=1"'`)
- The environment variable `CUDA_VISIBLE_DEVICES` (for example, `-e CUDA_VISIBLE_DEVICES=1`)
The device IDs to use as inputs are listed in the output of `nvidia-smi -L`:
```
GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-64b636e9-4ce4-4b6f-9d0f-2e8a45c3d3b1)
```
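For example, either of the following exposes only GPU 0 (the H100 above) to the container; the `nvcr.io/nim/<org>/<model>:<tag>` image reference is a placeholder for your actual NIM for VLMs image:

```bash
# Select GPU 0 by device index with the --gpus flag
docker run --rm --runtime=nvidia --gpus='"device=0"' \
  nvcr.io/nim/<org>/<model>:<tag>

# Equivalent selection via CUDA_VISIBLE_DEVICES; the container still needs
# access to the GPUs, so --gpus all is passed alongside the variable
docker run --rm --runtime=nvidia --gpus all -e CUDA_VISIBLE_DEVICES=0 \
  nvcr.io/nim/<org>/<model>:<tag>
```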
Refer to the NVIDIA Container Toolkit documentation for more instructions.
How Many GPUs Do I Need?#
Each profile has a TP (Tensor Parallelism) and a PP (Pipeline Parallelism) degree, which you can read from its human-readable name (for example, `tensorrt_llm-trtllm_buildable-bf16-tp8-pp2` uses TP=8 and PP=2).
In most cases, you need TP * PP GPUs to run a specific profile.
For example, the profile `tensorrt_llm-trtllm_buildable-bf16-tp8-pp2` requires 8 * 2 = 16 GPUs: either two nodes with 8 GPUs each, or 16 GPUs on a single node.
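If you want to script this check, the GPU count can be derived directly from the profile name. The snippet below is a small illustration of that arithmetic, not an official NIM utility (it also assumes GNU grep for the `-P` flag):

```bash
# Parse the TP and PP degrees out of a profile name and multiply them
profile="tensorrt_llm-trtllm_buildable-bf16-tp8-pp2"
tp=$(echo "$profile" | grep -oP '(?<=-tp)[0-9]+')
pp=$(echo "$profile" | grep -oP '(?<=-pp)[0-9]+')
echo "$profile needs $((tp * pp)) GPUs (TP=$tp, PP=$pp)"
# Prints: tensorrt_llm-trtllm_buildable-bf16-tp8-pp2 needs 16 GPUs (TP=8, PP=2)
```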
Environment Variables#
Environment variables can be passed into a NIM container with the `-e` flag to `docker run`.
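For example, NIM containers read `NGC_API_KEY` to authenticate model downloads from NGC; the image reference below is again a placeholder:

```bash
# Pass environment variables into the container with -e
docker run --rm --runtime=nvidia --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e CUDA_VISIBLE_DEVICES=0 \
  nvcr.io/nim/<org>/<model>:<tag>
```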
Volumes#
The table below lists the paths inside the container to which local directories can be mounted.
| Container path | Required? | Notes | Docker argument example |
|---|---|---|---|
| `/opt/nim/.cache` (or `NIM_CACHE_PATH`, if set) | Not required, but if this volume is not mounted, the container will do a fresh download of the model each time it is brought up. | This is the directory within which models are downloaded inside the container. It is very important that this directory can be accessed from inside the container, which can be achieved by adding the option `-u $(id -u)` to the `docker run` command. For example, to use `~/.cache/nim` as the host directory for caching models, first run `mkdir -p ~/.cache/nim` before running `docker run`. | `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)` |
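Putting the pieces together, the following sketch mounts a host cache directory so that the downloaded model survives container restarts. It assumes the in-container cache path `/opt/nim/.cache` and the default API port 8000; the image reference is a placeholder:

```bash
# Create the host-side cache directory before the first run
mkdir -p ~/.cache/nim

# Run as the current user (-u) so the mounted cache is writable,
# mount it at the in-container cache path, and publish the API port
docker run --rm --runtime=nvidia --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -u $(id -u) \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<model>:<tag>
```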