Configuration

This section describes the various ways to configure a NIM container.

GPU Selection

Passing `--gpus all` to `docker run` is acceptable in homogeneous environments with one or more GPUs of the same model.

In heterogeneous environments with a mix of GPUs, such as an A6000 plus a GeForce display GPU, workloads should run only on the compute-capable GPUs. Expose specific GPUs inside the container using either:

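- the `--gpus` flag (ex: `--gpus="device=1"`)
- the environment variable `NVIDIA_VISIBLE_DEVICES` (ex: `-e NVIDIA_VISIBLE_DEVICES=1`), which is read by the NVIDIA container runtime (for example, when the container is started with `--runtime=nvidia`)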
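A minimal sketch of both options, assuming the NVIDIA Container Toolkit is installed. The helper name `gpu_args` is hypothetical; it only prints the arguments so they can be spliced into your own `docker run` command. When `--gpus` selects more than one device, Docker expects the value wrapped in an extra pair of double quotes (as in `--gpus '"device=0,2"'`), because otherwise the flag value is split at the comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker-run arguments that expose the given
# GPU IDs, one argument per line. Nothing is executed.
#   $1: comma-separated device IDs, e.g. "1" or "0,2"
#   $2: "gpus" (default) for the --gpus flag, or "env" for NVIDIA_VISIBLE_DEVICES
gpu_args() {
  local devices="$1"
  local mode="${2:-gpus}"
  case "$mode" in
    gpus)
      # Embedded double quotes keep "device=0,2" as a single value.
      printf '%s\n' "--gpus" "\"device=${devices}\""
      ;;
    env)
      # Read by the NVIDIA container runtime (e.g. with --runtime=nvidia).
      printf '%s\n' "-e" "NVIDIA_VISIBLE_DEVICES=${devices}"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, `mapfile -t gpu_flags < <(gpu_args 0,2)` followed by `docker run "${gpu_flags[@]}" ...` passes the two arguments `--gpus` and `"device=0,2"`, which is equivalent to typing `--gpus '"device=0,2"'`.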

The device IDs to pass are listed in the output of `nvidia-smi -L`:

```
GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
```
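In a mixed setup like this one, the compute GPU IDs can be picked out of the `nvidia-smi -L` listing with a short filter. This sketch assumes that every GPU whose name contains "GeForce" is a display GPU to skip; that pattern and the helper name `compute_gpu_ids` are hypothetical and should be adapted to your hardware. It reads the listing on stdin and prints the remaining IDs as a comma-separated string suitable for `device=` or `NVIDIA_VISIBLE_DEVICES`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print the
# comma-separated IDs of GPUs whose line does NOT match the exclusion regex.
# Treating "GeForce" as "display GPU" is an assumption; pass another regex
# as $1 to change it.
compute_gpu_ids() {
  local exclude="${1:-GeForce}"
  awk -v pat="$exclude" '
    # Only top-level "GPU <n>:" lines; indented MIG lines are ignored.
    /^GPU [0-9]+:/ && $0 !~ pat {
      id = $2
      sub(/:$/, "", id)                       # "0:" -> "0"
      ids = (ids == "" ? id : ids "," id)     # build "0,2,..."
    }
    END { print ids }
  '
}
```

For example, `nvidia-smi -L | compute_gpu_ids` prints `0` for the two-GPU listing shown above, because the RTX 3080 line matches "GeForce".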

Refer to the NVIDIA Container Toolkit documentation for more instructions.

Shared Memory Flag

Tokenization uses Triton's Python backend, which scales with the number of available CPU cores. Because the Python backend exchanges data through shared memory, you may need to increase the shared memory available to the microservice container beyond Docker's default `/dev/shm` size of 64 MB.

For example, to give the container 1 GB (`1g`) of shared memory:

```shell
docker run ... --shm-size=1g ...
```
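To confirm that `--shm-size` took effect, you can check the size of `/dev/shm` from inside the running container. This is a minimal sketch: the helper names `shm_kib` and `check_shm` are hypothetical, the 1 GiB default threshold simply mirrors the `1g` example rather than a documented NIM requirement, and it assumes the container image provides `bash`, `df`, and `awk`.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the size of /dev/shm and fail when it is below a
# threshold. Helper names and the 1 GiB default are assumptions.

# Print the total size of /dev/shm in KiB.
shm_kib() {
  # -P: one line per filesystem (no wrapping); -k: sizes in 1 KiB blocks.
  df -Pk /dev/shm | awk 'NR == 2 { print $2 }'
}

# Usage: check_shm [min_kib]   (default 1048576 KiB = 1 GiB)
check_shm() {
  local min_kib="${1:-1048576}"
  local have
  have="$(shm_kib)"
  if [ -z "$have" ]; then
    echo "error: could not read /dev/shm size" >&2
    return 2
  fi
  if [ "$have" -lt "$min_kib" ]; then
    echo "warning: /dev/shm is ${have} KiB, below ${min_kib} KiB" >&2
    return 1
  fi
  echo "ok: /dev/shm is ${have} KiB"
}
```

For a quick check that does not depend on the NIM image, `docker run --rm --shm-size=1g ubuntu df -h /dev/shm` shows the size Docker actually mounted.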

Environment Variables

The following table describes the environment variables that can be passed to a NIM as `-e` arguments to `docker run`:

| Variable | Required? | Default | Notes |
|---|---|---|---|
| `NGC_API_KEY` | Yes | None | You must set this variable to the value of your personal NGC API key. |
| `NIM_CACHE_PATH` | No | `/opt/nim/.cache` | Location (in the container) where the container caches model artifacts. |
| `NIM_HTTP_TRITON_PORT` | No | `8080` | HTTP port of the Triton backend server. Triton would otherwise default to port 8000, which is also the default port of the NIM's FastAPI frontend server. |
| `NIM_LOG_LEVEL` | No | `INFO` | Logging threshold for the container. Accepts any of the following values, in decreasing severity (case-insensitive): CRITICAL, ERROR, WARNING, WARN, INFO, DEBUG. See the Python `logging` documentation for details. |
| `NIM_HTTP_API_PORT` | No | `8000` | Port on which the NIM HTTP API listens inside the container. Adjust the port passed to the `-p/--publish` flag of `docker run` to match (ex: `-p YOUR_DESIRED_API_PORT:$NIM_HTTP_API_PORT`). The left-hand side of the `:` is your host address:port and does NOT have to match `$NIM_HTTP_API_PORT`. The right-hand side is the port inside the container, which MUST match `NIM_HTTP_API_PORT` (or `8000` if it is not set). |
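The rules in the table can be checked before launching a container. This is a minimal sketch, not part of the NIM tooling: the helper `nim_run_args` and its host-port argument are hypothetical, while the variable names, defaults, and accepted log levels come from the table. The check that the API and Triton ports differ is inferred from the `NIM_HTTP_TRITON_PORT` note. The helper prints the `-e` and `-p` arguments one per line, with the right-hand side of `-p` always equal to `NIM_HTTP_API_PORT` (or `8000`).

```shell
#!/usr/bin/env bash
# Minimal sketch: validate NIM environment variables and print matching
# docker-run arguments, one per line. Nothing is executed.
#   $1: host port to publish (hypothetical; default 8000)
nim_run_args() {
  local host_port="${1:-8000}"
  local api_port="${NIM_HTTP_API_PORT:-8000}"
  local triton_port="${NIM_HTTP_TRITON_PORT:-8080}"
  local level="${NIM_LOG_LEVEL:-INFO}"

  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "error: NGC_API_KEY must be set" >&2
    return 1
  fi

  # Assumption inferred from the table: the frontend and Triton must not
  # listen on the same port inside the container.
  if [ "$api_port" = "$triton_port" ]; then
    echo "error: NIM_HTTP_API_PORT and NIM_HTTP_TRITON_PORT are both $api_port" >&2
    return 1
  fi

  # Accepted log levels, compared case-insensitively.
  case "$(printf '%s' "$level" | tr '[:lower:]' '[:upper:]')" in
    CRITICAL|ERROR|WARNING|WARN|INFO|DEBUG) ;;
    *) echo "error: invalid NIM_LOG_LEVEL '$level'" >&2; return 1 ;;
  esac

  # "-e NGC_API_KEY" without a value makes Docker copy the key from the
  # host environment, so the key itself is never printed.
  printf '%s\n' -e NGC_API_KEY \
    -e "NIM_HTTP_API_PORT=${api_port}" \
    -e "NIM_LOG_LEVEL=${level}" \
    -p "${host_port}:${api_port}"
}
```

For example, with `NGC_API_KEY` exported, `NIM_HTTP_API_PORT=9000 nim_run_args 8080` ends with `-p` and `8080:9000`, so host port 8080 reaches the API listening on port 9000 inside the container.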

Volumes

The following table describes the paths inside the container into which local paths can be mounted.

| Container path | Required? | Notes | Docker argument example |
|---|---|---|---|
| `/opt/nim/.cache` (or `NIM_CACHE_PATH` if set) | No, but if this volume is not mounted, the container downloads the model again each time it starts. | The directory inside the container into which models are downloaded. The container must be able to write to the mounted host directory; running the container as your own user by adding `-u $(id -u)` to the `docker run` command achieves this. For example, to use `~/.cache/nim` as the host directory for caching models, run `mkdir -p ~/.cache/nim` before the `docker run ...` command, so that Docker does not create the directory owned by root. | `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)` |
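The steps in the table can be combined into a small preparation helper. This is a sketch under the table's assumptions: the helper name `nim_cache_args` is hypothetical, and it uses `NIM_CACHE_PATH` for the container side of the mount only if that variable is set in the calling shell (in which case you would also pass it to the container with `-e`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the host cache directory before `docker run`
# (so Docker does not create it as root), check that the current user can
# write to it, and print the -v / -u arguments, one per line.
#   $1: host cache directory (default ~/.cache/nim)
nim_cache_args() {
  local host_dir="${1:-$HOME/.cache/nim}"
  # Container side: NIM_CACHE_PATH if set, else the documented default.
  local container_dir="${NIM_CACHE_PATH:-/opt/nim/.cache}"

  mkdir -p "$host_dir" || return 1
  if [ ! -w "$host_dir" ]; then
    echo "error: $host_dir is not writable by uid $(id -u)" >&2
    return 1
  fi
  printf '%s\n' -v "${host_dir}:${container_dir}" -u "$(id -u)"
}
```

For example, `mapfile -t cache_flags < <(nim_cache_args)` followed by `docker run "${cache_flags[@]}" ...` reproduces the `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)` example with the home directory expanded.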