Configuration#

Llama 3.1 Nemotron Safety Guard 8B NIM is packaged as a container. You can set environment variables and specify command-line arguments to configure the microservice.

GPU Selection#

The container is GPU-accelerated and uses NVIDIA Container Toolkit for access to GPUs on the host.

You can specify the --gpus all command-line argument to the docker run command if all GPUs on the host are the same model. If the host has a combination of GPU models, expose only the compute-capable GPUs to the container.

Expose specific GPUs to the container by using either of the following methods:

  • Specify the --gpus argument, such as --gpus="device=1".

  • Set the NVIDIA_VISIBLE_DEVICES environment variable, such as -e NVIDIA_VISIBLE_DEVICES=1.

Run the nvidia-smi -L command to list the device IDs to specify in the argument or environment variable:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)

Refer to GPU Enumeration in the NVIDIA Container Toolkit documentation for more information.
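The two selection methods above can be combined with a typical docker run invocation. A minimal sketch, assuming device ID 1 from the nvidia-smi -L output and an illustrative image tag:

```shell
# Method 1: expose only device 1 with the --gpus argument
# (the inner quotes keep the shell from stripping the quotes Docker expects)
docker run --rm --gpus '"device=1"' \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b:latest

# Method 2: expose device 1 with the NVIDIA_VISIBLE_DEVICES variable
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b:latest
```

You can also pass a GPU UUID from the nvidia-smi -L output instead of a device index, which is unambiguous if device ordering changes across reboots.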

Environment Variables#

The following table identifies the environment variables that are used in the container. Set environment variables with the -e command-line argument to the docker run command.

| Name | Description | Default Value |
| --- | --- | --- |
| NGC_API_KEY | Set this variable to the value of your personal NGC API key. | None |
| NIM_CACHE_PATH | Specifies the fully qualified path, in the container, for downloaded models. | /opt/nim/.cache |
| NIM_ENABLE_KV_CACHE_REUSE | Enables key-value cache reuse when set to True, the default value. Caching can improve performance when more than 90% of the initial prompt is identical across multiple requests. For more information, refer to KV Cache Reuse with NVIDIA NIM for LLMs in the NIM for LLMs documentation. | True |
| NIM_HTTP_API_PORT | Specifies the network port number, in the container, for HTTP access to the microservice. Refer to Publishing ports in the Docker documentation for more information about host and container network ports. | 8000 |
| NIM_HTTP_MAX_WORKERS | Specifies the number of worker threads to start for HTTP requests. | 1 |
| NIM_IGNORE_MODEL_DOWNLOAD_FAIL | When set to true and the microservice fails to download a model from NGC, the microservice continues to run rather than exit. This can be useful in an air-gapped environment. | false |
| NIM_JSONL_LOGGING | When set to true, the microservice creates log records in the JSONL format. | false |
| NIM_LOG_LEVEL | Specifies the logging level. The microservice supports the following values: DEBUG, INFO, WARNING, ERROR, and CRITICAL. | INFO |
| NIM_MANIFEST_PATH | Specifies the fully qualified path, in the container, for the model manifest YAML file. | /opt/nim/etc/default/model_manifest.yaml |
| NIM_MODEL_PROFILE | Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile. | None |
| NIM_SERVED_MODEL_NAME | Specifies the model names used in the API, as a comma-separated list. If you specify multiple names, the server responds to any of them, and the name in the model field of a response matches the name in the request. By default, the model name is inferred from the /opt/nim/etc/default/model_manifest.yaml file. For Prometheus metrics, this value is used in the model_name label; if more than one name is specified, the first is used for the label. | None |
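As noted above, these variables are passed with the -e argument to docker run. A minimal sketch combining several of them (the image tag and the chosen values are illustrative, not required):

```shell
docker run --rm \
  -e NGC_API_KEY \
  -e NIM_LOG_LEVEL=DEBUG \
  -e NIM_HTTP_API_PORT=8000 \
  -e NIM_SERVED_MODEL_NAME=safety-guard \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b:latest
```

Passing -e NGC_API_KEY with no value forwards the variable from the host environment, so the key itself does not appear in the command line or shell history.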

Volumes#

The following table identifies the paths that are used in the container. Use this information to plan the local paths to bind mount into the container.

| Container Path | Description | Example |
| --- | --- | --- |
| /opt/nim/.cache (or NIM_CACHE_PATH) | Specifies the path, relative to the root of the container, for downloaded models. | -v ~/.cache/nim:/opt/nim/.cache -u $(id -u) |

The typical use for this path is to bind mount a directory on the host to this path inside the container. For example, to use ~/.cache/nim on the host, run mkdir -p ~/.cache/nim before you start the container. When you start the container, specify the -v ~/.cache/nim:/opt/nim/.cache -u $(id -u) arguments to the docker run command.

If you do not specify a bind or volume mount with the -v argument, the container downloads the model each time you start it.

The -u $(id -u) argument runs the container with your user ID to avoid file system permission errors.
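Putting these pieces together, a launch that persists the model cache across container restarts might look like the following sketch (the image tag and port mapping are illustrative):

```shell
# Create the host cache directory once, before the first launch
mkdir -p ~/.cache/nim

# Bind mount it over /opt/nim/.cache and run as the current user
docker run --rm \
  -e NGC_API_KEY \
  -u $(id -u) \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b:latest
```

On the first run, the model is downloaded into ~/.cache/nim on the host; subsequent runs reuse the cached files instead of downloading again.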