Configuring a NIM

This page is a complete reference for configuring the NIM for Cosmos container.

GPU Selection

Passing --gpus all to docker run is acceptable in homogeneous environments, where every GPU on the host is the same model.

In heterogeneous environments that combine different GPUs, workloads should run only on compute-capable GPUs. Expose specific GPUs inside the container using one of the following:

  • The --gpus flag (e.g. --gpus='"device=1"')

  • The environment variable CUDA_VISIBLE_DEVICES (e.g. -e CUDA_VISIBLE_DEVICES=1)

The device IDs to pass to either option are listed in the output of nvidia-smi -L:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)

Refer to the NVIDIA Container Toolkit documentation for more details.
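
As a minimal sketch of the selection step, the shell function below filters nvidia-smi -L style output by a model-name pattern and prints the matching device indices in the form the --gpus flag expects. The function name gpu_ids_matching and the H100 pattern are illustrative assumptions, not part of the NIM tooling, and the script prints the flag rather than running docker.

```shell
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print the
# comma-separated indices of GPUs whose line contains the given pattern.
gpu_ids_matching() {
  # Lines look like: "GPU 0: Tesla H100 (UUID: GPU-...)"
  grep -F -- "$1" | sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p' | paste -sd, -
}

# Sample input copied from the listing above; on a real host, pipe in
# the output of `nvidia-smi -L` instead.
sample='GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)'

ids="$(printf '%s\n' "$sample" | gpu_ids_matching 'H100')"

# Dry run: show the flag that would be passed to docker run.
echo "--gpus='\"device=${ids}\"'"
```

The same index list can be passed through CUDA_VISIBLE_DEVICES instead, for example -e CUDA_VISIBLE_DEVICES="$ids".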

Shared Memory Flag

Passing --ipc=host to docker run is required for multi-GPU setups that do not use NVLink. It is unnecessary on SXM systems and with profiles that use only one GPU.
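
The rule above can be sketched as a small shell helper that decides whether to add --ipc=host. The function name ipc_flag and its two arguments (the GPU count and a yes/no NVLink indicator) are hypothetical, and detecting NVLink on a given system is left out of the sketch.

```shell
# Hypothetical helper: print "--ipc=host" only for multi-GPU runs without NVLink.
# $1 = number of GPUs the profile uses
# $2 = "yes" if the GPUs are connected through NVLink, anything else otherwise
ipc_flag() {
  if [ "$1" -gt 1 ] && [ "$2" != "yes" ]; then
    echo "--ipc=host"
  fi
}

echo "2 GPUs, no NVLink: $(ipc_flag 2 no)"
echo "2 GPUs, NVLink:    $(ipc_flag 2 yes)"
echo "1 GPU:             $(ipc_flag 1 no)"
```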

Environment Variables

The following environment variables can be passed into a NIM by adding -e options to docker run:

| Name | Description | Default |
| --- | --- | --- |
| NGC_API_KEY | API key used to download assets from NGC. | "" |
| NIM_CACHE_PATH | Path in the container where models are downloaded and stored. This is the location where you may want to mount a persistent cache. | /opt/nim/.cache |
| NIM_HTTP_API_PORT | Port used by the Uvicorn HTTP server that runs the FastAPI application. | 8000 |
| NIM_DISABLE_MODEL_DOWNLOAD | Disables model download on container startup. | 0 |
| NIM_HTTP_MAX_WORKERS | Number of workers for the Uvicorn process that runs the FastAPI web server. | 1 |
| NIM_IGNORE_MODEL_DOWNLOAD_FAIL | If set to 1, model download failures are ignored so that the container keeps running. By default, a failure to download the model artifacts listed in the model manifest terminates the NIM container. | 0 |
| NIM_LOG_LEVEL | Logging threshold for the container image. Accepts any of the following strings (in decreasing severity, case-insensitive): CRITICAL, ERROR, WARNING, INFO, DEBUG. | INFO |
| NIM_MANIFEST_PATH | Path to the model manifest file in the NIM. If this variable is not set, the override location /opt/nim/etc/model_manifest.yaml is used if a file exists at that path; otherwise, the default location /opt/nim/etc/default/model_manifest.yaml is used. If no manifest can be located, a nimlib.exceptions.ModelManifestMissing exception is raised. | "" |
| NIM_MODEL_PROFILE | ID of the profile to use from the one or more profiles provided in the model manifest. If not set, the value of the NIM_MANIFEST_PROFILE environment variable is used as a fallback. If NIM_MANIFEST_PROFILE is also not set, the first profile in the model manifest is used. The values latency and throughput are also accepted as hints for profile auto-detection. | "" |
| NIM_SKIP_MATERIALIZE | By default, materializing a workspace to a model cache path relies on the NIM SDK to determine whether an artifact requires download and link. If set to 1, download and link are always used instead of letting the NIM SDK decide. | 0 |
| NIM_TRITON_LOG_VERBOSE | Controls the verbosity of the Triton backend server, if used. Accepted values range from 0 (verbose logging disabled) to any number greater than 0. See the corresponding documentation in the Triton Inference Server user guide. | 0 |
| NIM_TAGS_SELECTOR | Filters tags in the auto profile selector. The value is a comma-separated list of key-value pairs, where each key is a profile property name and each value is the desired property value (e.g. llm_engine=vllm,tp=1). | "" |
| NIM_LOGGING_JSONL | Set to 1 to enable JSON-formatted logs. Readable text logs are enabled by default. | 0 |
| NIM_VIDEO_SAVE_QUALITY | ffmpeg compression quality used when encoding the video. Accepted values are between 1 and 9. The compression quality does not affect the diffusion process itself. | 5 |
| NIM_TRITON_REQUEST_TIMEOUT | Time, in microseconds, after which a request times out, including queue time. The default is 30 minutes. | 1800000000 |
| NIM_ALLOW_URL_INPUT | Applies only to the cosmos-predict1-7b-video2world NIM. Set to 1 to allow passing URLs of the visual input to the NIM. Set to 0 to disable passing URLs to the NIM. | 1 |
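
To illustrate how the variables listed above fit together, the sketch below assembles a docker run command line with a few -e options and prints it instead of executing it. The image name in NIM_IMAGE is a placeholder, the helpers minutes_to_us and print_nim_command are hypothetical, and the chosen values (DEBUG logging, JSON logs, a 45-minute timeout, publishing port 8000) are examples only.

```shell
# Hypothetical helper: convert minutes to the microseconds that
# NIM_TRITON_REQUEST_TIMEOUT expects (30 minutes -> 1800000000).
minutes_to_us() {
  echo $(( $1 * 60 * 1000000 ))
}

# Placeholder: substitute the NIM image you actually use.
NIM_IMAGE="<nim-image>"

# Hypothetical helper: print a docker run command with the request timeout
# given in minutes ($1). "-e NGC_API_KEY" without a value forwards the
# variable from the host environment.
print_nim_command() {
  echo "docker run --rm --gpus all" \
    "-e NGC_API_KEY" \
    "-e NIM_HTTP_API_PORT=8000" \
    "-e NIM_LOG_LEVEL=DEBUG" \
    "-e NIM_LOGGING_JSONL=1" \
    "-e NIM_TRITON_REQUEST_TIMEOUT=$(minutes_to_us "$1")" \
    "-p 8000:8000" \
    "$NIM_IMAGE"
}

# Dry run: show the command for a 45-minute request timeout.
print_nim_command 45
```

Printing the command keeps the sketch safe to run on a machine without Docker; copy the printed line and replace the placeholder image to launch a container.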

Volumes

The following paths inside the container can be used as mount points for local directories.

| Container path | Required? | Notes | Docker argument example |
| --- | --- | --- | --- |
| /opt/nim/.cache (or NIM_CACHE_PATH if present) | Not required, but if this volume is not mounted, the container downloads the model afresh each time it is brought up. | Models are downloaded to this directory inside the container. The directory must be accessible from inside the container, which can be achieved by adding the option -u $(id -u) to the docker run command. For example, to use ~/.cache/nim as the host directory for caching models, execute mkdir -p ~/.cache/nim before running the docker run ... command. | -v ~/.cache/nim:/opt/nim/.cache -u $(id -u) |
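
The cache setup described above can be sketched as a short shell helper. It creates the host directory and prints the matching docker run arguments rather than launching a container; the function name nim_cache_args is an assumption chosen for illustration.

```shell
# Hypothetical helper: make sure the host cache directory ($1) exists, then
# print the docker run arguments that mount it at the default cache path
# and run the container as the current user.
nim_cache_args() {
  # Create the directory up front so it is owned by the current user.
  mkdir -p "$1" || return 1
  echo "-v $1:/opt/nim/.cache -u $(id -u)"
}

# Example with the host directory used in this section.
nim_cache_args "$HOME/.cache/nim"
```

If NIM_CACHE_PATH is set to a different location, the container side of the -v mapping would need to match that value instead of /opt/nim/.cache.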