Configuring a NIM

This page is a complete reference for configuring the NIM for Cosmos container.

GPU Selection

Passing --gpus all to docker run is acceptable in homogeneous environments with one or more GPUs of the same model.

In heterogeneous environments with a mix of GPU models, workloads should run only on compute-capable GPUs. Expose specific GPUs inside the container using one of the following:

The --gpus flag (e.g. --gpus='"device=1"')
The environment variable CUDA_VISIBLE_DEVICES (e.g. -e CUDA_VISIBLE_DEVICES=1)
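Both options can be expressed as arguments for docker run. The following Python sketch builds either form from a list of device IDs. `gpu_args` is a hypothetical helper that only assembles arguments and never invokes Docker. For the environment-variable form it assumes the GPUs are still exposed with --gpus all, so that CUDA_VISIBLE_DEVICES restricts which of them CUDA uses inside the container.

```python
from typing import List, Sequence


def gpu_args(device_ids: Sequence[int], use_env: bool = False) -> List[str]:
    """Return docker run arguments that expose only the given GPU indices.

    Hypothetical helper: it only assembles arguments and never runs Docker.
    """
    if not device_ids:
        raise ValueError("at least one device ID is required")
    ids = ",".join(str(i) for i in device_ids)
    if use_env:
        # Assumption: GPUs are still exposed with --gpus all, and
        # CUDA_VISIBLE_DEVICES restricts which of them CUDA uses.
        return ["--gpus", "all", "-e", f"CUDA_VISIBLE_DEVICES={ids}"]
    # The literal double quotes are part of the value Docker receives, exactly
    # as the shell form --gpus='"device=1"' delivers them; they keep a
    # comma-separated device list together as a single option value.
    return ["--gpus", f'"device={ids}"']
```

The returned list can be spliced into an argument vector for subprocess.run, which sidesteps shell quoting entirely.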

The device IDs to pass are the GPU indices listed in the output of nvidia-smi -L, for example:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
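When a machine mixes GPU models, the index can be looked up programmatically. This sketch parses nvidia-smi -L output in the format shown above into (index, name, UUID) tuples. The regular expression assumes exactly that line format and skips any other lines, such as MIG device entries. `parse_gpu_list` and `ids_matching` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# Matches lines such as "GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-...)".
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(output: str) -> List[Tuple[int, str, str]]:
    """Return (index, name, uuid) tuples parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:  # lines in any other format are ignored
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def ids_matching(output: str, name_substring: str) -> List[int]:
    """Return the indices of GPUs whose name contains name_substring."""
    return [idx for idx, name, _ in parse_gpu_list(output) if name_substring in name]
```

The output string could come from subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True).stdout. For the sample output above, ids_matching(output, "H100") returns [0].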

Refer to the NVIDIA Container Toolkit documentation for more details.

Shared Memory Flag

Passing --ipc=host to docker run is required for multi-GPU setups that do not use NVLink. It is unnecessary on SXM systems, which use NVLink, or when using profiles with only one GPU.
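The rule above reduces to a small decision. In this sketch, `ipc_args` is a hypothetical helper, and SXM systems are treated as NVLink-connected, as the text implies.

```python
from typing import List


def ipc_args(num_gpus: int, uses_nvlink: bool) -> List[str]:
    """Return ["--ipc=host"] only when the shared-memory rule requires it.

    Required: multi-GPU profiles on systems without NVLink.
    Unnecessary: single-GPU profiles, or NVLink-connected (e.g. SXM) systems.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus > 1 and not uses_nvlink:
        return ["--ipc=host"]
    return []
```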

Environment Variables

The following environment variables can be passed to a NIM by adding -e to the docker run command:

Name	Description	Default
NGC_API_KEY	API key used to download assets from NGC	""
NIM_CACHE_PATH	Path in the container where models are downloaded and stored. Mount a persistent cache at this location to avoid downloading the models again each time the container starts.	/opt/nim/.cache
NIM_HTTP_API_PORT	Port of the Uvicorn HTTP server that serves the FastAPI application	8000
NIM_DISABLE_MODEL_DOWNLOAD	Set to 1 to disable model download on container startup	0
NIM_HTTP_MAX_WORKERS	Number of Uvicorn worker processes used to run the FastAPI web server	1
NIM_IGNORE_MODEL_DOWNLOAD_FAIL	If this value is set to 1, model download failures will be ignored so that the container keeps running. By default, failure to download the model artifacts from the model manifest will terminate the NIM container.	0
NIM_LOG_LEVEL	Logging threshold for the container image. Accepts strings representing any of the following values (decreasing severity, case-insensitive): CRITICAL, ERROR, WARNING, INFO, DEBUG.	INFO
NIM_MANIFEST_PATH	Path to the model manifest file in the NIM. If this environment variable is not set, then the override location /opt/nim/etc/model_manifest.yaml will be used if a file exists at this path. Otherwise, the default location of /opt/nim/etc/default/model_manifest.yaml will be used. If a manifest cannot be located, then a nimlib.exceptions.ModelManifestMissing exception will be raised.	""
NIM_MODEL_PROFILE	The ID of the profile to use from the one or more profiles listed in the model manifest. If not set, the value of the NIM_MANIFEST_PROFILE environment variable is used as a fallback. If NIM_MANIFEST_PROFILE is not set either, the first profile in the model manifest is used. `latency` and `throughput` are also accepted as values that guide profile auto-detection.	""
NIM_SKIP_MATERIALIZE	By default, materializing a workspace to a model cache path relies on the NIM SDK to determine whether an artifact requires download and link. If this value is set to 1, download and link will always be used, instead of having the NIM SDK determine whether download and link is required.	0
NIM_TRITON_LOG_VERBOSE	Controls the verbosity of the Triton backend server, if used. Set to 0 to disable verbose logging, or to an integer greater than 0 to enable it. See the corresponding documentation in the Triton Inference Server user guide.	0
NIM_TAGS_SELECTOR	Filters tags in the automatic profile selector. The value is a comma-separated list of key=value pairs, where each key is a profile property name and each value is the desired property value (e.g. `llm_engine=vllm,tp=1`).	""
NIM_LOGGING_JSONL	Set to 1 to enable JSON-formatted logs. Readable text logs are enabled by default.	0
NIM_VIDEO_SAVE_QUALITY	Determines the ffmpeg compression quality when encoding the video. Accepted values are between 1 and 9. The compression quality does not affect the diffusion process itself.	5
NIM_TRITON_REQUEST_TIMEOUT	Time, in microseconds, after which a request times out, including time spent in the queue. The default of 1800000000 microseconds is 30 minutes.	1800000000
NIM_ALLOW_URL_INPUT	This parameter is only applicable to the `cosmos-predict1-7b-video2world` NIM. Set to 1 to allow passing URLs of the visual input to the NIM. Set to 0 to disable passing URLs to the NIM.	1
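Several of the variables above have constrained values. The following sketch checks a few of them before turning a mapping into -e flags for docker run: NIM_LOG_LEVEL against the listed levels (case-insensitively), NIM_VIDEO_SAVE_QUALITY against the 1 to 9 range, and NIM_TAGS_SELECTOR against the key=value,key=value form of its example. It also converts minutes to the microsecond unit of NIM_TRITON_REQUEST_TIMEOUT. All function names are hypothetical, and the checks mirror this table rather than the NIM's own validation logic.

```python
from typing import Dict, List

# Levels listed for NIM_LOG_LEVEL, from most to least severe.
LOG_LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def parse_tags_selector(value: str) -> Dict[str, str]:
    """Split a NIM_TAGS_SELECTOR value such as "llm_engine=vllm,tp=1"."""
    tags: Dict[str, str] = {}
    for pair in filter(None, value.split(",")):
        key, sep, val = (part.strip() for part in pair.partition("="))
        if not sep or not key:
            raise ValueError(f"malformed tag selector entry: {pair!r}")
        tags[key] = val
    return tags


def env_args(env: Dict[str, str]) -> List[str]:
    """Check a few documented constraints, then build docker run -e flags."""
    level = env.get("NIM_LOG_LEVEL")
    if level is not None and level.upper() not in LOG_LEVELS:
        raise ValueError(f"unsupported NIM_LOG_LEVEL: {level!r}")
    quality = env.get("NIM_VIDEO_SAVE_QUALITY")
    if quality is not None and not 1 <= int(quality) <= 9:
        raise ValueError("NIM_VIDEO_SAVE_QUALITY must be between 1 and 9")
    if "NIM_TAGS_SELECTOR" in env:
        parse_tags_selector(env["NIM_TAGS_SELECTOR"])  # raises if malformed
    args: List[str] = []
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    return args


def minutes_to_timeout_us(minutes: float) -> int:
    """Convert minutes to microseconds, the unit of NIM_TRITON_REQUEST_TIMEOUT."""
    return round(minutes * 60 * 1_000_000)
```

For example, minutes_to_timeout_us(30) returns 1800000000, which matches the documented default.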
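The lookup order described for NIM_MANIFEST_PATH and NIM_MODEL_PROFILE can be sketched as follows. This illustrates the documented behavior and is not nimlib's implementation. `ManifestMissing` stands in for nimlib.exceptions.ModelManifestMissing, the function names are hypothetical, and empty values are treated as unset (matching the empty-string defaults). Two further assumptions: an explicitly set NIM_MANIFEST_PATH must point to an existing file, and the `latency`/`throughput` hints are returned unchanged rather than resolved by auto-detection.

```python
from pathlib import Path
from typing import Mapping, Sequence

# Locations taken from the NIM_MANIFEST_PATH description.
OVERRIDE_MANIFEST = Path("/opt/nim/etc/model_manifest.yaml")
DEFAULT_MANIFEST = Path("/opt/nim/etc/default/model_manifest.yaml")


class ManifestMissing(Exception):
    """Stand-in for nimlib.exceptions.ModelManifestMissing."""


def resolve_manifest_path(env: Mapping[str, str],
                          override: Path = OVERRIDE_MANIFEST,
                          default: Path = DEFAULT_MANIFEST) -> Path:
    """Return the manifest path using the documented lookup order."""
    explicit = env.get("NIM_MANIFEST_PATH")  # empty string counts as unset
    # Assumption: an explicitly set path must exist; there is no fallback.
    candidates = [Path(explicit)] if explicit else [override, default]
    for path in candidates:
        if path.is_file():
            return path
    raise ManifestMissing("no model manifest could be located")


def resolve_profile(env: Mapping[str, str], manifest_profiles: Sequence[str]) -> str:
    """Return the profile ID using the documented NIM_MODEL_PROFILE fallbacks."""
    for name in ("NIM_MODEL_PROFILE", "NIM_MANIFEST_PROFILE"):
        value = env.get(name)
        if value:
            # "latency"/"throughput" are returned unchanged here; the real
            # auto-detection they hint at is not modeled.
            return value
    if not manifest_profiles:
        raise ValueError("the manifest lists no profiles")
    return manifest_profiles[0]
```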

Volumes

The following table lists the paths inside the container to which local directories can be mounted.

Container path	Required?	Notes	Docker argument example
/opt/nim/.cache (or NIM_CACHE_PATH if present)	No. If this volume is not mounted, the container downloads the model again each time it is started.	Models are downloaded to this directory inside the container. The directory must be accessible from inside the container, which can be achieved by adding the option -u $(id -u) to the docker run command. For example, to use ~/.cache/nim as the host directory for caching models, first run mkdir -p ~/.cache/nim before running the docker run ... command.	-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)
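The host-side steps for the cache volume (creating the directory, then mounting it and running as the current user) can be scripted. In this sketch, `cache_volume_args` is a hypothetical helper. It expands ~ and makes the host path absolute itself, because when arguments are passed without a shell nothing else would expand ~, and it relies on os.getuid, so it works only on POSIX hosts.

```python
import os
from pathlib import Path
from typing import List

CONTAINER_CACHE = "/opt/nim/.cache"  # default value of NIM_CACHE_PATH


def cache_volume_args(host_dir: str = "~/.cache/nim",
                      container_dir: str = CONTAINER_CACHE) -> List[str]:
    """Create the host cache directory and return matching docker run args.

    Mirrors `mkdir -p ~/.cache/nim` followed by
    `-v ~/.cache/nim:/opt/nim/.cache -u $(id -u)`. POSIX only (os.getuid).
    """
    # Expand ~ here: without a shell, nothing else would expand it.
    host = Path(host_dir).expanduser().resolve()
    host.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    # Run as the current user so the container can write to the directory.
    return ["-v", f"{host}:{container_dir}", "-u", str(os.getuid())]
```

If NIM_CACHE_PATH is set to a different container path, pass that same path as container_dir so that the mount and the variable agree.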