Configuring the NIM

The following environment variables can be used to configure the NIM at runtime:

ENV	Required	Default	Notes
NGC_API_KEY	Yes	—	Your NGC API key for model access.
NIM_HTTP_API_PORT	No	8000	The port for the HTTP API server.
USE_CUDA_IPC	No	Auto-detected	Enables CUDA IPC for inter-process communication. If not set explicitly, the value is auto-detected. Values: 0 (disabled), 1 (enabled).
ENABLE_CUDA_MPS	No	0	Allows CUDA to use the Multi-Process Service (MPS). On some systems and configurations, this can improve performance. Values: 0 (disabled), 1 (enabled).
NIM_ENABLE_OTEL	No	True	Enables OpenTelemetry.
CUDA_VISIBLE_DEVICES	No	All	A comma-separated list of GPU IDs to use.
NIM_CACHE_PATH	No	/opt/nim/.cache	The location inside the NIM container that is used for caching.
NIM_LOG_LEVEL	No	DEFAULT	Controls NIM logging verbosity. Supported levels: DEFAULT, DEBUG, INFO, WARNING, ERROR, CRITICAL, TRACE.
NIM_TRITON_LOG_VERBOSE	No	0	Controls logging verbosity of the Triton server component. Supported values are 0 through 5, where 0 is the least verbose and 5 is the most verbose.
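
The table above can be turned into a small pre-flight check that catches typos before the container starts. The following is a minimal sketch and not part of the NIM: `resolve_nim_env` and `docker_env_args` are hypothetical helper names, and the TCP port range check is an assumption. Only the variable names, defaults, and allowed values come from the table.

```python
LOG_LEVELS = {"DEFAULT", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "TRACE"}

# Defaults from the table. USE_CUDA_IPC and CUDA_VISIBLE_DEVICES are left out
# because the table lists them as auto-detected / all GPUs when unset.
DEFAULTS = {
    "NIM_HTTP_API_PORT": "8000",
    "ENABLE_CUDA_MPS": "0",
    "NIM_ENABLE_OTEL": "True",
    "NIM_CACHE_PATH": "/opt/nim/.cache",
    "NIM_LOG_LEVEL": "DEFAULT",
    "NIM_TRITON_LOG_VERBOSE": "0",
}
OPTIONAL = ("USE_CUDA_IPC", "CUDA_VISIBLE_DEVICES")


def resolve_nim_env(env):
    """Return validated NIM settings (hypothetical helper, not a NIM API)."""
    if not env.get("NGC_API_KEY"):
        raise ValueError("NGC_API_KEY is required")

    # Start from the documented defaults and overlay whatever the caller set.
    cfg = dict(DEFAULTS)
    for name in (*DEFAULTS, *OPTIONAL):
        if name in env:
            cfg[name] = env[name]

    port = int(cfg["NIM_HTTP_API_PORT"])  # raises ValueError if not numeric
    if not 1 <= port <= 65535:  # assumption: any valid TCP port is acceptable
        raise ValueError("NIM_HTTP_API_PORT must be between 1 and 65535")
    for flag in ("USE_CUDA_IPC", "ENABLE_CUDA_MPS"):
        if cfg.get(flag, "0") not in ("0", "1"):
            raise ValueError(f"{flag} must be 0 or 1")
    if cfg["NIM_LOG_LEVEL"] not in LOG_LEVELS:
        raise ValueError("unsupported NIM_LOG_LEVEL")
    if cfg["NIM_TRITON_LOG_VERBOSE"] not in {"0", "1", "2", "3", "4", "5"}:
        raise ValueError("NIM_TRITON_LOG_VERBOSE must be 0 through 5")
    return cfg


def docker_env_args(cfg):
    """Build `docker run` environment arguments from the resolved settings."""
    # `-e NAME` without a value forwards the variable from the calling
    # environment, which keeps the API key off the command line.
    args = ["-e", "NGC_API_KEY"]
    for name in sorted(cfg):
        args += ["-e", f"{name}={cfg[name]}"]
    return args
```

Calling `resolve_nim_env(os.environ)` in a launch script would fail fast on a misspelled log level or an out-of-range verbosity instead of surfacing the problem after the container has started.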

Additional Runtime Variables:

--tmpfs /tmp/ram,rw,size=2g: A container runtime option rather than an environment variable. It can be used to back /tmp/ram with host memory, which may speed up processing on some systems.

Host caching via LOCAL_NIM_CACHE

To persist model downloads across container restarts, and to follow the same convention as other NIMs, bind-mount a host cache directory using LOCAL_NIM_CACHE (see Quickstart for example commands). The host directory you mount appears inside the container at /opt/nim/.cache, which is also the value of NIM_CACHE_PATH.
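
As an illustration of the mapping described above, the sketch below prepares the host directory and builds the matching `docker run` bind-mount argument. It is a hedged example: `cache_mount_args` is a hypothetical helper, and the fallback directory `~/.cache/nim` is an assumption rather than a documented default. The Quickstart remains the reference for the actual commands.

```python
from pathlib import Path

# In-container cache location, which is also the value of NIM_CACHE_PATH.
IN_CONTAINER_CACHE = "/opt/nim/.cache"


def cache_mount_args(env, default_dir="~/.cache/nim"):
    """Create the host cache directory and return a bind-mount argument.

    `default_dir` is an assumed fallback used only when LOCAL_NIM_CACHE
    is not set; it is not taken from this document.
    """
    host_dir = Path(env.get("LOCAL_NIM_CACHE", default_dir)).expanduser().resolve()
    # The directory must exist before the container starts so that model
    # downloads land on the host and survive restarts.
    host_dir.mkdir(parents=True, exist_ok=True)
    return ["-v", f"{host_dir}:{IN_CONTAINER_CACHE}"]
```

The returned list can be appended to the rest of a `docker run` argument list. Depending on the user the container runs as, the host directory may also need its permissions adjusted so the container can write to it.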

Notes on input resolution and variants

The current release exposes the model as cosmos-embed1 and bundles the 224p variant, which produces 256‑dimensional embeddings. Input videos using supported codecs and sizes are automatically resized by the NIM. Variant selection is not configurable in this release.
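
Because the bundled variant is fixed, client code can guard against mixing embeddings from a different variant by checking the vector length. The sketch below assumes embeddings have already been obtained as plain lists of floats. `check_embedding` and `cosine_similarity` are hypothetical helpers, and cosine similarity is used only as a common example of comparing embeddings, not as something this document prescribes.

```python
import math

EXPECTED_DIM = 256  # embedding size of the bundled 224p variant


def check_embedding(vec, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the vector does not have the expected length."""
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
    return vec


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of the expected size."""
    check_embedding(a)
    check_embedding(b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A vector of any other length, for example one stored from a different model or variant, is rejected before it can silently skew a comparison.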