Configuring the NIM

The following environment variables can be used to configure the NIM at runtime:

| ENV | Required | Default | Notes |
|---|---|---|---|
| NGC_API_KEY | Yes | | Your NGC API key for model access. |
| NIM_HTTP_API_PORT | No | 8000 | The port for the HTTP API server. |
| USE_CUDA_IPC | No | Auto-detected | Enables CUDA IPC for inter-process communication. If not explicitly set, the value is auto-detected. Values: 0 (disabled), 1 (enabled). |
| ENABLE_CUDA_MPS | No | 0 | Allows CUDA to use the Multi-Process Service (MPS). On some systems and configurations, this can improve performance. Values: 0 (disabled), 1 (enabled). |
| NIM_ENABLE_OTEL | No | True | Enables OpenTelemetry. |
| CUDA_VISIBLE_DEVICES | No | All | A comma-separated list of GPU IDs to use. |
| NIM_CACHE_PATH | No | /opt/nim/.cache | The location inside the NIM container to use for caching. |
| NIM_LOG_LEVEL | No | DEFAULT | Controls NIM logging verbosity. Supported levels: DEFAULT, DEBUG, INFO, WARNING, ERROR, CRITICAL, TRACE. |
| NIM_TRITON_LOG_VERBOSE | No | 0 | Controls logging verbosity of the Triton server component. Supported values are 0 through 5, where 0 is the least verbose and 5 is the most verbose. |

Additional Runtime Variables:

  • --tmpfs /tmp/ram:rw,size=2g: A docker run option that mounts /tmp/ram as a tmpfs backed by host memory, which can give a speedup on some systems.
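
As a rough illustration of how these settings reach the container, the sketch below assembles a docker run argument list from a dictionary of the variables listed above. The helper build_docker_run and the image name example/cosmos-embed1:latest are hypothetical, and --rm and --gpus all are assumed defaults that do not come from this page. NGC_API_KEY is forwarded by name (-e NGC_API_KEY), so Docker reads the key from the host environment instead of the command line. The tmpfs value uses Docker's path:options form.

```python
import shlex

# Supported values as listed in the configuration table.
LOG_LEVELS = {"DEFAULT", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "TRACE"}


def build_docker_run(image, env, tmpfs=None):
    """Return an argv list for `docker run` (hypothetical helper)."""
    level = env.get("NIM_LOG_LEVEL", "DEFAULT")
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported NIM_LOG_LEVEL: {level}")

    verbose = int(env.get("NIM_TRITON_LOG_VERBOSE", 0))
    if not 0 <= verbose <= 5:
        raise ValueError(f"NIM_TRITON_LOG_VERBOSE must be 0..5, got {verbose}")

    # Publish the HTTP API port on the same host port (default 8000).
    port = str(env.get("NIM_HTTP_API_PORT", 8000))
    argv = ["docker", "run", "--rm", "--gpus", "all", "-p", f"{port}:{port}"]

    # Forward the required key from the host environment by name only.
    argv += ["-e", "NGC_API_KEY"]
    for name, value in env.items():
        if name == "NGC_API_KEY":
            continue  # never place the secret on the command line
        argv += ["-e", f"{name}={value}"]

    if tmpfs:
        argv += ["--tmpfs", tmpfs]

    argv.append(image)
    return argv


if __name__ == "__main__":
    settings = {"NIM_HTTP_API_PORT": 8000, "NIM_LOG_LEVEL": "INFO"}
    cmd = build_docker_run(
        "example/cosmos-embed1:latest",  # hypothetical image name
        settings,
        tmpfs="/tmp/ram:rw,size=2g",
    )
    print(shlex.join(cmd))
```

Validating the two logging variables before launch catches a bad value on the host rather than inside the container logs.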

Host caching via LOCAL_NIM_CACHE

To persist model downloads across container restarts and align with other NIMs, bind mount a host cache directory using LOCAL_NIM_CACHE (see Quickstart for example commands). The directory you mount on the host becomes the in-container /opt/nim/.cache path, which is also the value of NIM_CACHE_PATH.
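
The mapping described above can be sketched as a small helper that prepares the host directory and returns the docker run bind-mount arguments. The helper name cache_mount_args and the fallback path ~/.cache/nim are assumptions made for illustration; only LOCAL_NIM_CACHE and /opt/nim/.cache come from this page.

```python
import os
from pathlib import Path

# In-container cache path, matching the NIM_CACHE_PATH default.
CONTAINER_CACHE = "/opt/nim/.cache"


def cache_mount_args(local_cache=None):
    """Return `-v` arguments that bind mount the host cache (hypothetical helper)."""
    # Prefer an explicit path, then LOCAL_NIM_CACHE, then an assumed fallback.
    chosen = local_cache or os.environ.get("LOCAL_NIM_CACHE", "~/.cache/nim")
    host_dir = Path(chosen).expanduser()

    # The directory must exist on the host before the container starts.
    host_dir.mkdir(parents=True, exist_ok=True)
    return ["-v", f"{host_dir}:{CONTAINER_CACHE}"]
```

The returned list can be appended to a docker run argument list. The container user needs write access to the host directory, so its permissions may need adjusting on some systems.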

Notes on input resolution and variants

The current release exposes the model as cosmos-embed1 and bundles the 224p variant, which produces 256‑dimensional embeddings. Input videos using supported codecs and sizes are automatically resized by the NIM. Variant selection is not configurable in this release.
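
Because the embedding size is fixed at 256 for the bundled variant, client code can check vectors before comparing them. The sketch below is illustrative only: the function names are hypothetical, and how the vectors are obtained from the NIM API is outside its scope. It compares two embeddings with cosine similarity, a common choice for embedding vectors that this page does not itself prescribe.

```python
import math

EMBEDDING_DIM = 256  # dimension stated for the bundled 224p variant


def check_embedding(vec):
    """Raise ValueError if the vector does not have the expected length."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vec)}")
    return vec


def cosine_similarity(a, b):
    """Cosine similarity between two 256-dimensional embeddings."""
    check_embedding(a)
    check_embedding(b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```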