Environment Variables for NVIDIA NeMo Retriever Embedding NIM

Use this documentation to learn about the environment variables for NVIDIA NeMo Retriever Embedding NIM.

Binary Environment Variables

The following table contains the binary environment variables.

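| Name | Default | Description |
|---|---|---|
| LOG_FORMAT | pretty | The log output format. One of: pretty, json, compact. |
| RUST_LOG | info | The log filter directives, in tracing EnvFilter syntax. |
| SHOW_CONFIG | false | Set to true to print environment variable help and exit. |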
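
The allowed LOG_FORMAT values listed above can be checked before launching the container. The following is a minimal sketch, not the server's own parsing logic. The helper name check_log_format is hypothetical, and the case-insensitive matching and the fallback to pretty for an unset variable are assumptions.

```python
# Allowed values as listed in the table above.
ALLOWED_LOG_FORMATS = ("pretty", "json", "compact")


def check_log_format(env):
    """Return the LOG_FORMAT value from env, or raise if it is not listed.

    Assumptions (not confirmed by the documentation): an unset variable
    falls back to "pretty", and matching is case-insensitive.
    """
    value = env.get("LOG_FORMAT", "pretty").lower()
    if value not in ALLOWED_LOG_FORMATS:
        raise ValueError(
            f"LOG_FORMAT must be one of {ALLOWED_LOG_FORMATS}, got {value!r}"
        )
    return value
```

Pass os.environ, or any dictionary of planned settings, to the helper to catch a typo before the container starts.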

Server Environment Variables

The following table contains the server environment variables.

| Name | Default | Description |
|---|---|---|
| NIM_BIND_ADDR | 0.0.0.0:8000 | The HTTP listen address (host:port). |
| NIM_GRPC_BIND_ADDR | - | The optional KServe V2 gRPC listen address (host:port). When empty, gRPC is disabled. |
| NIM_GRPC_MAX_DECODING_MESSAGE_BYTES | - | The maximum inbound KServe gRPC message size in bytes. When empty, the effective HTTP body limit is used. |
| NIM_HTTP_BODY_LIMIT_BYTES | - | The HTTP request body limit for /v1/embeddings in bytes (passthrough). |
| NIM_MAX_IMAGE_BYTES | - | The maximum size in bytes of an embedded image payload in a single request. |
| NIM_MAX_QUEUE_SIZE | 1024 | The batcher request queue depth. |
| NIM_MAX_WAIT_MS | 1 | The maximum time in milliseconds to wait for additional requests before dispatching a batch. |
| NIM_REQUEST_TIMEOUT_S | 120 | The request timeout in seconds. |
| NIM_TLS_CERT_PATH | - | The path to the PEM certificate chain for HTTPS. Setting this together with NIM_TLS_KEY_PATH enables TLS. |
| NIM_TLS_KEY_PATH | - | The path to the PEM private key for HTTPS. Must be set together with NIM_TLS_CERT_PATH. |
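
Several of the server variables above interact: the two TLS paths must be set together, and the gRPC message limit falls back to the HTTP body limit. The following sketch illustrates those rules as a pre-flight check on a planned configuration. It is not the server's implementation. The function names are hypothetical, and returning None when neither limit is set is an assumption that stands in for the server's built-in default, which this page does not state.

```python
def split_bind_addr(addr):
    """Split a host:port string such as NIM_BIND_ADDR into (host, port)."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {addr!r}")
    port_number = int(port)
    if port_number > 65535:
        raise ValueError(f"port out of range in {addr!r}")
    return host, port_number


def tls_enabled(env):
    """Return True when both TLS paths are set, False when neither is.

    Raises ValueError when only one of the pair is set, because the table
    states that the two variables must be set together.
    """
    cert = env.get("NIM_TLS_CERT_PATH", "")
    key = env.get("NIM_TLS_KEY_PATH", "")
    if bool(cert) != bool(key):
        raise ValueError(
            "NIM_TLS_CERT_PATH and NIM_TLS_KEY_PATH must be set together"
        )
    return bool(cert)


def grpc_message_limit(env):
    """Return the gRPC decode limit in bytes implied by the table.

    An explicit NIM_GRPC_MAX_DECODING_MESSAGE_BYTES wins; otherwise the
    HTTP body limit applies. None means neither variable is set, so the
    server's own default (not documented here) would apply.
    """
    for name in ("NIM_GRPC_MAX_DECODING_MESSAGE_BYTES",
                 "NIM_HTTP_BODY_LIMIT_BYTES"):
        raw = env.get(name, "")
        if raw:
            return int(raw)
    return None
```

For example, tls_enabled raises an error for a configuration that sets only NIM_TLS_CERT_PATH, which surfaces the mistake before deployment.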

Pipeline Environment Variables

The following table contains the pipeline environment variables.

| Name | Default | Description |
|---|---|---|
| NIM_ENGINE_COUNT | 1 | The number of CudaEngine instances. |
| NIM_ENGINE_DEVICES | - | The comma-separated CUDA device ordinals for explicit engine placement (for example, NIM_ENGINE_DEVICES=0,1,2). |
| NIM_MAX_BATCH_SIZE | 64 | The maximum number of sequences per forward pass. |
| NIM_MAX_SEQ_LEN | - | The maximum sequence length override. When unset, the model profile default is used. |
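
The NIM_ENGINE_DEVICES format described above is a comma-separated list of CUDA device ordinals. The following sketch shows one way to validate such a value before launch. The helper name parse_engine_devices is hypothetical, and tolerating whitespace around each ordinal is an assumption; the server may be stricter.

```python
def parse_engine_devices(raw):
    """Parse a NIM_ENGINE_DEVICES value such as "0,1,2" into a list of ints.

    An empty or whitespace-only string yields an empty list, meaning no
    explicit placement was requested.
    """
    if not raw.strip():
        return []
    devices = []
    for part in raw.split(","):
        part = part.strip()  # assumption: surrounding whitespace is tolerated
        if not part.isdigit():
            raise ValueError(f"invalid CUDA device ordinal: {part!r}")
        devices.append(int(part))
    return devices
```

With the example value from the table, parse_engine_devices("0,1,2") returns [0, 1, 2].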

Engine Environment Variables

The following table contains the engine environment variables.

| Name | Default | Description |
|---|---|---|
| HF_TOKEN | - | The Hugging Face token for model download. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Embedding NIM. |
| NGC_API_KEY | - | The NGC API key for model download when NIM_MODEL_DOWNLOAD_PROVIDER=ngc. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Embedding NIM. |
| NIM_MODEL_DOWNLOAD_PROVIDER | hf | The model download provider. Use hf for Hugging Face; use ngc for the NVIDIA NGC Catalog. |
| NIM_MODEL_NAME | nvidia/llama-nemotron-embed-vl-1b-v2 | The model name returned in embedding responses. |
| NIM_MODEL_PATH | /model/embed | The in-container path for staged model artifacts. The directory must contain artifacts for a supported model. For details, refer to Custom Model Artifact Support in NVIDIA NeMo Retriever Embedding NIM. |
| NIM_PRECISION | - | The weight precision. One of: fp16, fp8. When unset, defaults to fp16 (or fp8 for auto-selected pipelines). Ignored when NIM_PIPELINE_ID is set. |
| NIM_PRECOMPILE_ONLY | - | When set, compiles all CUDA artifacts and then exits with status 0 (passthrough). |
| NIM_SERVED_MODEL_NAME | - | The optional served API model alias. If both NIM_MODEL_NAME and NIM_SERVED_MODEL_NAME are configured, NIM_SERVED_MODEL_NAME currently takes precedence. |
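
Two rules in the table above lend themselves to a short illustration: which model name wins when both name variables are set, and which credential variable goes with each download provider. The following sketch models those rules for a planned configuration. The function names are hypothetical and this is not the NIM's own code. Whether a credential is actually required for a given model is not stated on this page, so the second helper only reports which variable the provider reads.

```python
# Default taken from the NIM_MODEL_NAME row of the table above.
DEFAULT_MODEL_NAME = "nvidia/llama-nemotron-embed-vl-1b-v2"

# Provider-to-credential mapping as described in the table above.
CREDENTIAL_BY_PROVIDER = {"hf": "HF_TOKEN", "ngc": "NGC_API_KEY"}


def effective_model_name(env):
    """Return the model name implied by the documented precedence.

    NIM_SERVED_MODEL_NAME takes precedence over NIM_MODEL_NAME; if neither
    is set, the documented NIM_MODEL_NAME default applies.
    """
    return (
        env.get("NIM_SERVED_MODEL_NAME")
        or env.get("NIM_MODEL_NAME")
        or DEFAULT_MODEL_NAME
    )


def credential_variable(env):
    """Return the credential variable name for the selected provider."""
    provider = env.get("NIM_MODEL_DOWNLOAD_PROVIDER", "hf")
    if provider not in CREDENTIAL_BY_PROVIDER:
        raise ValueError(f"unknown model download provider: {provider!r}")
    return CREDENTIAL_BY_PROVIDER[provider]
```

For example, a configuration with NIM_MODEL_DOWNLOAD_PROVIDER=ngc maps to NGC_API_KEY, so a deployment script can confirm that variable is present before starting a download.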