Environment Variables for NVIDIA NeMo Retriever Embedding NIM

Use this documentation to learn about the environment variables for NVIDIA NeMo Retriever Embedding NIM.

Binary Environment Variables

The following table contains the binary environment variables.

Name	Default	Description
`LOG_FORMAT`	pretty	The log output format. One of: pretty, json, compact.
`RUST_LOG`	info	The tracing EnvFilter.
`SHOW_CONFIG`	false	Set to `true` to print the environment variable help and exit.

Server Environment Variables

The following table contains the server environment variables.

Name	Default	Description
`NIM_BIND_ADDR`	0.0.0.0:8000	The HTTP listen address (host:port).
`NIM_GRPC_BIND_ADDR`	-	The optional KServe V2 gRPC listen address (host:port). Empty disables gRPC.
`NIM_GRPC_MAX_DECODING_MESSAGE_BYTES`	-	The maximum inbound KServe gRPC message size in bytes. Empty uses the effective HTTP body limit.
`NIM_HTTP_BODY_LIMIT_BYTES`	-	The HTTP request body limit for /v1/embeddings in bytes (passthrough).
`NIM_MAX_IMAGE_BYTES`	-	The maximum allowed bytes for an embedded image payload in a single request.
`NIM_MAX_QUEUE_SIZE`	1024	The batcher request queue depth.
`NIM_MAX_WAIT_MS`	1	The maximum milliseconds to wait for additional requests before dispatching a batch.
`NIM_REQUEST_TIMEOUT_S`	120	The request timeout in seconds.
`NIM_TLS_CERT_PATH`	-	The path to the PEM certificate chain for HTTPS. Setting it together with `NIM_TLS_KEY_PATH` enables TLS.
`NIM_TLS_KEY_PATH`	-	The path to the PEM private key for HTTPS. Must be set together with `NIM_TLS_CERT_PATH`.
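
The `host:port` address format and the TLS pairing rule in the table above can be checked before the container starts. The following Python sketch is illustrative only and is not part of the NIM. The helper names `parse_bind_addr` and `tls_enabled` are hypothetical, and the sketch assumes that an empty value is treated the same as an unset variable.

```python
def parse_bind_addr(value: str) -> tuple[str, int]:
    """Split a host:port string such as '0.0.0.0:8000' into (host, port)."""
    # rpartition splits on the last colon, so the port is always the final field.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"expected host:port, got {value!r}")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    return host, port_num


def tls_enabled(env: dict[str, str]) -> bool:
    """Return True when both TLS paths are set, False when neither is set."""
    cert = env.get("NIM_TLS_CERT_PATH", "")
    key = env.get("NIM_TLS_KEY_PATH", "")
    # Assumption: an empty string counts as "not set".
    if bool(cert) != bool(key):
        raise ValueError(
            "NIM_TLS_CERT_PATH and NIM_TLS_KEY_PATH must be set together"
        )
    return bool(cert)
```

In practice, a wrapper script could call these helpers with `dict(os.environ)` and the value of `NIM_BIND_ADDR`, and fail fast on a half-configured TLS pair.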

Pipeline Environment Variables

The following table contains the pipeline environment variables.

Name	Default	Description
`NIM_ENGINE_COUNT`	1	The number of CudaEngine instances.
`NIM_ENGINE_DEVICES`	-	The comma-separated CUDA device ordinals for explicit engine placement (for example, `NIM_ENGINE_DEVICES=0,1,2`).
`NIM_MAX_BATCH_SIZE`	64	The number of sequences per forward pass.
`NIM_MAX_SEQ_LEN`	-	The maximum sequence length override. When unset, the model profile default is used.
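
As an illustration of the comma-separated format that `NIM_ENGINE_DEVICES` expects, the following Python sketch turns a value such as `0,1,2` into a list of device ordinals. It is not the NIM's own parser. The function name `parse_engine_devices` is hypothetical, and tolerating whitespace around each entry is an assumption of this sketch.

```python
def parse_engine_devices(value: str) -> list[int]:
    """Parse a string such as '0,1,2' into a list of CUDA device ordinals."""
    # An empty or unset value means no explicit placement was requested.
    if not value.strip():
        return []
    ordinals = []
    for part in value.split(","):
        part = part.strip()  # Assumption: surrounding whitespace is ignored.
        if not part.isdecimal():
            raise ValueError(f"invalid CUDA device ordinal: {part!r}")
        ordinals.append(int(part))
    return ordinals
```

A deployment script might run this check on the value before passing it to the container, so that a typo such as `0,,1` is caught early.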

Engine Environment Variables

The following table contains the engine environment variables.

Name	Default	Description
`HF_TOKEN`	-	The Hugging Face token for model download. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Embedding NIM.
`NGC_API_KEY`	-	The NGC API key for model download when `NIM_MODEL_DOWNLOAD_PROVIDER=ngc`. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Embedding NIM.
`NIM_ENGINE_MODEL_DOWNLOAD_ONLY`	-	Set to `1` to download model weights and exit before CUDA initialization. Use this option to stage weights for air-gapped deployment or on hosts without visible GPUs. Requires `HF_TOKEN` for the default Hugging Face provider, or `NGC_API_KEY` when `NIM_MODEL_DOWNLOAD_PROVIDER=ngc`.
`NIM_MODEL_DOWNLOAD_PROVIDER`	hf	The model download provider. Use `hf` for Hugging Face; use `ngc` for the NVIDIA NGC Catalog.
`NIM_MODEL_NAME`	nvidia/llama-nemotron-embed-vl-1b-v2	The model name returned in embedding responses.
`NIM_MODEL_PATH`	/model/embed	The in-container path for staged model artifacts. The directory must contain artifacts for a supported model. For details, refer to Custom Model Artifact Support in NVIDIA NeMo Retriever Embedding NIM.
`NIM_PRECISION`	-	The weight precision. One of: fp16, fp8. When unset, defaults to fp16 (or fp8 for auto-selected pipelines).
`NIM_PRECOMPILE_ONLY`	-	Compile all CUDA artifacts, then exit with status 0 (passthrough).
`NIM_SERVED_MODEL_NAME`	-	The optional model alias served by the API. If both `NIM_MODEL_NAME` and `NIM_SERVED_MODEL_NAME` are configured, `NIM_SERVED_MODEL_NAME` currently takes precedence.
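
Two rules from the table above lend themselves to a short illustration: the precedence between `NIM_SERVED_MODEL_NAME` and `NIM_MODEL_NAME`, and the credential that each download provider requires. The following Python sketch mirrors those rules as described here. It is not the NIM's implementation. The function names are hypothetical, the sketch assumes that an empty value behaves like an unset variable, and the precedence reflects only the current behavior stated above.

```python
DEFAULT_MODEL_NAME = "nvidia/llama-nemotron-embed-vl-1b-v2"


def effective_model_name(env: dict[str, str]) -> str:
    """Pick the model name, preferring the served alias when it is set."""
    return (
        env.get("NIM_SERVED_MODEL_NAME")
        or env.get("NIM_MODEL_NAME")
        or DEFAULT_MODEL_NAME
    )


def required_credential(env: dict[str, str]) -> str:
    """Name the credential variable that the selected download provider needs."""
    # Assumption: an empty provider value falls back to the default, hf.
    provider = env.get("NIM_MODEL_DOWNLOAD_PROVIDER") or "hf"
    if provider == "hf":
        return "HF_TOKEN"
    if provider == "ngc":
        return "NGC_API_KEY"
    raise ValueError(f"unknown download provider: {provider!r}")
```

A staging script for `NIM_ENGINE_MODEL_DOWNLOAD_ONLY=1` could use `required_credential` to confirm that the matching token is present before starting the download.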