Environment Variables for NVIDIA NeMo Retriever Reranking NIM

Use this documentation to learn about the environment variables for NVIDIA NeMo Retriever Reranking NIM.

Binary Environment Variables

The following table contains the binary environment variables.

Name	Default	Description
`LOG_FORMAT`	Pretty	The log emit format. One of: pretty, json, compact.
`RUST_LOG`	info	The tracing EnvFilter.
`SHOW_CONFIG`	false	Set to true to print the environment variable help and exit.

The following table contains the server environment variables.

Name	Default	Description
`NIM_BIND_ADDR`	0.0.0.0:8000	The HTTP listen address (host:port).
`NIM_GRPC_BIND_ADDR`	-	The optional KServe V2 gRPC listen address (host:port). Empty disables gRPC.
`NIM_GRPC_MAX_DECODING_MESSAGE_BYTES`	-	The maximum inbound KServe gRPC message size in bytes. Empty uses the effective HTTP body limit.
`NIM_MAX_QUEUE_SIZE`	1024	The batcher request queue depth.
`NIM_REQUEST_TIMEOUT_S`	120	The timeout, in seconds, for a single rerank request.
`NIM_TLS_CERT_PATH`	-	The path to the PEM certificate chain for HTTPS. Empty means no TLS.
`NIM_TLS_KEY_PATH`	-	The path to the PEM private key for HTTPS. Empty means no TLS.
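
The server variables above can be checked before a container is started. The following is a minimal preflight sketch in Python, not the NIM's own parsing logic. The function names `split_host_port` and `server_settings` are hypothetical, and the rule that `NIM_TLS_CERT_PATH` and `NIM_TLS_KEY_PATH` must be set together is an assumption; the table only states that an empty value means no TLS.

```python
def split_host_port(addr):
    """Split a host:port string into (host, port). Hypothetical helper."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {addr!r}")
    return host, int(port)


def server_settings(env):
    """Summarize the server variables found in a mapping such as os.environ."""
    # Documented default for the HTTP listener.
    http = split_host_port(env.get("NIM_BIND_ADDR") or "0.0.0.0:8000")

    # An empty or unset gRPC address disables gRPC.
    grpc_raw = env.get("NIM_GRPC_BIND_ADDR", "")
    grpc = split_host_port(grpc_raw) if grpc_raw else None

    cert = env.get("NIM_TLS_CERT_PATH", "")
    key = env.get("NIM_TLS_KEY_PATH", "")
    # Assumption: a certificate without a key (or the reverse) is a mistake.
    if bool(cert) != bool(key):
        raise ValueError("set both NIM_TLS_CERT_PATH and NIM_TLS_KEY_PATH, or neither")

    return {
        "http": http,
        "grpc": grpc,
        "tls": bool(cert and key),
        "request_timeout_s": int(env.get("NIM_REQUEST_TIMEOUT_S") or "120"),
    }
```

Passing `os.environ` as `env` summarizes the current shell; passing a plain dictionary makes it easy to test a planned configuration.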

The following table contains the pipeline environment variables.

Name	Default	Description
`NIM_ADMISSION_SIZE`	0	The override for admission semaphore size in passages. 0 = auto-derive (2 × engine_count × max_batch_size).
`NIM_ENGINE_COUNT`	1	The number of parallel inference engines. Ignored when `NIM_ENGINE_DEVICES` is set. For details, refer to Engine Count.
`NIM_ENGINE_DEVICES`	-	The comma-separated CUDA device ordinals for explicit engine placement (for example, `NIM_ENGINE_DEVICES=0,1,2,3`). When set, one engine is started per listed device and `NIM_ENGINE_COUNT` is ignored. Duplicate ordinals are rejected. Default: unset (use `NIM_ENGINE_COUNT` engines on `NIM_CUDA_DEVICE`).
`NIM_MAX_BATCH_SIZE`	32	The maximum number of passages per engine forward pass.
`NIM_MAX_CHUNK_SIZE`	0	The override for chunk size on the chunked admission path (passages per chunk). 0 = auto-derive from max_batch_size.
`NIM_MAX_SEQ_LEN`	512	The maximum sequence length for tokenized (query, passage) pairs.
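
The interaction between `NIM_ENGINE_DEVICES`, `NIM_ENGINE_COUNT`, and the auto-derived admission size can be worked through in a few lines. The following Python sketch mirrors the rules stated in the table; it is an illustration, not the NIM's implementation. The function names are hypothetical, and details such as whitespace handling are assumptions.

```python
def engine_devices(env):
    """Return one CUDA ordinal per engine, following the documented rules."""
    raw = env.get("NIM_ENGINE_DEVICES", "").strip()
    if raw:
        # Explicit placement: one engine per listed device;
        # NIM_ENGINE_COUNT is ignored in this case.
        devices = [int(part) for part in raw.split(",")]
        if len(set(devices)) != len(devices):
            raise ValueError("duplicate ordinals in NIM_ENGINE_DEVICES")
        return devices
    # Default placement: NIM_ENGINE_COUNT engines on NIM_CUDA_DEVICE.
    count = int(env.get("NIM_ENGINE_COUNT") or "1")
    device = int(env.get("NIM_CUDA_DEVICE") or "0")
    return [device] * count


def admission_size(env):
    """Return the admission semaphore size in passages."""
    override = int(env.get("NIM_ADMISSION_SIZE") or "0")
    if override > 0:
        return override
    # 0 means auto-derive: 2 x engine_count x max_batch_size.
    max_batch = int(env.get("NIM_MAX_BATCH_SIZE") or "32")
    return 2 * len(engine_devices(env)) * max_batch
```

With every variable at its default, the derived value is 2 × 1 × 32 = 64 passages. Setting `NIM_ENGINE_DEVICES=0,1,2,3` raises it to 2 × 4 × 32 = 256, regardless of `NIM_ENGINE_COUNT`.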

The following table contains the engine environment variables.

Name	Default	Description
`HF_TOKEN`	-	The Hugging Face token for model download. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Reranking NIM.
`NGC_API_KEY`	-	The NGC API key for model download when `NIM_MODEL_DOWNLOAD_PROVIDER=ngc`. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Reranking NIM.
`NIM_CUDA_DEVICE`	0	The GPU device ID (0-indexed).
`NIM_MODEL_DOWNLOAD_PROVIDER`	hf	The model download provider. Use `hf` for Hugging Face; use `ngc` for the NVIDIA NGC Catalog.
`NIM_MODEL_NAME`	nvidia/llama-nemotron-rerank-vl-1b-v2	The model name returned in API responses.
`NIM_MODEL_PATH`	/model/rerank	The in-container path for staged model artifacts. The directory must contain artifacts for a supported model. For details, refer to Custom Model Artifact Support in NVIDIA NeMo Retriever Reranking NIM.
`NIM_PRECISION`	fp16	The LLM trunk precision. One of: fp16 (default), fp8, fp8-e4m3.
`NIM_PRECOMPILE_ONLY`	-	When set, compile all CUDA artifacts and then exit with status 0 (passthrough).
`NIM_SERVED_MODEL_NAME`	-	The optional model alias served by the API. If both `NIM_MODEL_NAME` and `NIM_SERVED_MODEL_NAME` are configured, `NIM_SERVED_MODEL_NAME` currently takes precedence.
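
Two rules in the engine table lend themselves to a short illustration: the precedence between `NIM_MODEL_NAME` and `NIM_SERVED_MODEL_NAME`, and the credential that each download provider uses. The following Python sketch is a reading of the table, not the NIM's code. The function names are hypothetical, and pairing `hf` with `HF_TOKEN` is inferred from the descriptions rather than stated outright.

```python
DEFAULT_MODEL_NAME = "nvidia/llama-nemotron-rerank-vl-1b-v2"

# Assumed pairing of provider to credential variable, inferred from the table.
CREDENTIAL_FOR_PROVIDER = {"hf": "HF_TOKEN", "ngc": "NGC_API_KEY"}


def effective_model_name(env):
    """Return the model name a client would expect to see in API responses."""
    served = env.get("NIM_SERVED_MODEL_NAME", "")
    if served:
        # The served alias currently takes precedence over NIM_MODEL_NAME.
        return served
    return env.get("NIM_MODEL_NAME") or DEFAULT_MODEL_NAME


def credential_variable(env):
    """Return the name of the credential variable for the chosen provider."""
    provider = env.get("NIM_MODEL_DOWNLOAD_PROVIDER") or "hf"
    if provider not in CREDENTIAL_FOR_PROVIDER:
        raise ValueError(f"unknown download provider: {provider!r}")
    return CREDENTIAL_FOR_PROVIDER[provider]
```

Because the precedence is described as current behavior, a client that depends on the returned model name may be safer setting only one of the two variables.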