Environment Variables for NVIDIA NeMo Retriever Reranking NIM
Use this documentation to learn about the environment variables for NVIDIA NeMo Retriever Reranking NIM.
Binary Environment Variables
The following table contains the binary environment variables.
| Name | Default | Description |
|---|---|---|
| | Pretty | The log emit format. One of: pretty, json, compact. |
| | info | The tracing EnvFilter. |
| | false | True to print env-var help and exit. |
Server Environment Variables
The following table contains the server environment variables.
| Name | Default | Description |
|---|---|---|
| | 0.0.0.0:8000 | The HTTP listen address (host:port). |
| | - | The optional KServe V2 gRPC listen address (host:port). Empty disables gRPC. |
| | - | The maximum inbound KServe gRPC message size, in bytes. Empty uses the effective HTTP body limit. |
| | 1024 | The batcher request queue depth. |
| | 120 | The timeout, in seconds, for a single rerank request. |
| | - | The path to the PEM certificate chain for HTTPS. Empty means no TLS. |
| | - | The path to the PEM private key for HTTPS. Empty means no TLS. |
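As an illustration of the host:port format used by the listen-address variables, the following Python sketch parses such a value and treats an empty value as "disabled", matching the gRPC row above. The function name `parse_listen_address` is hypothetical and is not part of the NIM. The sketch assumes IPv4 addresses or host names and does not handle bracketed IPv6 literals.

```python
from typing import Optional, Tuple


def parse_listen_address(value: str) -> Optional[Tuple[str, int]]:
    """Split a host:port string into (host, port).

    Hypothetical helper for illustration only. Returns None for an
    empty value, which the table describes as disabling the gRPC listener.
    """
    value = value.strip()
    if not value:
        return None  # empty means the listener is not configured

    # Split on the last colon so the port is always the final component.
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")

    port = int(port_text)  # raises ValueError if the port is not numeric
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


print(parse_listen_address("0.0.0.0:8000"))  # ('0.0.0.0', 8000)
print(parse_listen_address(""))              # None
```

Running a check like this before starting the container can catch a malformed address early, although the service performs its own validation.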
Pipeline Environment Variables
The following table contains the pipeline environment variables.
| Name | Default | Description |
|---|---|---|
| | 0 | The override for admission semaphore size in passages. 0 = auto-derive (2 × engine_count × max_batch_size). |
| | 1 | The number of parallel inference engines. Ignored when |
| | - | The comma-separated CUDA device ordinals for explicit engine placement (e.g. |
| | 32 | The max passages per engine forward pass. |
| | 0 | The override for chunk size on the chunked admission path (passages per chunk). 0 = auto-derive from max_batch_size. |
| | 512 | The max sequence length for tokenized (query, passage) pairs. |
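The auto-derive rule for the admission semaphore can be worked through with a short Python sketch. Only the formula 2 × engine_count × max_batch_size and the default values come from the table above. The function and parameter names are hypothetical, and the sketch is not the NIM's actual implementation.

```python
def admission_semaphore_size(override: int, engine_count: int, max_batch_size: int) -> int:
    """Return the admission semaphore size in passages.

    Hypothetical helper that mirrors the documented rule:
    an override of 0 means auto-derive 2 * engine_count * max_batch_size.
    """
    if override < 0 or engine_count < 1 or max_batch_size < 1:
        raise ValueError("override must be >= 0; counts must be >= 1")
    if override == 0:
        return 2 * engine_count * max_batch_size
    return override


# Documented defaults: override 0, one engine, 32 passages per forward pass.
print(admission_semaphore_size(0, 1, 32))    # 64
# Two engines with the same batch size double the derived value.
print(admission_semaphore_size(0, 2, 32))    # 128
# A nonzero override is used as-is.
print(admission_semaphore_size(100, 2, 32))  # 100
```

With the defaults, the derived size is 64 passages, which is twice what the single engine processes in one forward pass.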
Engine Environment Variables
The following table contains the engine environment variables.
| Name | Default | Description |
|---|---|---|
| | - | The Hugging Face token for model download. Passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Reranking NIM. |
| | - | The NGC API key for model download when |
| | 0 | The GPU device ID (0-indexed). |
| | hf | The model download provider. Use |
| | nvidia/llama-nemotron-rerank-vl-1b-v2 | The model name returned in API responses. |
| | /model/rerank | The in-container path for staged model artifacts. The directory must contain artifacts for a supported model. For details, refer to Custom Model Artifact Support in NVIDIA NeMo Retriever Reranking NIM. |
| | fp16 | The LLM trunk precision. One of: fp16 (default), fp8, fp8-e4m3. |
| | - | Compile all CUDA artifacts then exit 0 (passthrough). |
| | - | Optional served API model alias. If both |