Environment Variables for NVIDIA NeMo Retriever Embedding NIM
Use this documentation to learn about the environment variables for NVIDIA NeMo Retriever Embedding NIM.
Binary Environment Variables
The following table contains the binary environment variables.

| Name | Default | Description |
|---|---|---|
| | Pretty | The log emit format. One of: pretty, json, compact. |
| | info | The tracing `EnvFilter` directive (for example, `info` or `debug`). |
| | false | Set to true to print environment variable help and exit. |

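
The following Python helper is a minimal sketch for checking a log format value before you pass it to the container. It checks the value against the three documented formats. The helper name `validate_log_format` is hypothetical. The case-insensitive comparison is also an assumption, based on the table listing the default as "Pretty" while the allowed values are lowercase. The sketch works on values only, because the variable's name is not shown in the table above.

```python
# Hypothetical pre-flight check for the log emit format value.
# Allowed values come from the table above; case-insensitivity is an assumption.
ALLOWED_LOG_FORMATS = ("pretty", "json", "compact")


def validate_log_format(value: str) -> str:
    """Return the normalized (lowercase) log format, or raise ValueError."""
    normalized = value.strip().lower()
    if normalized not in ALLOWED_LOG_FORMATS:
        raise ValueError(
            f"invalid log format {value!r}; expected one of: "
            + ", ".join(ALLOWED_LOG_FORMATS)
        )
    return normalized
```
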
Server Environment Variables
The following table contains the server environment variables.

| Name | Default | Description |
|---|---|---|
| | 0.0.0.0:8000 | The HTTP listen address (host:port). |
| | - | The optional KServe V2 gRPC listen address (host:port). When empty, gRPC is disabled. |
| | - | The maximum inbound KServe gRPC message size, in bytes. When empty, the effective HTTP body limit is used. |
| | - | The HTTP request body limit, in bytes, for /v1/embeddings (passthrough). |
| | - | The maximum size, in bytes, of an embedded image payload in a single request. |
| | 1024 | The batcher request queue depth. |
| | 1 | The maximum time, in milliseconds, to wait for additional requests before dispatching a batch. |
| | 120 | The request timeout, in seconds. |
| | - | The path to the PEM certificate chain for HTTPS. When set with |
| | - | The path to the PEM private key for HTTPS. Must be set together with |

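
The following Python sketch illustrates three of the rules in the server table. It splits a host:port listen address such as the default 0.0.0.0:8000. It falls back to the HTTP body limit when no gRPC message size is given. It rejects an HTTPS configuration where only one of the two PEM paths is set. All function names are hypothetical. The sketch assumes that the private key's required partner is the certificate chain path; the table text naming that partner is truncated. This is not the NIM's own validation logic.

```python
from typing import Optional, Tuple


def parse_listen_address(addr: str) -> Tuple[str, int]:
    """Split a host:port listen address, e.g. "0.0.0.0:8000" -> ("0.0.0.0", 8000)."""
    host, sep, port_text = addr.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {addr!r}")
    port = int(port_text)  # raises ValueError for a non-numeric port
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


def effective_grpc_max_message(grpc_limit: Optional[int], http_body_limit: int) -> int:
    """Empty (None) gRPC limit falls back to the effective HTTP body limit."""
    return grpc_limit if grpc_limit is not None else http_body_limit


def check_tls_pair(cert_path: Optional[str], key_path: Optional[str]) -> bool:
    """Return True if HTTPS is configured, False if neither path is set.

    Assumption: the certificate chain and private key must be set together,
    so a configuration with only one of them is rejected.
    """
    cert_set = bool(cert_path)
    key_set = bool(key_path)
    if cert_set != key_set:
        raise ValueError("certificate chain and private key must be set together")
    return cert_set
```
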
Pipeline Environment Variables
The following table contains the pipeline environment variables.

| Name | Default | Description |
|---|---|---|
| | 1 | The number of CudaEngine instances. |
| | - | The comma-separated CUDA device ordinals for explicit engine placement (e.g. |
| | 64 | The number of sequences per forward pass. |
| | - | The maximum sequence length override. When unset, the model profile default is used. |

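
The following Python sketch shows how the pipeline values relate to each other. It parses a comma-separated list of CUDA device ordinals. It computes how many forward passes a set of sequences needs when each pass holds at most 64 sequences (the documented default). It also spreads engine instances across devices. The round-robin placement in `place_engines` is a hypothetical policy chosen for illustration, because the actual placement rule is not described here. All function names are hypothetical.

```python
import math
from typing import List


def parse_device_ordinals(value: str) -> List[int]:
    """Parse a comma-separated list of CUDA device ordinals, e.g. "0,1" -> [0, 1]."""
    ordinals = [int(part) for part in value.split(",") if part.strip()]
    if any(o < 0 for o in ordinals):
        raise ValueError("device ordinals must be non-negative")
    return ordinals


def place_engines(num_engines: int, ordinals: List[int]) -> List[int]:
    """Assign each engine instance a device ordinal (hypothetical round-robin policy)."""
    if not ordinals:
        raise ValueError("at least one device ordinal is required")
    return [ordinals[i % len(ordinals)] for i in range(num_engines)]


def forward_passes(num_sequences: int, sequences_per_pass: int = 64) -> int:
    """Forward passes needed if each pass holds at most sequences_per_pass sequences."""
    return math.ceil(num_sequences / sequences_per_pass)
```

For example, 130 sequences with the default of 64 sequences per pass need three forward passes (64 + 64 + 2).
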
Engine Environment Variables
The following table contains the engine environment variables.

| Name | Default | Description |
|---|---|---|
| | - | The Hugging Face token for model download. It is passed through to the model downloader. For details, refer to Get Started With NVIDIA NeMo Retriever Embedding NIM. |
| | - | The NGC API key for model download when |
| | hf | The model download provider. Use |
| | nvidia/llama-nemotron-embed-vl-1b-v2 | The model name returned in embedding responses. |
| | /model/embed | The in-container path for staged model artifacts. The directory must contain artifacts for a supported model. For details, refer to Custom Model Artifact Support in NVIDIA NeMo Retriever Embedding NIM. |
| | - | The weight precision. One of: fp16, fp8. When unset, it defaults to fp16 (or fp8 for auto-selected pipelines). Ignored when |
| | - | Compile all CUDA artifacts, then exit with status 0 (passthrough). |
| | - | The optional served API model alias. If both |

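
The following Python sketch shows the documented default rule for weight precision. When the value is unset, it resolves to fp16, or to fp8 for auto-selected pipelines; an explicit value must be fp16 or fp8. The function name `resolve_precision` and its `auto_selected_pipeline` flag are hypothetical. The sketch does not model the condition under which the setting is ignored, because that part of the description is truncated.

```python
from typing import Optional

VALID_PRECISIONS = ("fp16", "fp8")


def resolve_precision(value: Optional[str], auto_selected_pipeline: bool) -> str:
    """Resolve weight precision: unset -> fp16, or fp8 for auto-selected pipelines."""
    if value is None or value == "":
        return "fp8" if auto_selected_pipeline else "fp16"
    if value not in VALID_PRECISIONS:
        raise ValueError(
            f"invalid precision {value!r}; expected one of: " + ", ".join(VALID_PRECISIONS)
        )
    return value
```
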