Environment Variables for NeMo Retriever Text Embedding NIM

Use this documentation to learn about the environment variables for NeMo Retriever Text Embedding NIM.

Environment Variables

The following reference describes the environment variables that are used in the container. Each entry lists the variable name, its description, and its default value. Set environment variables with the -e command-line argument to the docker run command.
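
For example, a minimal launch command might look like the following sketch; the image name, tag, and API key value are placeholders for illustration, not the exact image for your deployment.

```bash
export NGC_API_KEY=<your-ngc-api-key>

# Pass the API key into the container and publish the default HTTP port.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```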

NGC_API_KEY

Set this variable to the value of your personal NGC API key.

Default: None

NIM_CACHE_PATH

Specifies the fully qualified path, in the container, for downloaded models.

Default: /opt/nim/.cache

NIM_GRPC_API_PORT

Specifies the network port number, in the container, for gRPC access to the microservice.

Default: 50051

NIM_HTTP_API_PORT

Specifies the network port number, in the container, for HTTP access to the microservice.

Refer to Publishing ports in the Docker documentation for more information about host and container network ports.

Default: 8000

NIM_HTTP_MAX_WORKERS

Specifies the number of worker threads to start for HTTP requests.

Default: 1

NIM_HTTP_TRITON_PORT

Specifies the network port number, in the container, for NVIDIA Triton Inference Server.

Default: 8080

NIM_IGNORE_MODEL_DOWNLOAD_FAIL

When set to true, the microservice continues to run rather than exit if it fails to download a model from NGC. This environment variable can be useful in an air-gapped environment, as shown in the example that follows.

Default: false
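
As a sketch of an air-gapped setup, the following command mounts a host directory that already contains downloaded models at the default cache path and keeps the microservice running if a download from NGC fails. The host path and image name are placeholders, and the command assumes NGC_API_KEY is exported in your shell.

```bash
# Reuse a pre-populated model cache and tolerate download failures.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_CACHE_PATH=/opt/nim/.cache \
  -e NIM_IGNORE_MODEL_DOWNLOAD_FAIL=true \
  -v /path/to/model-cache:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```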

NIM_LOG_LEVEL

Specifies the logging level. The microservice supports the following values: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

Default: INFO

NIM_LOGGING_JSONL

When set to true, the microservice creates log records in the JSONL format.

Default: false
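
For example, to raise the logging verbosity and write log records in JSONL format, set both logging variables at startup; the image name is a placeholder.

```bash
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_LOG_LEVEL=DEBUG \
  -e NIM_LOGGING_JSONL=true \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```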

NIM_MANIFEST_ALLOW_UNSAFE

Set to 1 to enable selection of a model profile that is not included in the original model_manifest.yaml or a profile that is not detected to be compatible with the deployed hardware.

Default: 0

NIM_MANIFEST_PATH

Specifies the fully qualified path, in the container, for the model manifest YAML file.

Default: /opt/nim/etc/default/model_manifest.yaml

NIM_MODEL_PROFILE

Specifies the model profile ID to use with the container. By default, the container attempts to automatically match the host GPU model and GPU count with the optimal model profile.

Default: None
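
As a sketch, the following command pins the container to a specific model profile instead of relying on automatic matching. The profile ID and image name are placeholders; NIM_MANIFEST_ALLOW_UNSAFE=1 is required only if the profile is not in the original model_manifest.yaml or is not detected as compatible with the deployed hardware.

```bash
# Select a model profile explicitly instead of automatic GPU matching.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<profile-id> \
  -e NIM_MANIFEST_ALLOW_UNSAFE=1 \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```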

NIM_NUM_MODEL_INSTANCES

Specifies the number of model instances to deploy.

Default: Unset. When set, this value overrides the hardware-specific configuration value.

NIM_NUM_TOKENIZERS

Specifies the number of tokenizer instances to use.

Default: min(max(os.cpu_count() // 2, 1), 16)

NIM_SERVED_MODEL_NAME

Specifies the model name, or names, used in the API. Specify multiple names in a comma-separated list. If you specify multiple names, the server responds to any of them, and the name in the model field of a response is the first name in the list. By default, the model name is inferred from model_manifest.yaml. See the example that follows.

Default: None
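
For example, the following sketch serves the model under two aliases and then calls the embeddings endpoint with one of them. The alias values, input text, and image name are illustrative; some embedding models require additional request fields, such as input_type, so check the API reference for your model.

```bash
docker run -d --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_SERVED_MODEL_NAME=my-embedder,my-embedder-v1 \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>

# Either alias is accepted; the model field of the response reports the
# first name in the list (my-embedder).
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "my-embedder-v1", "input": ["Hello world"], "input_type": "query"}'
```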

NIM_REPOSITORY_OVERRIDE

If set to a non-empty string, the NIM_REPOSITORY_OVERRIDE value replaces the hard-coded location of the repository and the protocol for access to the repository. The structure of the value for this environment variable is as follows: <repository type>://<repository location>. Only the protocols ngc://, s3://, and https:// are supported, and only the first component of the URI is replaced. For example:
- If the URI in the manifest is ngc://org/meta/llama3-8b-instruct:hf?file=config.json and NIM_REPOSITORY_OVERRIDE=ngc://myrepo.ai/, the domain name for the API endpoint is set to myrepo.ai.
- If NIM_REPOSITORY_OVERRIDE=s3://mybucket/, the result of the replacement will be s3://mybucket/nim%2Fmeta%2Fllama3-8b-instruct%3Ahf%3Ffile%3Dconfig.json.
- If NIM_REPOSITORY_OVERRIDE=https://mymodel.ai/some_path, the result of the replacement will be https://mymodel.ai/some_path/nim%2Fmeta%2Fllama3-8b-instruct%3Ahf%3Ffile%3Dconfig.json.

This repository override feature supports basic authentication mechanisms:
- https assumes authorization using the Authorization header and the credential value in NIM_HTTPS_CREDENTIAL.
- ngc requires a credential in the NGC_API_KEY environment variable.
- s3 requires the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (if using temporary credentials) AWS_SESSION_TOKEN.

Default: None
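
As a sketch, the following command redirects model downloads to an S3 bucket by using NIM_REPOSITORY_OVERRIDE. The bucket name and image are placeholders, and the AWS credential variables are passed through from the host environment.

```bash
# AWS_SESSION_TOKEN is required only when you use temporary credentials.
docker run -it --rm --gpus all \
  -e NIM_REPOSITORY_OVERRIDE=s3://mybucket/ \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_SESSION_TOKEN \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```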

NIM_TRITON_DYNAMIC_BATCHING_MAX_QUEUE_DELAY_MICROSECONDS

Specifies the maximum time, in microseconds, that NVIDIA Triton Inference Server delays a queued request so that other requests can join the dynamic batch. For more information, refer to the Triton User Guide.

Default: 100 microseconds

NIM_TRITON_GRPC_PORT

Specifies the gRPC port number for NVIDIA Triton Inference Server.

Default: 8001

NIM_TRITON_LOG_VERBOSE

When set to 1, the container starts NVIDIA Triton Inference Server with verbose logging.

Default: 0

NIM_TRITON_MAX_BATCH_SIZE

Specifies the maximum batch size that the underlying Triton instance can process. The value must be less than or equal to the maximum batch size that was used to compile the engine. By default, the NIM uses the maximum possible batch size for a given model and GPU. To decrease the memory footprint of the server, choose a smaller maximum batch size. If the model uses the tensorrt backend, the value must exactly match a batch size in one of the engine's profiles; only discrete values are supported. For valid values, and their estimated memory footprint, refer to the support matrix.

Default: None

NIM_TRITON_MAX_SEQ_LENGTH

Specifies the maximum sequence length that the Triton server can process. By default, the NIM uses the maximum possible sequence length for a given model and GPU. To decrease the memory footprint of the server, choose a smaller maximum sequence length. Only discrete values are supported. For valid values, and their estimated memory footprint, refer to the support matrix.

Default: None

NIM_TRITON_PERFORMANCE_MODE

Controls the performance mode of the NIM. When set to latency (the default), the NIM is optimized for minimum latency on short-sequence, low-batch-size inference requests. When set to throughput, the NIM is optimized for maximum throughput on long-sequence, high-batch-size inference requests. See the example that follows.

Default: latency
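
For example, the following sketch lowers the memory footprint of the server and optimizes for batch throughput. The batch size and sequence length values are placeholders; valid values for a given model and GPU are listed in the support matrix.

```bash
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TRITON_MAX_BATCH_SIZE=16 \
  -e NIM_TRITON_MAX_SEQ_LENGTH=512 \
  -e NIM_TRITON_PERFORMANCE_MODE=throughput \
  -p 8000:8000 \
  nvcr.io/nim/<org>/<text-embedding-model>:<tag>
```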