Using External Endpoints

You may want to use an external endpoint instead of deploying certain resources as part of the blueprint. In that case, remove the corresponding dependency from the Helm chart. Follow the steps below to update the Helm chart before deployment.

Remote LLM Endpoint

The following example shows how to use an external endpoint, such as a remote Llama endpoint, as the LLM.

Download the script override_remote_endpoints.sh and export the necessary environment variables.
To update the LLM endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, LLM_BASE_URL, and LLM_MODEL.

Run the script:

chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
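Before running the script, it can help to confirm that every variable listed above is actually set. The following is a minimal sketch, not part of the blueprint: `missing_vars` is a hypothetical helper name, and it only checks that each variable is non-empty, not that its value is correct.

```shell
# Hypothetical helper: print the names of any listed variables that are
# unset or empty. The body runs in a subshell so it leaves no variables behind.
missing_vars() (
  missing=""
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
)

# Variables the text lists for the LLM endpoint override.
unset_names="$(missing_vars NGC_API_KEY CHART_NAME HELM_URL LLM_BASE_URL LLM_MODEL)"
if [ -n "$unset_names" ]; then
  echo "Set these before running the script: $unset_names" >&2
else
  echo "All LLM override variables are set."
fi
```

The same helper can be reused with the variable lists given in the later sections.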

Examples:

Using GPT-4o model

This requires setting up an OPENAI_API_KEY, which you can obtain from https://platform.openai.com/api-keys.

export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.1.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_MODEL=gpt-4o
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh

Using DeepSeek R1 model

Configure the LLM NIM to deploy the DeepSeek model. Refer to Configure the NIMs for more details.

export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.1.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://integrate.api.nvidia.com/v1
export LLM_MODEL=deepseek-ai/deepseek-r1
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh

An overrides.yaml file will be generated in the same directory. Follow the steps in Configuration Options to install the blueprint with the overrides.
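A quick way to confirm that the script produced a usable file before moving on to the install steps is sketched below. `overrides_ready` is a hypothetical helper; it only checks that the file exists and is non-empty, and it assumes the script was run from the current directory.

```shell
# Hypothetical helper: succeed only if the overrides file exists and is non-empty.
# $1 is the path to check; it defaults to ./overrides.yaml.
overrides_ready() {
  [ -s "${1:-./overrides.yaml}" ]
}

if overrides_ready; then
  echo "overrides.yaml found; continue with the install steps."
else
  echo "overrides.yaml is missing or empty; re-run the script." >&2
fi
```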

Remote NeMo Rerank and Embedding Endpoint

The steps are similar to those for using a remote LLM endpoint.

Download the script override_remote_endpoints.sh and export the necessary environment variables.
To update the NeMo Rerank endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and RERANKER_URL.
To update the Embedding endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and EMBEDDING_URL.

Run the script:

export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.1.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export EMBEDDING_URL=<url_for_remote_embedding_endpoint>
export RERANKER_URL=<url_for_remote_reranking_endpoint>
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh

An overrides.yaml file will be generated in the same directory. Follow the steps in Configuration Options to install the blueprint with the overrides.
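Because the example above uses placeholders for EMBEDDING_URL and RERANKER_URL, a small sanity check can catch a placeholder that was never replaced. This is a sketch under the assumption that both endpoints are plain http or https URLs; `is_http_url` is a hypothetical helper and does not contact the endpoint.

```shell
# Hypothetical helper: succeed if the argument starts with http:// or https://
# and has at least one character after the scheme.
is_http_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

for var in EMBEDDING_URL RERANKER_URL; do
  # Look up the value of the variable named in $var.
  eval "url=\${$var:-}"
  if is_http_url "$url"; then
    echo "$var looks well-formed: $url"
  else
    echo "$var is unset or is not an http(s) URL" >&2
  fi
done
```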

Using NIMs from build.nvidia.com

By default, VSS deploys all dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, generate an NVIDIA Personal Key using the following steps:

Log in to https://build.nvidia.com/explore/discover.
Navigate to any NIM, for example https://build.nvidia.com/meta/llama3-70b.
Search for Get API Key on the page and click on it.
Click on Generate Key.
Store the generated API Key securely for future use.
Install the NVIDIA Personal API Key as a Kubernetes secret. The blueprint uses this key automatically if the Kubernetes secret nvidia-api-key-secret exists.

sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>

Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the Helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for the embedding and LLM endpoints, and base_url: https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking for the reranker.
Follow the steps in Configuration Options to install the blueprint with the overrides.
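The base URLs above can be expressed as script variables, as sketched below. This assumes that the base_url values map onto the LLM_BASE_URL, EMBEDDING_URL, and RERANKER_URL variables used in the earlier sections; NGC_API_KEY, CHART_NAME, HELM_URL, and LLM_MODEL still need to be exported as shown there.

```shell
# Base URLs quoted in the steps above for NIMs hosted on build.nvidia.com.
# Assumption: they correspond to these override_remote_endpoints.sh variables.
export LLM_BASE_URL="https://integrate.api.nvidia.com/v1"
export EMBEDDING_URL="https://integrate.api.nvidia.com/v1"
export RERANKER_URL="https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"

echo "LLM and embedding endpoint: $LLM_BASE_URL"
echo "Reranker endpoint: $RERANKER_URL"
```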

Using Riva ASR as a remote service

The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice instead of deploying the Riva ASR NIM as part of the VSS blueprint.

Download the script override_remote_endpoints.sh.
Set the following environment variables to override the default values, then run the script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_IS_NIM, and RIVA_ASR_MODEL_NAME.

Example:

export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.1.tgz                      # Chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/  # Helm repository URL
export RIVA_ASR_SERVER_URI="<Riva ASR server URI>"                    # Riva ASR server URI, e.g. "10.10.10.10"
export RIVA_ASR_GRPC_PORT="<Riva ASR gRPC port>"                      # Riva ASR gRPC port, e.g. 50051
export RIVA_ASR_SERVER_USE_SSL="<true/false>"                         # Whether Riva ASR should use SSL
export RIVA_ASR_SERVER_IS_NIM="<true/false>"                          # Whether the Riva ASR server is a NIM
export RIVA_ASR_MODEL_NAME="<Riva ASR model name>"                    # Riva ASR model name
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh

This should generate an overrides.yaml file in the same directory.

Follow the steps in Configuration Options to install the blueprint with the overrides.
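Since several of the Riva ASR variables take constrained values, a short check before running the script can catch typos. The sketch below assumes the port must be an integer from 1 to 65535 and that the two flags must be the lowercase strings true or false; `is_valid_port` and `is_bool` are hypothetical helpers, and what the script itself accepts may differ.

```shell
# Hypothetical helper: succeed if the argument is an integer in 1..65535.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Hypothetical helper: succeed if the argument is exactly "true" or "false".
is_bool() {
  [ "$1" = "true" ] || [ "$1" = "false" ]
}

if is_valid_port "${RIVA_ASR_GRPC_PORT:-}" \
   && is_bool "${RIVA_ASR_SERVER_USE_SSL:-}" \
   && is_bool "${RIVA_ASR_SERVER_IS_NIM:-}"; then
  echo "Riva ASR settings look well-formed."
else
  echo "Check RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, and RIVA_ASR_SERVER_IS_NIM." >&2
fi
```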

Using Riva ASR NIM from build.nvidia.com

The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice from build.nvidia.com instead of deploying the Riva ASR NIM as part of the VSS blueprint. To do so, generate an API key and update the Helm chart as follows:

Get the NVIDIA Personal Key, assign it to NVIDIA_API_KEY, and create the Kubernetes secret as shown in Using NIMs from build.nvidia.com.
Get the Function ID for the Riva ASR NIM from the Riva ASR NIM API page, for example https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr/api.
Download the script override_remote_endpoints.sh.

Set the following environment variables to override the default values, then run the script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, and RIVA_ASR_SERVER_FUNC_ID.

Example:

export NVIDIA_API_KEY=<your_nvidia_personal_key>
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY="$NVIDIA_API_KEY"
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.1.tgz                      # Chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/  # Helm repository URL
export RIVA_ASR_SERVER_URI="grpc.nvcf.nvidia.com"                     # Riva ASR server URI
export RIVA_ASR_GRPC_PORT="443"                                       # Riva ASR gRPC port
export RIVA_ASR_SERVER_USE_SSL="true"                                 # Whether Riva ASR should use SSL
export RIVA_ASR_SERVER_FUNC_ID="<Function ID from the Riva ASR NIM API page>"  # e.g. e6fa172c-79bf-4b9c-bb37-14fe17b4226c
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh

This should generate an overrides.yaml file in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
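The example Function ID above has the shape of a UUID (8-4-4-4-12 hexadecimal digits). Assuming that all Function IDs follow this shape, which the text does not state, a copy-and-paste error can be caught with a pattern check. `looks_like_uuid` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the argument has the 8-4-4-4-12 hex layout of a UUID.
# Assumption: Function IDs on build.nvidia.com follow this layout.
looks_like_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

if looks_like_uuid "${RIVA_ASR_SERVER_FUNC_ID:-}"; then
  echo "RIVA_ASR_SERVER_FUNC_ID has the expected shape."
else
  echo "RIVA_ASR_SERVER_FUNC_ID is unset or does not look like a UUID." >&2
fi
```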