Using External Endpoints#
If you use an external endpoint, you do not need to deploy the corresponding resources and must remove that dependency from the Helm Chart. Follow the steps below to update the Helm Chart before deployment.
Remote LLM Endpoint#
This example shows how to use an external Llama endpoint as the LLM.
Download the script: override_remote_endpoints.sh and export the necessary environment variables.
To update the LLM endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, LLM_BASE_URL, and LLM_MODEL.
Run the script:
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
Examples:
Using GPT-4o model
This requires setting up an OPENAI_API_KEY from https://platform.openai.com/api-keys.
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.4.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_MODEL=gpt-4o
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
Using DeepSeek R1 model
Configure the LLM NIM to deploy the DeepSeek model. Refer to Configure the NIMs for more details.
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.4.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://integrate.api.nvidia.com/v1
export LLM_MODEL=deepseek-ai/deepseek-r1
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
An overrides.yaml file will be generated in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
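It is easy to forget one of the exported variables before running the script. The following is a minimal pre-flight sketch, not part of override_remote_endpoints.sh: the helper name check_required_vars is hypothetical, and the variable list is taken from the steps above.

```shell
# Hypothetical helper (not part of override_remote_endpoints.sh):
# returns 0 if every named variable is set and non-empty, 1 otherwise.
check_required_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Variables the LLM override step expects, per the list above.
if check_required_vars NGC_API_KEY CHART_NAME HELM_URL LLM_BASE_URL LLM_MODEL; then
    echo "All LLM override variables are set."
else
    echo "Export the missing variables before running the script." >&2
fi
```

The same helper could be reused for the other sections by passing their variable names instead.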
Remote NeMo Rerank and Embedding Endpoint#
The steps are similar to using a remote LLM endpoint.
Download the script: override_remote_endpoints.sh and export the necessary environment variables.
To update the NeMo Rerank endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and RERANKER_URL.
To update the Embedding endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and EMBEDDING_URL.
Run the script:
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.4.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export EMBEDDING_URL=<url_for_remote_embedding_endpoint>
export RERANKER_URL=<url_for_remote_reranking_endpoint>
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
An overrides.yaml file will be generated in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
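Before installing with the overrides, it may be worth confirming that the generated file reflects the endpoints you exported. The sketch below is one way to do that. It assumes the script writes the endpoint URLs into overrides.yaml verbatim, which this page does not confirm, and the helper name overrides_contains is hypothetical.

```shell
# Hypothetical helper: succeeds only if the file exists, is non-empty,
# and contains the given string literally.
# Assumption: the script writes endpoint URLs into overrides.yaml verbatim.
overrides_contains() {
    file=$1
    expected=$2
    [ -s "$file" ] && grep -qF -- "$expected" "$file"
}

# Check whichever endpoints were exported; skip the ones left unset.
for var in EMBEDDING_URL RERANKER_URL; do
    eval "url=\${$var:-}"
    [ -n "$url" ] || continue
    if overrides_contains overrides.yaml "$url"; then
        echo "$var found in overrides.yaml"
    else
        echo "$var not found in overrides.yaml" >&2
    fi
done
```

If a URL is reported as not found, inspect overrides.yaml by hand before concluding that the override failed, since the assumption about the file contents may not hold.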
Using NIMs from build.nvidia.com#
By default, VSS deploys all the dependent NIMs as part of the blueprint. If
you want to use NIMs from build.nvidia.com instead, generate an
NVIDIA Personal Key using the following steps:
Log in to https://build.nvidia.com/explore/discover.
Navigate to any NIM, for example https://build.nvidia.com/meta/llama3-70b.
Search for Get API Key on the page and click on it.
Click on Generate Key.
Store the generated API Key securely for future use.
Install the NVIDIA Personal API Key as a Kubernetes secret. The blueprint uses this key automatically if the Kubernetes secret nvidia-api-key-secret is created.
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the Helm Chart. Use base_url: https://integrate.api.nvidia.com/v1 for the embedding and LLM endpoints, and base_url: https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking for the reranker.
Follow the steps in Configuration Options to install the blueprint with the overrides.
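To keep the two build.nvidia.com base URLs from being mixed up, they can be collected in one place. The sketch below uses a hypothetical helper, nvidia_base_url, and assumes that the base_url values quoted above are what LLM_BASE_URL, EMBEDDING_URL, and RERANKER_URL should be set to. That mapping is an assumption, not something this page states.

```shell
# Hypothetical helper: print the build.nvidia.com base URL for a service.
# The URLs are the ones quoted in the step above.
nvidia_base_url() {
    case "$1" in
        llm|embedding)
            echo "https://integrate.api.nvidia.com/v1" ;;
        reranker)
            echo "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking" ;;
        *)
            echo "Unknown service: $1" >&2
            return 1 ;;
    esac
}

# Assumption: these variables accept the base_url values directly.
LLM_BASE_URL=$(nvidia_base_url llm)
EMBEDDING_URL=$(nvidia_base_url embedding)
RERANKER_URL=$(nvidia_base_url reranker)
export LLM_BASE_URL EMBEDDING_URL RERANKER_URL
```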
Using Riva ASR as a Remote Service#
The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice, instead of deploying the Riva ASR NIM as part of the VSS blueprint.
Download the script: override_remote_endpoints.sh.
Set the following environment variables to override the default values, then run the script:
NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_IS_NIM, RIVA_ASR_MODEL_NAME
Example:
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.4.0.tgz # Specify the chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/ # Specify the Helm repository URL
export RIVA_ASR_SERVER_URI=<Riva ASR server URI> # Specify the Riva ASR server URI, for example: "10.10.10.10"
export RIVA_ASR_GRPC_PORT=<Riva ASR gRPC port> # Specify the Riva ASR gRPC port, for example: 50051
export RIVA_ASR_SERVER_USE_SSL=<true/false> # Specify if Riva ASR should use SSL
export RIVA_ASR_SERVER_IS_NIM=<true/false> # Specify if the Riva ASR server is a NIM
export RIVA_ASR_MODEL_NAME=<"Riva ASR model name"> # Specify the Riva ASR model name
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
This should generate an overrides.yaml file in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
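The Riva ASR settings above include a port and two true/false flags, which are easy to mistype. The following sketch shows one way to sanity-check them before running the script. The helper validate_riva_settings is hypothetical, and the script itself may accept other spellings that this check rejects.

```shell
# Hypothetical helper: sanity-check Riva ASR settings before running the script.
# Arguments: gRPC port, use-SSL flag, is-NIM flag.
validate_riva_settings() {
    port=$1
    use_ssl=$2
    is_nim=$3

    # The port must be a whole number in the valid TCP range.
    case "$port" in
        ''|*[!0-9]*) echo "Port must be numeric: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "Port out of range: $port" >&2
        return 1
    fi

    # Both flags are expected to be the literal strings true or false.
    for flag in "$use_ssl" "$is_nim"; do
        case "$flag" in
            true|false) ;;
            *) echo "Expected true or false, got: $flag" >&2; return 1 ;;
        esac
    done
}

# Example values: port 50051 from the example above, SSL off, server is a NIM.
validate_riva_settings 50051 false true && echo "Riva ASR settings look valid."
```

In practice the arguments would be "$RIVA_ASR_GRPC_PORT", "$RIVA_ASR_SERVER_USE_SSL", and "$RIVA_ASR_SERVER_IS_NIM" after they have been exported.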
Using Riva ASR NIM from build.nvidia.com#
The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice from build.nvidia.com,
instead of deploying the Riva ASR NIM as part of the VSS blueprint.
Generate an API key and update the Helm Chart as follows:
Get the NVIDIA Personal Key as shown in Using NIMs from build.nvidia.com, assign it to NVIDIA_API_KEY, and create the Kubernetes secret.
Get the Function ID for the Riva ASR NIM from the Riva ASR NIM API page, for example https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr/api.
Download the script: override_remote_endpoints.sh.
Set the following environment variables to override the default values, then run the script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_FUNC_ID.
Example:
export NVIDIA_API_KEY=<your_nvidia_personal_key>
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.4.0.tgz # Specify the chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/ # Specify the Helm repository URL
export RIVA_ASR_SERVER_URI="grpc.nvcf.nvidia.com" # Specify the Riva ASR server URI
export RIVA_ASR_GRPC_PORT="443" # Specify the Riva ASR gRPC port
export RIVA_ASR_SERVER_USE_SSL="true" # Specify if Riva ASR should use SSL
export RIVA_ASR_SERVER_FUNC_ID=<Function ID from the Riva ASR NIM API page> # For example: e6fa172c-79bf-4b9c-bb37-14fe17b4226c
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
This should generate an overrides.yaml file in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
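A mistyped Function ID is hard to spot by eye. The example ID shown earlier has the shape of a UUID (8-4-4-4-12 hexadecimal digits), so a quick format check is possible. The sketch below assumes every Function ID follows that shape, which is inferred from the single example and not confirmed here, and the helper name is_function_id is hypothetical.

```shell
# Hypothetical helper: check that a Function ID has the UUID shape
# (8-4-4-4-12 hexadecimal digits), as in the example above.
# Assumption: all Function IDs follow this format.
is_function_id() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

if is_function_id "e6fa172c-79bf-4b9c-bb37-14fe17b4226c"; then
    echo "Function ID format looks valid."
fi
```

This only checks the format. It does not confirm that the ID refers to an existing Riva ASR function.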