Using External Endpoints#
You may want to use an external endpoint instead of deploying a specific resource as part of the blueprint. In that case, remove the corresponding dependency from the Helm chart. Follow the steps below to update the Helm chart before deployment.
Remote LLM Endpoint#
The following example shows how to use an external Llama endpoint as the LLM.
Download the script: override_remote_endpoints.sh and export the necessary environment variables.
For updating the LLM endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, LLM_BASE_URL, and LLM_MODEL.
Run the script:
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
Examples:
Using GPT-4o model
This requires setting up the OPENAI_API_KEY as shown in Configuring for GPT-4o.
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_MODEL=gpt-4o
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
Using DeepSeek R1 model
Configure the LLM NIM to deploy the DeepSeek model. Refer to Configure the NIMs for more details.
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export LLM_BASE_URL=https://integrate.api.nvidia.com/v1
export LLM_MODEL=deepseek-ai/deepseek-r1
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
An overrides.yaml file will be generated in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
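Before running the script, it can help to confirm that every variable listed above is actually exported. The following is a minimal sketch of such a pre-flight check. The `check_required_vars` helper is hypothetical and is not part of override_remote_endpoints.sh; it only inspects the current environment.

```shell
# Hypothetical helper (not part of override_remote_endpoints.sh):
# report every named variable that is unset or empty in the environment.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child script inherits.
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage for the remote LLM endpoint (left commented out on purpose):
# check_required_vars NGC_API_KEY CHART_NAME HELM_URL LLM_BASE_URL LLM_MODEL \
#   && ./override_remote_endpoints.sh
```

The same helper can be reused with a different variable list for the other endpoints described on this page.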
Remote NeMo Rerank and Embedding Endpoint#
The steps are similar to using a remote LLM endpoint.
Download the script: override_remote_endpoints.sh and export the necessary environment variables.
To update the NeMo Rerank endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and RERANKER_URL.
To update the Embedding endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and EMBEDDING_URL.
Run the script
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
export EMBEDDING_URL=<url_for_remote_embedding_endpoint>
export RERANKER_URL=<url_for_remote_reranking_endpoint>
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
An overrides.yaml file will be generated in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
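Since the install step depends on the generated overrides.yaml, a quick check that the file exists and is not empty can catch a failed script run early. This is a sketch only: `verify_overrides` is a hypothetical helper, and it assumes the file is written to the current directory unless a path is passed.

```shell
# Hypothetical helper: confirm the script produced a non-empty overrides file.
# Takes an optional path; defaults to overrides.yaml in the current directory.
verify_overrides() {
  file="${1:-overrides.yaml}"
  if [ ! -s "$file" ]; then
    echo "Not found or empty: $file" >&2
    return 1
  fi
  echo "Found $file"
}

# Example usage after running the script (left commented out on purpose):
# verify_overrides ./overrides.yaml
```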
Using NIMs from build.nvidia.com#
By default, VSS deploys all the dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, generate an NVIDIA Personal Key using the following steps:
Log in to https://build.nvidia.com/explore/discover.
Navigate to any NIM e.g. https://build.nvidia.com/meta/llama3-70b.
Search for Get API Key on the page and click on it.
Click on Generate Key.
Store the generated API Key securely for future use.
Install the NVIDIA Personal API Key as a k8s secret.
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for embedding and llm, and base_url: https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking for reranker.
Copy the example overrides file from Configuration Options.
Add the NVIDIA_API_KEY to the overrides.yaml file.
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          ...
          - name: NVIDIA_API_KEY
            valueFrom:
              secretKeyRef:
                name: nvidia-api-key-secret
                key: NVIDIA_API_KEY
Follow the steps in Configuration Options to install the blueprint with the overrides.
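As a sketch of how the base_url values in the steps above could be passed to override_remote_endpoints.sh, the exports below reuse the variable names from the Remote LLM Endpoint and Remote NeMo Rerank and Embedding Endpoint sections. Mapping each base_url onto these variables is an assumption; LLM_MODEL and the remaining variables still need to be set as described in those sections.

```shell
# Assumption: the base_url values for build.nvidia.com map directly onto the
# variables read by override_remote_endpoints.sh in the earlier sections.
export LLM_BASE_URL="https://integrate.api.nvidia.com/v1"
export EMBEDDING_URL="https://integrate.api.nvidia.com/v1"
export RERANKER_URL="https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
```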
Using Riva ASR as a remote service#
The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice instead of deploying the Riva ASR NIM as part of the VSS blueprint.
Download the script: override_remote_endpoints.sh.
Set the following env variables to override the default values and run the above script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_IS_NIM, RIVA_ASR_MODEL_NAME
Example:
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz # Specify the chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/ # Specify the helm repository URL
export RIVA_ASR_SERVER_URI=<Riva ASR server URI> # Specify the Riva ASR server URI, e.g. "10.10.10.10"
export RIVA_ASR_GRPC_PORT=<Riva ASR gRPC port> # Specify the Riva ASR gRPC port, e.g. 50051
export RIVA_ASR_SERVER_USE_SSL=<true/false> # Specify if Riva ASR should use SSL
export RIVA_ASR_SERVER_IS_NIM=<true/false> # Specify if Riva ASR server is NIM
export RIVA_ASR_MODEL_NAME=<"Riva ASR model name"> # Specify the Riva ASR model name
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
This should generate an overrides.yaml file in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
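Because the Riva ASR settings mix a host, a port, and boolean flags, a small sanity check can catch leftover placeholders before the script runs. The sketch below is illustrative only: `validate_riva_settings` is a hypothetical helper, and it assumes the flags are written as lowercase true or false, as the example placeholders suggest.

```shell
# Hypothetical pre-flight check for the Riva ASR variables.
# Assumes the boolean flags are spelled exactly "true" or "false".
validate_riva_settings() {
  status=0
  if [ -z "${RIVA_ASR_SERVER_URI:-}" ]; then
    echo "RIVA_ASR_SERVER_URI must be set" >&2
    status=1
  fi
  # The port must be non-empty and contain digits only.
  case "${RIVA_ASR_GRPC_PORT:-}" in
    ''|*[!0-9]*) echo "RIVA_ASR_GRPC_PORT must be a number" >&2; status=1 ;;
  esac
  for name in RIVA_ASR_SERVER_USE_SSL RIVA_ASR_SERVER_IS_NIM; do
    case "$(printenv "$name")" in
      true|false) ;;
      *) echo "$name must be true or false" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Example usage (left commented out on purpose):
# validate_riva_settings && ./override_remote_endpoints.sh
```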
Using Riva ASR NIM from build.nvidia.com#
The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice from build.nvidia.com instead of deploying the Riva ASR NIM as part of the VSS blueprint.
To do so, generate an API key and update the helm chart as follows:
Get the NVIDIA Personal Key, assign it to NVIDIA_API_KEY, and create the Kubernetes secret as shown in Using NIMs from build.nvidia.com.
Get the Function ID for the Riva ASR NIM from the Riva ASR NIM API page, for example https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr/api.
Download the script: override_remote_endpoints.sh.
Set the following env variables to override the default values and run the above script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_FUNC_ID
Example:
export NVIDIA_API_KEY=<your_nvidia_personal_key>
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY
export NGC_API_KEY=<your_ngc_api_key>
export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz # Specify the chart name
export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/ # Specify the helm repository URL
export RIVA_ASR_SERVER_URI="grpc.nvcf.nvidia.com" # Specify the Riva ASR server URL
export RIVA_ASR_GRPC_PORT="443" # Specify the Riva ASR gRPC port
export RIVA_ASR_SERVER_USE_SSL="true" # Specify if Riva ASR should use SSL
export RIVA_ASR_SERVER_FUNC_ID=<Function ID from the Riva ASR NIM API page> # Eg: e6fa172c-79bf-4b9c-bb37-14fe17b4226c
chmod +x ./override_remote_endpoints.sh
./override_remote_endpoints.sh
This should generate an overrides.yaml file in the same directory.
Follow the steps in Configuration Options to install the blueprint with the overrides.
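The example Function ID above has the shape of a UUID. Assuming that holds for Function IDs in general (this page does not state it), a simple pattern check can detect a placeholder that was never replaced. `is_uuid` is a hypothetical helper shown as a sketch.

```shell
# Hypothetical helper: succeed only if the argument looks like a UUID
# (8-4-4-4-12 hexadecimal digits), as in the example Function ID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Example usage (left commented out on purpose):
# is_uuid "$RIVA_ASR_SERVER_FUNC_ID" || echo "RIVA_ASR_SERVER_FUNC_ID does not look like a Function ID" >&2
```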