Using External Endpoints#

You may want to use an external endpoint and therefore not deploy a specific resource as part of the blueprint. In that case, you must remove the corresponding dependency from the Helm chart. Follow the steps below to update the Helm chart before deployment.

Remote LLM Endpoint#

Here we show an example of using an external Llama endpoint as the LLM.

  1. Untar the VSS package: tar -xvf nvidia-blueprint-vss-2.2.0.tgz

  2. Remove the nim-llm subchart: rm -r nvidia-blueprint-vss/charts/nim-llm/

  3. Update the top-level Chart.yaml and remove the dependency on the nim-llm subchart: vim nvidia-blueprint-vss/Chart.yaml

    #- name: nim-llm
    #  repository: ""
    #  version: 0.0.1
    
  4. Open nvidia-blueprint-vss/values.yaml and update the following.

    • Comment out or remove the “check-llm-up” section.

      #  - args:
      #    - "while ! curl -s -f -o /dev/null http://llm-nim-svc:8000/v1/health/live;\
      #      \ do\n  echo \"Waiting for LLM...\"\n  sleep 2\ndone\n"
      #    command:
      #    - sh
      #    - -c
      #    image: curlimages/curl:latest
      #    name: check-llm-up
      
    • Update summarization.llm.base_url and chat.llm.base_url in the CA-RAG config section as follows (a sketch for sanity-checking the new endpoint follows this section).

      ## ca_rag_config.yaml
      
      summarization:
         llm:
            #base_url: http://llm-nim-svc:8000/v1
            base_url: <new endpoint for llama>  #UPDATE
            model: meta/llama-3.1-70b-instruct
      
      chat:
         llm:
            #base_url: http://llm-nim-svc:8000/v1
            base_url: <new endpoint for llama>  #UPDATE
            model: meta/llama-3.1-70b-instruct
      
    • Update the base_url of the llama model in the guardrails config section as follows.

      ## guardrails_config.yaml
      
      models:
      - engine: nim
        model: meta/llama-3.1-70b-instruct
        parameters:
          #base_url: http://llm-nim-svc:8000/v1
          base_url: <new endpoint for llama> #UPDATE
      
  5. Redeploy the Helm chart.

Once the edits are done, you can retar the nvidia-blueprint-vss folder.

tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss

Then follow the deployment section in the quickstart guide to launch the modified helm chart.
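Before launching, it can help to sanity-check that the external endpoint is reachable and serves the expected model. The sketch below is illustrative: the URL is a placeholder for your actual Llama endpoint, and it assumes the endpoint exposes the usual OpenAI-compatible API that VSS expects.

# Placeholder URL -- replace with your actual Llama endpoint.
LLM_BASE_URL="http://your-llama-host:8000/v1"

# The served model must match the model: entries in values.yaml.
curl -s "${LLM_BASE_URL}/models"

# Minimal chat completion to confirm the endpoint answers requests.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-70b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'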

Remote NeMo Rerank and Embedding Endpoint#

The steps are similar to using a remote LLM endpoint.

  1. Untar the VSS package: tar -xvf nvidia-blueprint-vss-2.2.0.tgz

  2. Remove the nemo-embedding and/or nemo-rerank subcharts:

rm -r nvidia-blueprint-vss/charts/nemo-embedding/

rm -r nvidia-blueprint-vss/charts/nemo-rerank/

  3. Update the top-level Chart.yaml and remove the dependency on the nemo-embedding and/or nemo-rerank subcharts: vim nvidia-blueprint-vss/Chart.yaml

    #- name: nemo-embedding
    #  repository: ""
    #  version: 0.0.1
    #- name: nemo-rerank
    #  repository: ""
    #  version: 0.0.1
    
  4. Open nvidia-blueprint-vss/values.yaml and update the following.

    • Update summarization.embedding.base_url, chat.embedding.base_url, and/or chat.reranker.base_url in the CA-RAG config section as follows (a sketch for sanity-checking the new endpoints follows this section).

      ## ca_rag_config.yaml
      
      summarization:
         embedding:
            #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
            base_url: <new endpoint for NeMo embedding>  #UPDATE
      
      chat:
         embedding:
            #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
            base_url: <new endpoint for NeMo embedding>  #UPDATE
         reranker:
            #base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
            base_url: <new endpoint for NeMo reranker>  #UPDATE
      
    • Update the base_url of the embedding model in the guardrails config section as follows.

      ## guardrails_config.yaml
      
      models:
      - engine: nim_patch
        model: nvidia/llama-3.2-nv-embedqa-1b-v2
        parameters:
          #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
          base_url: <new endpoint for NeMo embedding> #UPDATE
      
  5. Redeploy the Helm chart.

Once the edits are done, you can retar the nvidia-blueprint-vss folder.

tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss

Then follow the deployment section in the quickstart guide to launch the modified helm chart.
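As with the LLM, you can sanity-check the remote embedding and reranking endpoints before deploying. This sketch uses placeholder URLs and assumes the services follow the NeMo Retriever NIM request shapes (an OpenAI-style /embeddings route and the NIM /ranking route); the reranker model name shown is an assumption, so substitute whatever model your endpoint actually serves.

# Placeholder URLs -- replace with your actual NeMo service endpoints.
EMBED_BASE_URL="http://your-embedding-host:8000/v1"
RERANK_BASE_URL="http://your-rerank-host:8000/v1"

# Embedding check (input_type is required by asymmetric NeMo Retriever models).
curl -s "${EMBED_BASE_URL}/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.2-nv-embedqa-1b-v2", "input": ["hello"], "input_type": "query"}'

# Reranking check: scores each passage against the query.
curl -s "${RERANK_BASE_URL}/ranking" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.2-nv-rerankqa-1b-v2", "query": {"text": "forklift safety"}, "passages": [{"text": "A forklift moves through the warehouse."}]}'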

Using NIMs from build.nvidia.com#

By default, VSS deploys all the dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, you need to generate an NVIDIA Personal Key using the following steps:

  1. Log in to https://build.nvidia.com/explore/discover.

  2. Navigate to any NIM, e.g., https://build.nvidia.com/meta/llama3-70b.

  3. Search for Get API Key on the page and click on it.

  4. Click on Generate Key.

  5. Store the generated API Key securely for future use.

  6. Install the NVIDIA Personal API Key as a k8s secret.

sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
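Optionally, confirm that the secret exists before proceeding (the name matches the one created above):

sudo microk8s kubectl describe secret nvidia-api-key-secret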

  7. Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for any NIM that is to be used remotely.

  8. Copy the example overrides file from Configuration Options.

  9. Add the NVIDIA_API_KEY to the overrides.yaml file.

    vss:
      applicationSpecs:
        vss-deployment:
          containers:
            vss:
              env:
              ...
              - name: NVIDIA_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: nvidia-api-key-secret
                    key: NVIDIA_API_KEY
    
  10. Follow the steps in Configuration Options to install the blueprint with the overrides.
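
Before installing, you can optionally verify the key against the hosted endpoint. This assumes the standard OpenAI-compatible route on integrate.api.nvidia.com; replace <YOUR_NVIDIA_API_KEY> with the key generated above.

curl -s https://integrate.api.nvidia.com/v1/models \
  -H "Authorization: Bearer <YOUR_NVIDIA_API_KEY>"

A successful response lists the hosted models; check that the models configured above appear in it.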