Using External Endpoints#
You may want to use an external endpoint instead of deploying the corresponding resources as part of the blueprint. In that case, you must remove the dependency from the helm chart. Follow the steps below to update the helm chart before deployment.
Remote LLM Endpoint#
Here we show an example of using an external Llama endpoint as the LLM.
Untar the VSS package:
tar -xvf nvidia-blueprint-vss-2.2.0.tgz
Remove the nim-llm subchart:
rm -r nvidia-blueprint-vss/charts/nim-llm/
Update the top-level Chart.yaml and remove the dependency on the nim-llm subchart:
vim nvidia-blueprint-vss/Chart.yaml
#- name: nim-llm
#  repository: ""
#  version: 0.0.1
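Optionally, you can sanity-check that the dependency is no longer active. The grep below is just an illustration; it assumes you commented out the entry as shown above, so no uncommented nim-llm dependency remains.

# Should print nothing once the dependency is disabled
grep -n "^ *- name: nim-llm" nvidia-blueprint-vss/Chart.yaml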
Open nvidia-blueprint-vss/values.yaml and update the following. Comment out or remove the “check-llm-up” section.
# - args:
#   - "while ! curl -s -f -o /dev/null http://llm-nim-svc:8000/v1/health/live;\
#     \ do\n echo \"Waiting for LLM...\"\n sleep 2\ndone\n"
#   command:
#   - sh
#   - -c
#   image: curlimages/curl:latest
#   name: check-llm-up
Update summarization.llm.base_url and chat.llm.base_url in the CA-RAG config section as follows.
## ca_rag_config.yaml
summarization:
  llm:
    #base_url: http://llm-nim-svc:8000/v1
    base_url: <new endpoint for llama> #UPDATE
    model: meta/llama-3.1-70b-instruct
chat:
  llm:
    #base_url: http://llm-nim-svc:8000/v1
    base_url: <new endpoint for llama> #UPDATE
    model: meta/llama-3.1-70b-instruct
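Before redeploying, it can help to verify that the new endpoint is reachable and serves the expected model. The commands below are only a sketch: they assume the endpoint is OpenAI-compatible, and the URL is a placeholder that you must replace with your actual <new endpoint for llama> value.

# Placeholder URL; replace with your actual <new endpoint for llama> value
LLM_BASE_URL=http://your-llm-host:8000/v1

# List the models served by the endpoint
curl -s ${LLM_BASE_URL}/models

# Send a minimal chat completion request to confirm the model responds
curl -s ${LLM_BASE_URL}/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama-3.1-70b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'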
Update the base_url of the llama model in the guardrails config section as follows.
## guardrails_config.yaml
models:
- engine: nim
  model: meta/llama-3.1-70b-instruct
  parameters:
    #base_url: http://llm-nim-svc:8000/v1
    base_url: <new endpoint for llama> #UPDATE
Redeploy Helm Chart
Once the edits are done, you can retar the nvidia-blueprint-vss folder:
tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss
Then follow the deployment section in the quickstart guide to launch the modified helm chart.
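Optionally, after the chart is redeployed you can confirm that no local LLM NIM pod is scheduled anymore. This is only a sketch; it assumes the blueprint runs in the current namespace and that the pod names contain the subchart name.

sudo microk8s kubectl get pods | grep -E "nim-llm|llm-nim" || echo "No local LLM NIM pods found"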
Remote NeMo Rerank and Embedding Endpoint#
The steps are similar to those for using a remote LLM endpoint.
Untar the VSS package:
tar -xvf nvidia-blueprint-vss-2.2.0.tgz
Remove the nemo-embedding and/or nemo-rerank subcharts:
rm -r nvidia-blueprint-vss/charts/nemo-embedding/
rm -r nvidia-blueprint-vss/charts/nemo-rerank/
Update the top-level Chart.yaml and remove the dependency on the nemo-embedding and/or nemo-rerank subcharts:
vim nvidia-blueprint-vss/Chart.yaml
#- name: nemo-embedding
#  repository: ""
#  version: 0.0.1
#- name: nemo-rerank
#  repository: ""
#  version: 0.0.1
Open nvidia-blueprint-vss/values.yaml and update the following. Update summarization.embedding.base_url, chat.embedding.base_url, and/or chat.reranker.base_url in the CA-RAG config section as follows.
## ca_rag_config.yaml
summarization:
  embedding:
    #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
    base_url: <new endpoint for NeMo embedding> #UPDATE
chat:
  embedding:
    #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
    base_url: <new endpoint for NeMo embedding> #UPDATE
  reranker:
    #base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
    base_url: <new endpoint for NeMo reranker> #UPDATE
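As with the LLM endpoint, you may want to verify that the remote embedding endpoint responds before redeploying. This is only a sketch: it assumes an OpenAI-style /embeddings route (the input_type field is specific to the NVIDIA retrieval embedding NIMs), and the URL is a placeholder you must replace with your actual endpoint.

# Placeholder URL; replace with your actual <new endpoint for NeMo embedding> value
EMBEDDING_BASE_URL=http://your-embedding-host:8000/v1

# Request an embedding for a short test string
curl -s ${EMBEDDING_BASE_URL}/embeddings \
    -H "Content-Type: application/json" \
    -d '{"model": "nvidia/llama-3.2-nv-embedqa-1b-v2", "input": ["test"], "input_type": "query"}'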
Update the base_url of the embedding model in the guardrails config section as follows.
## guardrails_config.yaml
models:
- engine: nim_patch
  model: nvidia/llama-3.2-nv-embedqa-1b-v2
  parameters:
    #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
    base_url: <new endpoint for NeMo embedding> #UPDATE
Redeploy Helm Chart
Once the edits are done, you can retar the nvidia-blueprint-vss folder:
tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss
Then follow the deployment section in the quickstart guide to launch the modified helm chart.
Using NIMs from build.nvidia.com#
By default, VSS deploys all the dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, you need to generate an NVIDIA Personal API Key using the following steps:
Log in to https://build.nvidia.com/explore/discover.
Navigate to any NIM, for example https://build.nvidia.com/meta/llama3-70b.
Search for Get API Key on the page and click on it.
Click on Generate Key.
Store the generated API Key securely for future use.
Install the NVIDIA Personal API Key as a k8s secret.
sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
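Optionally, you can confirm that the secret was created and that the key is accepted by the hosted endpoint. The curl call is only a sketch and assumes the key is also exported in your shell as NVIDIA_API_KEY.

# Confirm the secret exists in the cluster
sudo microk8s kubectl get secret nvidia-api-key-secret

# Verify the key against the hosted endpoint (assumes NVIDIA_API_KEY is exported in your shell)
curl -s https://integrate.api.nvidia.com/v1/models \
    -H "Authorization: Bearer ${NVIDIA_API_KEY}"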
Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for the NIMs to be used as remote.
Copy the example overrides file from Configuration Options.
Add the NVIDIA_API_KEY to the overrides.yaml file.
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          ...
          - name: NVIDIA_API_KEY
            valueFrom:
              secretKeyRef:
                name: nvidia-api-key-secret
                key: NVIDIA_API_KEY
Follow the steps in Configuration Options to install the blueprint with the overrides.
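Once the blueprint is up, you can optionally check that the NVIDIA_API_KEY environment variable was injected into the rendered deployment. This is only a sketch; it greps the deployed specs rather than assuming a particular deployment name.

# Show the env entry (and surrounding context) in the deployed specs
sudo microk8s kubectl get deployments -o yaml | grep -B 2 -A 5 "NVIDIA_API_KEY"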