Using External Endpoints#

You may want to use an external endpoint and therefore not deploy a specific resource as part of the blueprint. In that case, you must remove the corresponding dependency from the Helm chart. Follow the steps below to update the Helm chart before deployment.

Remote LLM Endpoint#

Here we show an example of using an external Llama endpoint as the LLM.

  1. Untar the VSS package: tar -xvf nvidia-blueprint-vss-2.2.0.tgz

  2. Remove the nim-llm subchart: rm -r nvidia-blueprint-vss/charts/nim-llm/

  3. Update the top-level Chart.yaml and remove the dependency on the nim-llm subchart: vim nvidia-blueprint-vss/Chart.yaml

    #- name: nim-llm
    #  repository: ""
    #  version: 0.0.1
    
  4. Open nvidia-blueprint-vss/values.yaml and update the following.

    • Comment out or remove the “check-llm-up” section.

      #  - args:
      #    - "while ! curl -s -f -o /dev/null http://llm-nim-svc:8000/v1/health/live;\
      #      \ do\n  echo \"Waiting for LLM...\"\n  sleep 2\ndone\n"
      #    command:
      #    - sh
      #    - -c
      #    image: curlimages/curl:latest
      #    name: check-llm-up
      
    • Update summarization.llm.base_url and chat.llm.base_url in the CA-RAG config section as follows (a sketch for sanity-checking the new endpoint follows this section).

      ## ca_rag_config.yaml
      
      summarization:
         llm:
            #base_url: http://llm-nim-svc:8000/v1
            base_url: <new endpoint for llama>  #UPDATE
            model: meta/llama-3.1-70b-instruct
      
      chat:
         llm:
            #base_url: http://llm-nim-svc:8000/v1
            base_url: <new endpoint for llama>  #UPDATE
            model: meta/llama-3.1-70b-instruct
      
    • Update the base_url of the llama model in the guardrails config section as follows.

      ## guardrails_config.yaml
      
      models:
      - engine: nim
        model: meta/llama-3.1-70b-instruct
        parameters:
          #base_url: http://llm-nim-svc:8000/v1
          base_url: <new endpoint for llama> #UPDATE
      
  5. Redeploy the Helm chart.

Once the edits are done, you can retar the nvidia-blueprint-vss folder.

tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss

Then follow the deployment section in the quickstart guide to launch the modified helm chart.
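Before launching, it can help to sanity-check that the external endpoint is reachable and serves the expected model. The sketch below is illustrative: the URL is a placeholder for your actual Llama endpoint, and it assumes the endpoint exposes the usual OpenAI-compatible API that VSS expects.

# Placeholder URL -- replace with your actual Llama endpoint.
LLM_BASE_URL="http://your-llama-host:8000/v1"

# The served model must match the model: entries in values.yaml.
curl -s "${LLM_BASE_URL}/models"

# Minimal chat completion to confirm the endpoint answers requests.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-70b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'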

Remote NeMo Rerank and Embedding Endpoint#

The steps are similar to using a remote LLM endpoint.

  1. Untar the VSS package: tar -xvf nvidia-blueprint-vss-2.2.0.tgz

  2. Remove the nemo-embedding and/or nemo-rerank subcharts:

rm -r nvidia-blueprint-vss/charts/nemo-embedding/

rm -r nvidia-blueprint-vss/charts/nemo-rerank/

  3. Update the top-level Chart.yaml and remove the dependency on the nemo-embedding and/or nemo-rerank subcharts: vim nvidia-blueprint-vss/Chart.yaml

    #- name: nemo-embedding
    #  repository: ""
    #  version: 0.0.1
    #- name: nemo-rerank
    #  repository: ""
    #  version: 0.0.1
    
  4. Open nvidia-blueprint-vss/values.yaml and update the following.

    • Update summarization.embedding.base_url, chat.embedding.base_url, and/or chat.reranker.base_url in the CA-RAG config section as follows (a sketch for sanity-checking the new endpoints follows this section).

      ## ca_rag_config.yaml
      
      summarization:
         embedding:
            #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
            base_url: <new endpoint for NeMo embedding>  #UPDATE
      
      chat:
         embedding:
            #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
            base_url: <new endpoint for NeMo embedding>  #UPDATE
         reranker:
            #base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
            base_url: <new endpoint for NeMo reranker>  #UPDATE
      
    • Update the base_url of the embedding model in the guardrails config section as follows.

      ## guardrails_config.yaml
      
      models:
      - engine: nim_patch
        model: nvidia/llama-3.2-nv-embedqa-1b-v2
        parameters:
          #base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
          base_url: <new endpoint for NeMo embedding> #UPDATE
      
  5. Redeploy the Helm chart.

Once the edits are done, you can retar the nvidia-blueprint-vss folder.

tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss

Then follow the deployment section in the quickstart guide to launch the modified helm chart.
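As with the LLM, you can sanity-check the remote embedding and reranking endpoints before deploying. This sketch uses placeholder URLs and assumes the services follow the NeMo Retriever NIM request shapes (an OpenAI-style /embeddings route and the NIM /ranking route); the reranker model name shown is an assumption, so substitute whatever model your endpoint actually serves.

# Placeholder URLs -- replace with your actual NeMo service endpoints.
EMBED_BASE_URL="http://your-embedding-host:8000/v1"
RERANK_BASE_URL="http://your-rerank-host:8000/v1"

# Embedding check (input_type is required by asymmetric NeMo Retriever models).
curl -s "${EMBED_BASE_URL}/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.2-nv-embedqa-1b-v2", "input": ["hello"], "input_type": "query"}'

# Reranking check: scores each passage against the query.
curl -s "${RERANK_BASE_URL}/ranking" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.2-nv-rerankqa-1b-v2", "query": {"text": "forklift safety"}, "passages": [{"text": "A forklift moves through the warehouse."}]}'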

Using NIMs from build.nvidia.com#

By default, VSS deploys all the dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, you need to generate an NVIDIA Personal Key using the following steps:

  1. Log in to https://build.nvidia.com/explore/discover.

  2. Navigate to any NIM, e.g., https://build.nvidia.com/meta/llama3-70b.

  3. Search for Get API Key on the page and click on it.

  4. Click on Generate Key.

  5. Store the generated API Key securely for future use.

  6. Install the NVIDIA Personal API Key as a k8s secret.

sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
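Optionally, confirm that the secret exists before proceeding (the name matches the one created above):

sudo microk8s kubectl describe secret nvidia-api-key-secret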

  7. Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for any NIM that is to be used remotely.

  8. Copy the example overrides file from Configuration Options.

  9. Add the NVIDIA_API_KEY to the overrides.yaml file.

    vss:
      applicationSpecs:
        vss-deployment:
          containers:
            vss:
              env:
              ...
              - name: NVIDIA_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: nvidia-api-key-secret
                    key: NVIDIA_API_KEY
    
  10. Follow the steps in Configuration Options to install the blueprint with the overrides.
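
Before installing, you can optionally verify the key against the hosted endpoint. This assumes the standard OpenAI-compatible route on integrate.api.nvidia.com; replace <YOUR_NVIDIA_API_KEY> with the key generated above.

curl -s https://integrate.api.nvidia.com/v1/models \
  -H "Authorization: Bearer <YOUR_NVIDIA_API_KEY>"

A successful response lists the hosted models; check that the models configured above appear in it.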