Using External Endpoints#

You may want to use an external endpoint instead of deploying the corresponding resource as part of the blueprint. In that case, the dependency must be removed from the helm chart. Follow the steps below to update the helm chart before deployment.

Remote LLM Endpoint#

This section shows how to use an external endpoint, such as a remote Llama endpoint, as the LLM.

  1. Download the script: override_remote_endpoints.sh and export the necessary environment variables.

  2. To update the LLM endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, LLM_BASE_URL, and LLM_MODEL.

  3. Run the script

    chmod +x ./override_remote_endpoints.sh
    ./override_remote_endpoints.sh
    
    • Examples:

      • Using GPT-4o model

        This requires setting up the OPENAI_API_KEY as shown in Configuring for GPT-4o.

        export NGC_API_KEY=<your_ngc_api_key>
        export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
        export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
        export LLM_BASE_URL=https://api.openai.com/v1
        export LLM_MODEL=gpt-4o
        chmod +x ./override_remote_endpoints.sh
        ./override_remote_endpoints.sh
        
      • Using DeepSeek R1 model

        Configure the LLM NIM to deploy the DeepSeek model. Refer to Configure the NIMs for more details.

        export NGC_API_KEY=<your_ngc_api_key>
        export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
        export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
        export LLM_BASE_URL=https://integrate.api.nvidia.com/v1
        export LLM_MODEL=deepseek-ai/deepseek-r1
        chmod +x ./override_remote_endpoints.sh
        ./override_remote_endpoints.sh
        

An overrides.yaml file will be generated in the same directory. Follow the steps in Configuration Options to install the blueprint with the overrides.
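
Because the script reads its inputs from the environment, it can help to confirm that every variable is set before running it. The following is a minimal sketch, not part of override_remote_endpoints.sh: the helper name missing_vars is hypothetical, and the variable list is the one given in the LLM endpoint steps above.

```shell
# Hypothetical helper (an assumption, not part of override_remote_endpoints.sh):
# prints the names of listed variables that are unset or empty and
# returns non-zero if at least one is missing.
missing_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        # Only pass trusted, literal variable names to this function.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing variables:$missing"
        return 1
    fi
    return 0
}

# Variable list taken from the LLM endpoint steps above.
if missing_vars NGC_API_KEY CHART_NAME HELM_URL LLM_BASE_URL LLM_MODEL; then
    echo "All LLM endpoint variables are set; ready to run ./override_remote_endpoints.sh"
fi
```

The same helper can be called with a different list of names for the other endpoint types described on this page.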

Remote NeMo Rerank and Embedding Endpoint#

The steps are similar to those for using a remote LLM endpoint.

  1. Download the script: override_remote_endpoints.sh and export the necessary environment variables.

  2. To update the NeMo Rerank endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and RERANKER_URL.

  3. To update the Embedding endpoint, export the variables NGC_API_KEY, CHART_NAME, HELM_URL, and EMBEDDING_URL.

  4. Run the script

    export NGC_API_KEY=<your_ngc_api_key>
    export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz
    export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/
    export EMBEDDING_URL=<url_for_remote_embedding_endpoint>
    export RERANKER_URL=<url_for_remote_reranking_endpoint>
    chmod +x ./override_remote_endpoints.sh
    ./override_remote_endpoints.sh
    

An overrides.yaml file will be generated in the same directory. Follow the steps in Configuration Options to install the blueprint with the overrides.
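
A mistyped endpoint URL is easy to miss until deployment, so a quick format check before running the script may be useful. This is a sketch under one stated assumption: that the remote rerank and embedding endpoints are reached over http or https. The helper name is_http_url is hypothetical and is not provided by the script.

```shell
# Hypothetical helper: succeeds only if the argument starts with
# http:// or https:// and has at least one character after the scheme.
is_http_url() {
    case "$1" in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check whichever of the two endpoint variables are set.
for name in EMBEDDING_URL RERANKER_URL; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set; skipping"
    elif is_http_url "$value"; then
        echo "$name looks like an http(s) URL"
    else
        echo "$name does not start with http:// or https://: $value"
    fi
done
```

This only checks the shape of the value; it does not confirm that the endpoint is reachable or serves the expected model.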

Using NIMs from build.nvidia.com#

By default, VSS deploys all the dependent NIMs as part of the blueprint. If you want to use NIMs from build.nvidia.com instead, generate an NVIDIA Personal Key using the following steps:

  1. Log in to https://build.nvidia.com/explore/discover.

  2. Navigate to any NIM, for example https://build.nvidia.com/meta/llama3-70b.

  3. Search for Get API Key on the page and click on it.

    [Figure: Get NVIDIA API Key]

  4. Click on Generate Key.

    [Figure: Generate API Key]

  5. Store the generated API Key securely for future use.

  6. Install the NVIDIA Personal API Key as a Kubernetes secret.

    sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>

  7. Follow the steps in Remote LLM Endpoint and/or Remote NeMo Rerank and Embedding Endpoint to update the helm chart. Use base_url: https://integrate.api.nvidia.com/v1 for the embedding and LLM endpoints, and base_url: https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking for the reranker.

  8. Copy the example overrides file from Configuration Options.

  9. Add the NVIDIA_API_KEY to the overrides.yaml file.

    vss:
     applicationSpecs:
       vss-deployment:
         containers:
           vss:
             env:
             ...
             - name: NVIDIA_API_KEY
               valueFrom:
                 secretKeyRef:
                   name: nvidia-api-key-secret
                   key: NVIDIA_API_KEY
    
  10. Follow the steps in Configuration Options to install the blueprint with the overrides.
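
Before installing, it may be worth confirming that the edited overrides.yaml actually references the secret created earlier. The sketch below is a plain text search, not a YAML parser, so it only shows that the two expected lines are present somewhere in the file. The helper name references_api_key_secret is hypothetical; the secret name and key are the ones used in the steps above.

```shell
# Hypothetical helper: succeeds if the given file exists and contains
# both the secret name and the secret key used in the steps above.
references_api_key_secret() {
    file="$1"
    [ -f "$file" ] || return 1
    grep -q 'name: nvidia-api-key-secret' "$file" &&
        grep -q 'key: NVIDIA_API_KEY' "$file"
}

if references_api_key_secret overrides.yaml; then
    echo "overrides.yaml references nvidia-api-key-secret"
else
    echo "overrides.yaml is missing or does not reference nvidia-api-key-secret"
fi
```

A passing check does not guarantee the entry sits under the correct container's env list, so the file should still be reviewed by eye.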

Using Riva ASR as a remote service#

The audio transcription feature in VSS can be enabled using a remote Riva ASR microservice, instead of deploying the Riva ASR NIM as part of the VSS blueprint.

  1. Download the script: override_remote_endpoints.sh.

  2. Set the following environment variables to override the default values, then run the script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, RIVA_ASR_SERVER_IS_NIM, and RIVA_ASR_MODEL_NAME.

  • Example:

    export NGC_API_KEY=<your_ngc_api_key>
    export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz                      # Specify the chart name
    export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/  # Specify the helm repository URL
    export RIVA_ASR_SERVER_URI=<Riva ASR server URI>                      # Specify the Riva ASR server URI, e.g. "10.10.10.10"
    export RIVA_ASR_GRPC_PORT=<Riva ASR gRPC port>                        # Specify the Riva ASR gRPC port, e.g. 50051
    export RIVA_ASR_SERVER_USE_SSL=<true/false>                           # Specify if Riva ASR should use SSL
    export RIVA_ASR_SERVER_IS_NIM=<true/false>                            # Specify if the Riva ASR server is a NIM
    export RIVA_ASR_MODEL_NAME=<"Riva ASR model name">                    # Specify the Riva ASR model name
    chmod +x ./override_remote_endpoints.sh
    ./override_remote_endpoints.sh
    
  3. This should generate an overrides.yaml file in the same directory.

  4. Follow the steps in Configuration Options to install the blueprint with the overrides.
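
The Riva ASR settings include a port and two boolean flags, which are easy to mistype. The following sketch checks their format before the script is run. It assumes the flags are expected as the lowercase strings true or false, as the placeholders in the example suggest; the helper names is_valid_port and is_bool are hypothetical and not part of the script.

```shell
# Hypothetical helper: succeeds if the argument is a decimal number
# between 1 and 65535 (the valid TCP port range).
is_valid_port() {
    case "$1" in
        # Reject empty values, non-digits, and anything longer than 5 digits.
        ''|*[!0-9]*|??????*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Hypothetical helper: succeeds only for the strings "true" and "false".
is_bool() {
    [ "$1" = "true" ] || [ "$1" = "false" ]
}

port="${RIVA_ASR_GRPC_PORT:-}"
use_ssl="${RIVA_ASR_SERVER_USE_SSL:-}"
is_nim="${RIVA_ASR_SERVER_IS_NIM:-}"

is_valid_port "$port" || echo "RIVA_ASR_GRPC_PORT is not a valid port: '$port'"
is_bool "$use_ssl" || echo "RIVA_ASR_SERVER_USE_SSL must be true or false: '$use_ssl'"
is_bool "$is_nim" || echo "RIVA_ASR_SERVER_IS_NIM must be true or false: '$is_nim'"
```

Nothing is printed when all three values pass, so the check can be placed directly before the call to ./override_remote_endpoints.sh.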

Using Riva ASR NIM from build.nvidia.com#

The audio transcription feature in VSS can be enabled using the remote Riva ASR microservice from build.nvidia.com, instead of deploying the Riva ASR NIM as part of the VSS blueprint. To do so, generate an API key and update the helm chart as follows:

  1. Get the NVIDIA Personal Key as shown in Using NIMs from build.nvidia.com, assign it to NVIDIA_API_KEY, and create the Kubernetes secret.

  2. Get the Function ID for the Riva ASR NIM from the Riva ASR NIM API page. For example, for https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr/api:

    [Figure: Parakeet Function ID]

  3. Download the script: override_remote_endpoints.sh.

  4. Set the following environment variables to override the default values, then run the script: NGC_API_KEY, CHART_NAME, HELM_URL, RIVA_ASR_SERVER_URI, RIVA_ASR_GRPC_PORT, RIVA_ASR_SERVER_USE_SSL, and RIVA_ASR_SERVER_FUNC_ID.

    • Example:

      export NVIDIA_API_KEY=<your_nvidia_personal_key>
      sudo microk8s kubectl create secret generic nvidia-api-key-secret --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY
      export NGC_API_KEY=<your_ngc_api_key>
      export CHART_NAME=nvidia-blueprint-vss-2.3.0.tgz             #Specify the chart name
      export HELM_URL=https://helm.ngc.nvidia.com/nvidia/blueprint/charts/  #Specify the helm repository URL
      export RIVA_ASR_SERVER_URI="grpc.nvcf.nvidia.com"    #Specify the Riva ASR server URL
      export RIVA_ASR_GRPC_PORT="443"     #Specify the Riva ASR gRPC port
      export RIVA_ASR_SERVER_USE_SSL="true"       #Specify if Riva ASR should use SSL
      export RIVA_ASR_SERVER_FUNC_ID=<Function ID from the Riva ASR NIM API page>  # e.g. e6fa172c-79bf-4b9c-bb37-14fe17b4226c
      chmod +x ./override_remote_endpoints.sh
      ./override_remote_endpoints.sh
      
  5. This should generate an overrides.yaml file in the same directory.

  6. Follow the steps in Configuration Options to install the blueprint with the overrides.
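
The example Function ID above has the shape of a UUID, so a copy-and-paste error can be caught with a simple pattern check. This is a sketch that assumes every Function ID follows that 8-4-4-4-12 hexadecimal layout, which the source shows only by example. The helper name is_uuid is hypothetical.

```shell
# Hypothetical helper: succeeds if the argument has the
# 8-4-4-4-12 hexadecimal layout of a UUID.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

func_id="${RIVA_ASR_SERVER_FUNC_ID:-}"
if is_uuid "$func_id"; then
    echo "RIVA_ASR_SERVER_FUNC_ID has the expected format"
else
    echo "RIVA_ASR_SERVER_FUNC_ID does not look like a Function ID: '$func_id'"
fi
```

The check validates only the format; whether the ID corresponds to an available Riva ASR function is determined by build.nvidia.com.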