Managing NeMo Guardrails#

About NeMo Guardrails#

The NVIDIA NeMo Guardrails microservice adds programmable guardrails to LLM endpoints. NeMo Guardrails sits between your application code and the LLM, giving you a way to check and adjust user prompts before they are sent to the LLM and to screen LLM responses before they are returned to users.

When you deploy a NeMo Guardrails microservice, the NIM Operator creates a Deployment and Service endpoint for the NeMo Guardrails microservice. Guardrails configurations are stored in a PVC directory and mounted into the Guardrails Deployment.

Read the NeMo Guardrails documentation for details on using guardrails.

Prerequisites#

  • All the common NeMo microservice prerequisites.

  • A NIM endpoint where your models are hosted. NIM endpoints must support the OpenAI spec and can be deployed as any of the following:

    • A NIM Cache and NIM Service locally on your cluster.

    • A NIM Proxy. Refer to the NeMo microservices documentation for details on deploying a NIM Proxy. Note that the NIM Operator does not support a NIM Proxy with multiple NIMs.

    • A hosted model from an LLM provider. For example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key and secret to access the model, as described in the Kubernetes prerequisites below.

Kubernetes

  • A persistent volume provisioner that uses network storage (such as NFS, S3, or vSAN) to hold NeMo Guardrails configuration files.

    You can create a PVC and specify the name in the configuration file when you create the NeMo Guardrails resource, or you can request that the Operator creates a PVC. A minimal manifest for a pre-created PVC is sketched below.
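
    If you create the PVC yourself, a minimal manifest might look like the following sketch. The storage class name nfs-client is an assumption; substitute a network-storage class available in your cluster:

    # Hypothetical pre-created PVC for the guardrails configuration store.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-guardrail-config
      namespace: nemo
    spec:
      accessModes:
        - ReadWriteMany              # shared access requires network storage
      storageClassName: nfs-client   # assumption: replace with your storage class
      resources:
        requests:
          storage: 1Gi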

  • If you plan to use a hosted NIM, create a secret containing your API key for https://build.nvidia.com or OpenAI.

    Create a secret file like the following nemo-guardrail-secret.yaml example:

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: <nim-api-key>
      namespace: nemo
    type: Opaque
    stringData:
      NIM_ENDPOINT_API_KEY: <API-key>
    

    Apply the secret file.

    $ kubectl apply -n nemo -f nemo-guardrail-secret.yaml
    

Deploy NeMo Guardrails#

Update the <inputs> in the following sample manifests with values for your cluster configuration.

  1. Create a file, such as nemo-guardrail.yaml, with contents similar to the following example. If you have a NIM endpoint you’d like to use, update spec.nimEndpoint.baseURL with your NIM Service URL and port; an in-cluster example is sketched after this procedure:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoGuardrail
    metadata:
      name: nemoguardrails-sample
      namespace: nemo
    spec:
      configStore:
        pvc:
          name: "pvc-guardrail-config"
          create: true
          storageClass: "<storage-class>"
          volumeAccessMode: ReadWriteMany
          size: "1Gi"
      nimEndpoint:
        baseURL: "<https://integrate.api.nvidia.com/v1>"
        # Required if you are using a hosted NIM endpoint. Create a secret with your API key.
        apiKeySecret: "<nim-api-key>"
      expose:
        service:
          type: ClusterIP
          port: 8000
      image:
        repository: nvcr.io/nvidia/nemo-microservices/guardrails
        tag: "25.04"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      metrics:
        serviceMonitor: {}
      replicas: 1
      resources:
        limits:
          cpu: "1"
          ephemeral-storage: 10Gi
    
  2. Apply the manifest:

    $ kubectl apply -n nemo -f nemo-guardrail.yaml
    
  3. Optional: View information about the NeMo Guardrails services:

    $ kubectl describe nemoguardrails.apps.nvidia.com -n nemo
    

    Partial Output

    ...
    Conditions:
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:               Deployment is ready
     Reason:                Ready
     Status:                True
     Type:                  Ready
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:
     Reason:                Ready
     Status:                False
     Type:                  Failed
    State:                  Ready
    

You now have a NeMo Guardrails microservice deployed to your cluster.
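
If your NIM is hosted in your cluster as a NIM Cache and NIM Service, the nimEndpoint section from step 1 might instead look like the following sketch. The service name, namespace, and port are hypothetical placeholders for your own NIM Service, and no API key secret is needed for a local endpoint:

    # Sketch: pointing NeMo Guardrails at an in-cluster NIM Service.
    # The service name, namespace, and port are hypothetical; use the
    # values from your own NIM Service.
    nimEndpoint:
      baseURL: "http://meta-llama3-8b-instruct.nemo:8000"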

This sample NemoGuardrail resource deploys the microservice with an empty configuration store. Before using NeMo Guardrails with a model, populate the configuration store as described in the NeMo Guardrails documentation. From there you can also learn how to create a configuration and update the configuration store.
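
As a sketch of that workflow, one way to load a configuration is through the microservice's configs API. The following assumes that the /v1/guardrail/configs path used for verification below also accepts POST requests; the config name and data payload are illustrative, so check the NeMo Guardrails API reference for the exact request schema:

    # Hypothetical request that stores a guardrails configuration named
    # demo-config; the "data" payload is a minimal illustrative sketch.
    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs" \
        -H "Content-Type: application/json" \
        -d '{"name": "demo-config", "namespace": "default", "data": {"models": [], "rails": {}}}'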

Verify NeMo Guardrails#

Once you have NeMo Guardrails deployed on your cluster, use the steps below to verify that the service is up and running.

  1. Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    After the pod starts, you are connected to the ash shell in the pod.

  2. Connect to the NeMo Guardrails service:

    $ curl -X GET "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
    

    Example Output

    {"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}
    
  3. Press Ctrl+D to exit and delete the pod.
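
Once a valid configuration exists in the configuration store, you can exercise the guarded chat endpoint with a similar in-cluster curl command. This sketch assumes the microservice exposes a v1/guardrail/chat/completions route; the configuration name, model name, and guardrails request field are illustrative assumptions to verify against the NeMo Guardrails API reference:

    # Hypothetical chat request routed through the demo-config guardrails.
    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/chat/completions" \
        -H "Content-Type: application/json" \
        -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello!"}], "guardrails": {"config_id": "demo-config"}}'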

Configuration Reference#

The following table shows information about the commonly modified fields for the NeMo Guardrails custom resource.

| Field | Description | Default Value |
| --- | --- | --- |
| spec.annotations | Specifies user-supplied annotations to add to the pod. | None |
| spec.configStore.configMap | Specifies the NeMo Guardrails configuration location as a ConfigMap. Before deploying the NeMo Guardrails service, create a ConfigMap with your guardrails configurations, then pass its name in spec.configStore.configMap.name. The name field is required when using spec.configStore.configMap. A sample ConfigMap is sketched after this table. | None |
| spec.configStore.pvc.create | When set to true, the Operator creates the PVC for you. If you delete a NemoGuardrail resource and this field was set to true, the Operator also deletes the PVC. Refer to the NeMo Guardrails Configuration Store documentation for more details. If you deploy the microservice with an empty configuration store, you must upload a valid configuration before you start running guardrails. | false |
| spec.configStore.pvc.name | Specifies the name for the PVC. | None |
| spec.configStore.pvc.size | Specifies the size, in Gi, for the PVC to create. This field is required if you specify create: true. | None |
| spec.configStore.pvc.storageClass | Specifies the StorageClass for the PVC to create. Leave empty if create is set to false and you already created the PVC. | None |
| spec.configStore.pvc.subPath | Specifies a subpath on the PVC where the guardrails configurations are stored. | guardrails-config-store |
| spec.configStore.pvc.volumeAccessMode | Specifies the access mode for the PVC to create. | None |
| spec.expose.ingress.enabled | When set to true, the Operator creates a Kubernetes Ingress resource for NeMo Guardrails. Specify the ingress specification in the spec.expose.ingress.spec field. A sample ingress configuration is shown after this table. | false |
| spec.expose.service.port | Specifies the network port number for the NeMo Guardrails microservice. | 8000 |
| spec.expose.service.type | Specifies the Kubernetes service type to create for the NeMo Guardrails microservice. | ClusterIP |
| spec.groupID | Specifies the group for the pods. This value is used to set the runAsGroup and fsGroup fields in the pod security context. | 2000 |
| spec.image (required) | Specifies the repository, tag, pull policy, and pull secrets for the container image. | None |
| spec.labels | Specifies user-supplied labels to add to the pod. | None |
| spec.metrics.enabled | When set to true, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the spec.metrics.serviceMonitor field. Refer to the Observability page for more details. | false |
| spec.nimEndpoint.apiKeyKey | Specifies the key in the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. | NIM_ENDPOINT_API_KEY |
| spec.nimEndpoint.apiKeySecret | Specifies the name of the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. Required when the base URL is for a hosted model or a NIM Proxy. Generate your API key from the Settings > API Keys page on https://build.nvidia.com. | None |
| spec.nimEndpoint.baseURL (required) | Specifies the base URL of the service where your NIM is hosted. Required if you include spec.nimEndpoint. The endpoint must support the OpenAI spec and can be a NIM Cache and NIM Service on your cluster (for example, http://<NIM_SERVICE>.<NAMESPACE>:<PORT>), a NIM Proxy, or a hosted model from an LLM provider. The default base URL for NVIDIA-hosted models is https://integrate.api.nvidia.com/v1; view the available models at https://integrate.api.nvidia.com/v1/models. When using a hosted model, you must also set spec.nimEndpoint.apiKeySecret. | None |
| spec.replicas | Specifies the number of replicas to run on the cluster. | None |
| spec.resources.requests | Specifies the memory and CPU requests. | None |
| spec.resources.limits | Specifies the memory and CPU limits. | None |
| spec.tolerations | Specifies the tolerations for the pods. | None |
| spec.userID | Specifies the user ID for the pod. This value is used to set the runAsUser field in the pod security context. | 1000 |

If you have an ingress controller, values like the following sample, set in spec.expose.ingress, configure an ingress for the v1/chat/completions endpoint:

    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        host: demo.nvidia.example.com
        paths:
          - path: /v1/chat/completions
            pathType: Prefix
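
The following is a minimal sketch of the ConfigMap approach for spec.configStore.configMap. The ConfigMap name and the config.yml contents are illustrative assumptions; the supported configuration keys are defined in the NeMo Guardrails documentation:

    # Hypothetical ConfigMap holding a guardrails configuration.
    # The config.yml body is an illustrative sketch only.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: demo-guardrails-config
      namespace: nemo
    data:
      config.yml: |
        models:
          - type: main
            engine: nim
            model: meta/llama-3.1-8b-instruct

You would then reference it from the NemoGuardrail resource in place of the PVC-backed store:

    # Sketch: point the config store at the ConfigMap instead of a PVC.
    configStore:
      configMap:
        name: demo-guardrails-config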

Next Steps#