Managing NeMo Guardrails#

About NeMo Guardrails#

The NVIDIA NeMo Guardrails microservice enables you to add programmable guardrails to LLM endpoints. NeMo Guardrails sits between your application code and the LLM, giving you a way to adjust user prompts before they are sent to the LLM and to adjust LLM responses before they are returned to users.

When you deploy a NeMo Guardrails microservice, the NIM Operator creates a Deployment and Service endpoint for NeMo Guardrails. Guardrails configurations are stored in a PVC directory and mounted into the Guardrails Deployment.

Read the NeMo Guardrails documentation for details on using guardrails.

Prerequisites#

  • All the common NeMo microservice prerequisites.

  • A NIM endpoint where your models are hosted. NIM endpoints must support the OpenAI spec and can be deployed as:

    • A NIM Cache and NIM Service locally on your cluster.

    • A NIM Proxy. Refer to the NeMo microservices documentation for details on deploying a NIM Proxy. Note that NIM Operator does not support NIM Proxy with multiple NIM endpoints.

    • A hosted model from an LLM provider, for example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key and secret to access the model, as described in the following Kubernetes prerequisites section.

  • Optional: OpenTelemetry Collector installed on your cluster for observability and distributed tracing. Read the OpenTelemetry documentation for details on installing OpenTelemetry Collector with Helm. Also refer to Configure Observability with OpenTelemetry for details on using OpenTelemetry with this microservice.

    Note

    You can use the NeMo Dependencies Ansible Playbook to deploy the OpenTelemetry Collector for NeMo Guardrails.
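
    If you prefer to install the collector manually with Helm, a minimal sketch follows. The release name guardrail-otel, the nemo namespace, and the values shown are assumptions for illustration; adjust them for your cluster and refer to the OpenTelemetry Helm chart documentation for the full set of options.

    $ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
    $ helm repo update
    # Assumed release name and namespace; mode and image.repository are required by recent chart versions.
    $ helm install guardrail-otel open-telemetry/opentelemetry-collector \
        --namespace nemo \
        --set mode=deployment \
        --set image.repository=otel/opentelemetry-collector-k8s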

Storage

  • A PostgreSQL database installed. This database is used as the persistent data store for the NeMo Guardrails configurations.
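
    One way to satisfy this prerequisite is the Bitnami PostgreSQL Helm chart. The release name, user name, and database name below are assumptions for illustration; use values that match your environment and record the password in the secret described in the next section.

    # Assumed release name, user, and database; substitute your own values.
    $ helm install guardrail-pg oci://registry-1.docker.io/bitnamicharts/postgresql \
        --namespace nemo \
        --set auth.username=guardrail \
        --set auth.password=<guardrailpass> \
        --set auth.database=guardrail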

Kubernetes

  • Create a database user secret by creating a file such as guardrail-pg-secret.yaml, with contents like the following example:

    apiVersion: v1
    stringData:
      password: <guardrailpass>
    kind: Secret
    metadata:
      name: guardrail-pg-existing-secret
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f guardrail-pg-secret.yaml
    
  • If you plan to use a hosted NIM, create a secret containing your API Key for https://build.nvidia.com or OpenAI.

    Create a secret file like the following nemo-guardrail-secret.yaml example:

    ---
    apiVersion: v1
    stringData:
      NIM_ENDPOINT_API_KEY: <API-key>
    kind: Secret
    metadata:
      name: <nim-api-key>
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f nemo-guardrail-secret.yaml
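
    Alternatively, you can create the same secret imperatively without writing a manifest file:

    $ kubectl create secret generic <nim-api-key> \
        --namespace nemo \
        --from-literal=NIM_ENDPOINT_API_KEY=<API-key>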
    

Deploy NeMo Guardrails#

Update the <inputs> in the following sample manifests with values for your cluster configuration.

  1. Create a file, such as nemo-guardrail.yaml, with contents similar to the following example. If you have a NIM endpoint you would like to use, update spec.nimEndpoint.baseURL with your NIM Service URL and port (an example for a locally deployed NIM Service is sketched after this procedure):

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoGuardrail
    metadata:
      name: nemoguardrails-sample
      namespace: nemo
    spec:
      configStore:
        pvc:
          name: "pvc-guardrail-config"
          create: true
          storageClass: "<storage-class>"
          volumeAccessMode: ReadWriteMany
          size: "1Gi"
      nimEndpoint:
        baseURL: "<https://integrate.api.nvidia.com/v1>"
        #Required if you are using a hosted NIM endpoint. Create a secret with your API key.
        apiKeySecret: "<nim-api-key>"
      expose:
        service:
          type: ClusterIP
          port: 8000
      image:
        repository: nvcr.io/nvidia/nemo-microservices/guardrails
        tag: "25.10"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      metrics:
        serviceMonitor: {}
      replicas: 1
      resources:
        limits:
          cpu: "1"
          ephemeral-storage: 10Gi
      # # Optional: OpenTelemetry tracing configuration
      # otel:  
      #   enabled: true
      #   exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      #   exporterConfig:
      #     tracesExporter: otlp
      #     metricsExporter: otlp
      #     logsExporter: otlp
      #   logLevel: INFO
      #   excludedUrls:
      #     - health
      # # Required environment variables for OTEL
      # env:
      #   - name: OTEL_EXPORTER_OTLP_PROTOCOL
      #     value: grpc
      #   - name: OTEL_EXPORTER_OTLP_INSECURE
      #     value: "true"
    
  2. Apply the manifest:

    $ kubectl apply -n nemo -f nemo-guardrail.yaml
    
  3. Optional: View information about the NeMo Guardrails services:

    $ kubectl describe nemoguardrails.apps.nvidia.com -n nemo
    

    Partial Output

    ...
    Conditions:
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:               Deployment is ready
     Reason:                Ready
     Status:                True
     Type:                  Ready
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:
     Reason:                Ready
     Status:                False
     Type:                  Failed
    State:                  Ready
    

You now have a NeMo Guardrails microservice deployed to your cluster.
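
The sample above points at the NVIDIA-hosted endpoint. If your model is served by a NIM Service running in your cluster, set spec.nimEndpoint.baseURL to that Service instead and omit apiKeySecret. A minimal sketch, assuming a hypothetical NIM Service named meta-llama3-8b-instruct in the nemo namespace that listens on port 8000:

    spec:
      nimEndpoint:
        # Assumed in-cluster NIM Service; no API key secret is needed for a local endpoint.
        baseURL: "http://meta-llama3-8b-instruct.nemo:8000"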

This sample NemoGuardrail resource deploys the microservice with an empty configuration store. Before running guardrails against a model, you must create a guardrail configuration. Refer to the configuration guide in the NeMo Guardrails documentation to learn how to create a configuration and update the configuration store.
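
As an illustration only, the request below sketches creating a simple configuration through the microservice API from a pod inside the cluster. The endpoint path matches the listing endpoint used in the verification steps below, but the configuration name, namespace, and body fields are assumptions; follow the NeMo Guardrails documentation for the authoritative schema.

    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs" \
        -H "Content-Type: application/json" \
        -d '{
          "name": "demo-self-check",
          "namespace": "default",
          "description": "Example configuration with a self-check input rail",
          "data": {
            "models": [],
            "rails": {
              "input": {
                "flows": ["self check input"]
              }
            }
          }
        }'

A working configuration typically also defines the prompts that rails such as self check input rely on; see the NeMo Guardrails configuration guide for complete examples.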

Also refer to Configure Observability with OpenTelemetry for more details on NeMo Guardrails observability.

Verify NeMo Guardrails#

After NeMo Guardrails is deployed on your cluster, use the following steps to verify that the service is up and running.

  1. Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    After the pod starts, you are connected to the ash shell in the pod.

  2. Connect to the NeMo Guardrails service.

    $ curl -X GET "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
    

    Example Output

    {"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}~ 
    
  3. Press Ctrl+D to exit and delete the pod.
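
As an alternative to running a pod in the cluster, you can port-forward the Guardrails Service to your workstation and query it locally. This assumes the Service created for the sample resource is named nemoguardrails-sample, as in the request above.

    $ kubectl port-forward -n nemo service/nemoguardrails-sample 8000:8000

    # In a second terminal:
    $ curl -X GET "http://localhost:8000/v1/guardrail/configs"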

Configure Observability with OpenTelemetry#

NeMo Guardrails supports using OpenTelemetry for observability. Refer to the NeMo Guardrails Observability documentation for more details.

To enable event tracing:

  1. Deploy the OpenTelemetry Collector to your cluster. Refer to the Prerequisites section for more details.

  2. Enable OpenTelemetry in your NeMo Guardrail deployment.

    # Optional: OpenTelemetry tracing configuration
    otel:  
      enabled: true
      exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      exporterConfig:
        tracesExporter: otlp
        metricsExporter: otlp
        logsExporter: otlp
      logLevel: INFO
      excludedUrls:
        - health
    # Required environment variables for OTEL
    env:
      - name: OTEL_EXPORTER_OTLP_PROTOCOL
        value: grpc
      - name: OTEL_EXPORTER_OTLP_INSECURE
        value: "true"
    
  3. Enable tracing in your guardrail configuration.

    "tracing": {
        "enabled": "True",
        "adapters": [
            {
                "name": "OpenTelemetry"
            }
        ]
    }
    
  4. Optional: Verify tracing using the example in the NeMo Guardrails documentation on verifying the tracing integration.
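
For a quick smoke test, you can send a guarded chat completion through the microservice and then look for the resulting spans in your OpenTelemetry backend. The endpoint path, guardrails parameter, model name, and config ID below are assumptions for illustration; check the NeMo Guardrails API reference for the exact request format.

    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/chat/completions" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "meta/llama-3.1-8b-instruct",
          "messages": [{"role": "user", "content": "Hello, are guardrails active?"}],
          "guardrails": {"config_id": "demo-self-check"},
          "max_tokens": 128
        }'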

Configuration Reference#

The following reference describes the commonly modified fields for the NeMo Guardrails custom resource. The default value for each field is listed after its description.

spec.annotations
    Specifies user-supplied annotations to add to the pod.
    Default: None

spec.configStore.configMap
    Specifies a ConfigMap as the NeMo Guardrails configuration location. Before deploying the NeMo Guardrails service, create a ConfigMap with your guardrail configurations and pass its name in spec.configStore.configMap.name. The name field is required when you use spec.configStore.configMap.
    Default: false

spec.configStore.pvc.create
    When set to true, the Operator creates the PVC for you. If you delete a NemoGuardrail resource and this field was set to true, the Operator also deletes the PVC.
    Refer to the NeMo Guardrails Configuration Store documentation for more details. If you deploy the microservice with an empty configuration store, you must add a valid configuration before you start running guardrails.
    Default: false

spec.configStore.pvc.name
    Specifies the name for the PVC.
    Default: None

spec.configStore.pvc.size
    Specifies the size, in Gi, for the PVC to create. This field is required if you specify create: true.
    Default: None

spec.configStore.pvc.storageClass
    Specifies the StorageClass for the PVC to create. Leave this field empty if create is set to false and you already created the PVC.
    Default: None

spec.configStore.pvc.subPath
    Specifies a subpath to create on the PVC for storing the guardrail configurations.
    Default: guardrails-config-store

spec.configStore.pvc.volumeAccessMode
    Specifies the access mode for the PVC to create.
    Default: None

spec.env
    Specifies additional environment variables to add to the container. When using OpenTelemetry, you must include OTEL_EXPORTER_OTLP_PROTOCOL set to grpc and OTEL_EXPORTER_OTLP_INSECURE set to "true".
    Default: None

spec.expose.ingress.enabled
    When set to true, the Operator creates a Kubernetes Ingress resource for NeMo Guardrails. Specify the ingress specification in the spec.expose.ingress.spec field.
    If you have an ingress controller, values like the following sample configure an ingress for the v1/chat/completions endpoint:

    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        host: demo.nvidia.example.com
        paths:
          - path: /v1/chat/completions
            pathType: Prefix

    Default: false

spec.expose.service.port
    Specifies the network port number for the NeMo Guardrails microservice.
    Default: 8000

spec.expose.service.type
    Specifies the Kubernetes service type to create for the NeMo Guardrails microservice.
    Default: ClusterIP

spec.groupID
    Specifies the group for the pods. This value is used to set the security context of the pod in the runAsGroup and fsGroup fields.
    Default: 2000

spec.image (required)
    Specifies the repository, tag, pull policy, and pull secrets for the container image.
    Default: None

spec.labels
    Specifies user-supplied labels to add to the pod.
    Default: None

spec.metrics.enabled
    When set to true, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the spec.metrics.serviceMonitor field. Refer to the Observability page for more details.
    Default: false

spec.nimEndpoint.apiKeyKey
    Specifies the key in the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com.
    Default: NIM_ENDPOINT_API_KEY

spec.nimEndpoint.apiKeySecret
    Specifies the name of the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. This field is required when the base URL is a hosted endpoint. Generate your API key from the Settings > API Keys page on https://build.nvidia.com.
    Default: None

spec.nimEndpoint.baseURL (required)
    Specifies the base URL of the service where your NIM is hosted. This field is required if you include spec.nimEndpoint. NIM endpoints must support the OpenAI spec and can be deployed:

      • locally on your cluster, as a NIM Cache and NIM Service,
      • as a NIM Proxy, or
      • as a hosted model from an LLM provider, for example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key and secret to access the model, as described in the Kubernetes prerequisites section.

    The default base URL for NVIDIA-hosted models is https://integrate.api.nvidia.com/v1. You can view a list of available models at https://integrate.api.nvidia.com/v1/models. The base URL for a locally hosted NIM is typically http://<NIM_SERVICE>.<NAMESPACE>:<PORT> or your configured endpoint URL. When using a hosted model, you must set spec.nimEndpoint.apiKeySecret.
    Default: None

spec.otel.disableLogging
    When set to true, Python logging auto-instrumentation is disabled.
    Default: None

spec.otel.enabled
    When set to true, the OpenTelemetry collector and tracing are enabled.
    Default: None

spec.otel.excludedUrls
    Specifies URLs to exclude from tracing.
    Default: health

spec.otel.exporterConfig.logsExporter
    Specifies the logs exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterConfig.metricsExporter
    Specifies the metrics exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterConfig.tracesExporter
    Specifies the traces exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterOtlpEndpoint
    Specifies the OpenTelemetry Protocol (OTLP) endpoint.
    Default: None

spec.otel.logLevel
    Specifies the log level for OpenTelemetry. Values include INFO and DEBUG.
    Default: INFO

spec.replicas
    Specifies the number of replicas to run on the cluster.
    Default: None

spec.resources.requests
    Specifies the memory and CPU requests.
    Default: None

spec.resources.limits
    Specifies the memory and CPU limits.
    Default: None

spec.tolerations
    Specifies the tolerations for the pods.
    Default: None

spec.userID
    Specifies the user ID for the pod. This value is used to set the security context of the pod in the runAsUser field.
    Default: 1000

Next Steps#