# Managing NeMo Guardrails
## About NeMo Guardrails
The NVIDIA NeMo Guardrails microservice adds programmable guardrails to LLM endpoints. NeMo Guardrails sits between your application code and the LLM, giving you a way to adjust user prompts before they are sent to the LLM and to adjust LLM responses before they are returned to users.
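For example, after the microservice is deployed, an application sends its OpenAI-style chat requests to the Guardrails endpoint rather than to the LLM directly. The following sketch assumes a guardrail configuration named `demo-config` already exists in the configuration store, reuses the in-cluster service address from the verification steps later on this page, and uses a placeholder model name; verify the exact request schema against the NeMo Guardrails API reference for your version:

```console
$ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "<model-name>",
      "messages": [{"role": "user", "content": "Hello!"}],
      "guardrails": {"config_id": "demo-config"}
    }'
```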
When you deploy a NeMo Guardrails microservice, the NIM Operator creates a Deployment and Service endpoint for NeMo Guardrails. Guardrails configurations are stored in a PVC directory and mounted into the Guardrails Deployment.
Read the NeMo Guardrails documentation for details on using guardrails.
## Prerequisites
All the common NeMo microservice prerequisites.
A NIM endpoint where your models are hosted. NIM endpoints must support the OpenAI spec and can be deployed as:
A NIM Cache and NIM Service locally on your cluster.
A NIM Proxy. Refer to the NeMo microservices documentation for details on deploying a NIM Proxy. Note that NIM Operator does not support NIM Proxy with multiple NIM endpoints.
A hosted model from an LLM provider. For example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key, stored in a secret, to access the model; creating this secret is described in the following Kubernetes prerequisites section.
Optional: OpenTelemetry Collector installed on your cluster for observability and distributed tracing. Read the OpenTelemetry documentation for details on installing OpenTelemetry Collector with Helm. Also refer to Configure Observability with OpenTelemetry for details on using OpenTelemetry with this microservice.
Note
You can use the NeMo Dependencies Ansible Playbook to deploy the OpenTelemetry Collector for NeMo Guardrails.
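As a minimal sketch, you can also install the collector with the community Helm chart; the release name `guardrail-otel`, the `nemo` namespace, and the values shown here are assumptions to adapt to your observability backend:

```console
$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
$ helm install guardrail-otel open-telemetry/opentelemetry-collector -n nemo \
    --set mode=deployment \
    --set image.repository=otel/opentelemetry-collector-k8s
```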
Storage
A PostgreSQL database installed. This database is used as a persistent data store for the NeMo Guardrails configurations.
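One way to satisfy this prerequisite, shown here as an assumption rather than a requirement, is the Bitnami PostgreSQL Helm chart; the release name, namespace, and credentials are placeholders:

```console
$ helm install guardrail-pg oci://registry-1.docker.io/bitnamicharts/postgresql -n nemo \
    --set auth.username=guardrail \
    --set auth.password=<guardrailpass> \
    --set auth.database=guardrail
```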
Kubernetes
Create a database user secret by creating a file, such as `guardrail-pg-secret.yaml`, with contents like the following example:

```yaml
apiVersion: v1
stringData:
  password: <guardrailpass>
kind: Secret
metadata:
  name: guardrail-pg-existing-secret
  namespace: nemo
type: Opaque
```
Apply the secret file.
$ kubectl apply -n nemo -f guardrail-pg-secret.yaml
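Optionally, confirm that the secret exists in the `nemo` namespace:

```console
$ kubectl get secret guardrail-pg-existing-secret -n nemo
```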
If you plan to use a hosted NIM, create a secret containing your API Key for https://build.nvidia.com or OpenAI.
Create a secret file, such as `nemo-guardrail-secret.yaml`, like the following example:

```yaml
---
apiVersion: v1
stringData:
  NIM_ENDPOINT_API_KEY: <API-key>
kind: Secret
metadata:
  name: <nim-api-key>
  namespace: nemo
type: Opaque
```
Apply the secret file.
$ kubectl apply -n nemo -f nemo-guardrail-secret.yaml
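Alternatively, you can create the same secret directly from the command line; `<nim-api-key>` and `<API-key>` are the same placeholders as in the file above:

```console
$ kubectl create secret generic <nim-api-key> -n nemo \
    --from-literal=NIM_ENDPOINT_API_KEY=<API-key>
```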
## Deploy NeMo Guardrails
Update the `<inputs>` in the following sample manifests with values for your cluster configuration.
Create a file, such as `nemo-guardrail.yaml`, with contents similar to the following example. If you have a NIM endpoint you would like to use, update `spec.nimEndpoint.baseURL` with your NIM Service URL and port:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoGuardrail
metadata:
  name: nemoguardrails-sample
  namespace: nemo
spec:
  configStore:
    pvc:
      name: "pvc-guardrail-config"
      create: true
      storageClass: "<storage-class>"
      volumeAccessMode: ReadWriteMany
      size: "1Gi"
  nimEndpoint:
    baseURL: "https://integrate.api.nvidia.com/v1"
    # Required if you are using a hosted NIM endpoint. Create a secret with your API key.
    apiKeySecret: "<nim-api-key>"
  expose:
    service:
      type: ClusterIP
      port: 8000
  image:
    repository: nvcr.io/nvidia/nemo-microservices/guardrails
    tag: "25.10"
    pullPolicy: IfNotPresent
    pullSecrets:
    - ngc-secret
  metrics:
    serviceMonitor: {}
  replicas: 1
  resources:
    limits:
      cpu: "1"
      ephemeral-storage: 10Gi
  # Optional: OpenTelemetry tracing configuration
  # otel:
  #   enabled: true
  #   exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
  #   exporterConfig:
  #     tracesExporter: otlp
  #     metricsExporter: otlp
  #     logsExporter: otlp
  #   logLevel: INFO
  #   excludedUrls:
  #   - health
  # # Required environment variables for OTEL
  # env:
  # - name: OTEL_EXPORTER_OTLP_PROTOCOL
  #   value: grpc
  # - name: OTEL_EXPORTER_OTLP_INSECURE
  #   value: "true"
```
Apply the manifest:
$ kubectl apply -n nemo -f nemo-guardrail.yaml
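You can wait for the resource to report the `Ready` condition shown in the sample output below:

```console
$ kubectl wait --for=condition=Ready nemoguardrails/nemoguardrails-sample -n nemo --timeout=300s
```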
Optional: View information about the NeMo Guardrails services:
$ kubectl describe nemoguardrails.apps.nvidia.com -n nemo
Partial Output
```text
...
Conditions:
  Last Transition Time:  2024-08-12T19:09:43Z
  Message:               Deployment is ready
  Reason:                Ready
  Status:                True
  Type:                  Ready
  Last Transition Time:  2024-08-12T19:09:43Z
  Message:
  Reason:                Ready
  Status:                False
  Type:                  Failed
State:                   Ready
```
You now have a NeMo Guardrails microservice deployed to your cluster.
This sample NemoGuardrail resource deploys the microservice with an empty configuration store. Before using NeMo Guardrails with a model, you must create a guardrail configuration. Refer to the create configuration guide in the NeMo Guardrails documentation to learn how to create a configuration and update the configuration store.
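As a sketch of what updating the configuration store can look like, the following request creates a configuration through the Guardrails API. The `/v1/guardrail/configs` path matches the route used in the verification steps below; the body shown here, a `demo-config` name in the `default` namespace with a minimal `data` payload, is an assumption to adapt from the NeMo Guardrails configuration guide:

```console
$ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "demo-config",
      "namespace": "default",
      "data": {
        "models": [],
        "rails": {"input": {"flows": ["self check input"]}}
      }
    }'
```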
Also refer to Configure Observability with OpenTelemetry for more details on NeMo Guardrails observability.
## Verify NeMo Guardrails
After NeMo Guardrails is deployed on your cluster, use the following steps to verify that the service is up and running.
Start a pod that has access to the `curl` command. Substitute any pod that has this command and meets your organization's security requirements.

$ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
After the pod starts, you are connected to the `ash` shell in the pod. Connect to the NeMo Guardrails service:
$ curl -X GET "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
Example Output
{"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}~Press Ctrl+D to exit and delete the pod.
## Configure Observability with OpenTelemetry
NeMo Guardrails supports using OpenTelemetry for observability. Refer to the NeMo Guardrails Observability documentation for more details.
To enable event tracing:
Deploy the OpenTelemetry Collector to your cluster. Refer to the Prerequisites section for more details.
Enable OpenTelemetry in your NeMo Guardrails deployment.
```yaml
# Optional: OpenTelemetry tracing configuration
otel:
  enabled: true
  exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
  exporterConfig:
    tracesExporter: otlp
    metricsExporter: otlp
    logsExporter: otlp
  logLevel: INFO
  excludedUrls:
  - health

# Required environment variables for OTEL
env:
- name: OTEL_EXPORTER_OTLP_PROTOCOL
  value: grpc
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
Enable tracing in your guardrail configuration.
"tracing": { "enabled": "True", "adapters": [ { "name": "OpenTelemetry" } ] }
Optional: Verify tracing using the example in the NeMo Guardrails tracing integration documentation.
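To spot-check that spans are reaching the collector, you can tail the collector logs; the deployment name below assumes the Helm release name used in the prerequisites and depends on your installation:

```console
$ kubectl logs -n nemo deploy/guardrail-otel-opentelemetry-collector --tail=50
```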
## Configuration Reference
The following table shows information about the commonly modified fields for the NeMo Guardrails custom resource.
| Field | Description | Default Value |
|---|---|---|
| `spec.annotations` | Specifies user-supplied annotations to add to the pod. | None |
| `spec.configStore.configMap` | Specifies the NeMo Guardrails configuration location as a ConfigMap. Before deploying the NeMo Guardrails service, create a ConfigMap with your guardrail configurations, then pass the name in `spec.configStore.configMap.name`. | None |
| `spec.configStore.pvc.create` | When set to `true`, the Operator creates the PVC for the configuration store. Refer to the NeMo Guardrails Configuration Store documentation for more details. If you deploy the microservice with an empty configuration store, you must update the store with a valid configuration before you start running guardrails. | `false` |
| `spec.configStore.pvc.name` | Specifies the name for the PVC. | None |
| `spec.configStore.pvc.size` | Specifies the size, in Gi, for the PVC to create. This field is required if you set `create: true`. | None |
| `spec.configStore.pvc.storageClass` | Specifies the storage class for the PVC to create. Leave this field empty to use the default storage class on your cluster. | None |
| `spec.configStore.pvc.subPath` | Specifies a subpath on the PVC to create and use to store the guardrail configurations. | |
| `spec.configStore.pvc.volumeAccessMode` | Specifies the access mode for the PVC to create. | None |
| `spec.env` | Specifies additional environment variables to add to the container. When using OpenTelemetry, you must include the `OTEL_EXPORTER_OTLP_PROTOCOL` and `OTEL_EXPORTER_OTLP_INSECURE` environment variables. | None |
| `spec.expose.ingress` | When set to `enabled: true`, the Operator creates an ingress for the microservice. If you have an ingress controller, values like the following sample configure an ingress:<br>`ingress:`<br>`  enabled: true`<br>`  spec:`<br>`    ingressClassName: nginx`<br>`    host: demo.nvidia.example.com`<br>`    paths:`<br>`    - path: /v1/chat/completions`<br>`      pathType: Prefix` | |
| `spec.expose.service.port` | Specifies the network port number for the NeMo Guardrails microservice. | `8000` |
| `spec.expose.service.type` | Specifies the Kubernetes service type to create for the NeMo Guardrails microservice. | `ClusterIP` |
| `spec.groupID` | Specifies the group for the pods. This value is used to set the security context of the pod in the `fsGroup` field. | `2000` |
| `spec.image` | Specifies the repository, tag, pull policy, and pull secrets for the container image. | None |
| `spec.labels` | Specifies user-supplied labels to add to the pod. | None |
| `spec.metrics.enabled` | When set to `true`, enables metrics collection for the NeMo Guardrails microservice. | `false` |
| `spec.nimEndpoint.apiKeySecretKey` | Specifies the key in the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. | `NIM_ENDPOINT_API_KEY` |
| `spec.nimEndpoint.apiKeySecret` | Specifies the name of the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. This is required if the base URL is for a NIM proxy. Generate your API key from the Settings > API Keys page on https://build.nvidia.com. | None |
| `spec.nimEndpoint.baseURL` | Specifies the base URL of the service where your NIM is hosted. This field is required if you include `spec.nimEndpoint`. The default base URL for NVIDIA-hosted models is `https://integrate.api.nvidia.com/v1`. When using a hosted model, you must also set `spec.nimEndpoint.apiKeySecret`. | None |
| `spec.otel.enabled` | When set to `true`, enables OpenTelemetry instrumentation for the microservice. | None |
| | When set to … | None |
| `spec.otel.excludedUrls` | Specifies URLs to be excluded from tracing. | `health` |
| `spec.otel.exporterConfig.logsExporter` | Specifies the logs exporter. Values include `otlp` and `console`. | `otlp` |
| `spec.otel.exporterConfig.metricsExporter` | Specifies the metrics exporter. Values include `otlp` and `console`. | `otlp` |
| `spec.otel.exporterConfig.tracesExporter` | Specifies the traces exporter. Values include `otlp` and `console`. | `otlp` |
| `spec.otel.exporterOtlpEndpoint` | Specifies the OTLP endpoint for the OpenTelemetry Collector. | None |
| `spec.otel.logLevel` | Specifies the log level for OpenTelemetry. Values include `INFO` and `DEBUG`. | `INFO` |
| `spec.replicas` | Specifies the number of replicas to run on the cluster. | None |
| `spec.resources.requests` | Specifies the memory and CPU requests for the container. | None |
| `spec.resources.limits` | Specifies the memory and CPU limits for the container. | None |
| `spec.tolerations` | Specifies the tolerations for the pods. | None |
| `spec.userID` | Specifies the user ID for the pod. This value is used to set the security context of the pod in the `runAsUser` field. | `1000` |
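For example, to supply configurations from a ConfigMap instead of a PVC, a minimal sketch of the relevant spec fields, assuming a ConfigMap named `guardrail-configs` already exists in the `nemo` namespace, looks like the following:

```yaml
spec:
  configStore:
    configMap:
      name: guardrail-configs
```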
## Next Steps
Refer to the NeMo Guardrails documentation for details on creating guardrail configurations and running guardrails with your models.