Managing NeMo Guardrails#

About NeMo Guardrails#

The NVIDIA NeMo Guardrails microservice enables you to add programmable guardrails to LLM endpoints. NeMo Guardrails sits between your application code and the LLM, giving you a way to adjust user prompts before they are sent to the LLM and to adjust LLM responses before they are returned to users.

When you deploy a NeMo Guardrails microservice, the NIM Operator creates a Deployment and Service endpoint for NeMo Guardrails. Guardrails configurations are stored in a PVC directory and mounted into the Guardrails Deployment.

Read the NeMo Guardrails documentation for details on using guardrails.

Prerequisites#

  • All the common NeMo microservice prerequisites.

  • A NIM endpoint where your models are hosted. NIM endpoints must support the OpenAI spec and can be deployed as:

    • A NIM Cache and NIM Service locally on your cluster.

    • A NIM Proxy. Refer to the NeMo microservices documentation for details on deploying a NIM Proxy. Note that NIM Operator does not support NIM Proxy with multiple NIM endpoints.

    • A hosted model from an LLM provider, for example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key and secret to access the model, as described in the following Kubernetes prerequisites section.

  • Optional: OpenTelemetry Collector installed on your cluster for observability and distributed tracing. Read the OpenTelemetry documentation for details on installing OpenTelemetry Collector with Helm. Also refer to Configure Observability with OpenTelemetry for details on using OpenTelemetry with this microservice.

    Note

    You can use the NeMo Dependencies Ansible Playbook to deploy the OpenTelemetry Collector for NeMo Guardrails.
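
    If you prefer to install the collector manually with Helm, a minimal sketch follows. The release name guardrail-otel, the nemo namespace, and the values shown are assumptions for illustration; adjust them for your cluster and refer to the OpenTelemetry Helm chart documentation for the full set of options.

    $ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
    $ helm repo update
    # Assumed release name and namespace; mode and image.repository are required by recent chart versions.
    $ helm install guardrail-otel open-telemetry/opentelemetry-collector \
        --namespace nemo \
        --set mode=deployment \
        --set image.repository=otel/opentelemetry-collector-k8s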

Storage

  • A PostgreSQL database installed. This database is used as the persistent data store for the NeMo Guardrails configurations.
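
    One way to satisfy this prerequisite is the Bitnami PostgreSQL Helm chart. The release name, user name, and database name below are assumptions for illustration; use values that match your environment and record the password in the secret described in the next section.

    # Assumed release name, user, and database; substitute your own values.
    $ helm install guardrail-pg oci://registry-1.docker.io/bitnamicharts/postgresql \
        --namespace nemo \
        --set auth.username=guardrail \
        --set auth.password=<guardrailpass> \
        --set auth.database=guardrail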

Kubernetes

  • Create a database user secret by creating a file such as guardrail-pg-secret.yaml, with contents like the following example:

    apiVersion: v1
    stringData:
      password: <guardrailpass>
    kind: Secret
    metadata:
      name: guardrail-pg-existing-secret
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f guardrail-pg-secret.yaml
    
  • If you plan to use a hosted NIM, create a secret containing your API Key for https://build.nvidia.com or OpenAI.

    Create a secret file like the following nemo-guardrail-secret.yaml example:

    ---
    apiVersion: v1
    stringData:
      NIM_ENDPOINT_API_KEY: <API-key>
    kind: Secret
    metadata:
      name: <nim-api-key>
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f nemo-guardrail-secret.yaml
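
    Alternatively, you can create the same secret imperatively without writing a manifest file:

    $ kubectl create secret generic <nim-api-key> \
        --namespace nemo \
        --from-literal=NIM_ENDPOINT_API_KEY=<API-key>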
    

Deploy NeMo Guardrails#

Update the <inputs> in the following sample manifests with values for your cluster configuration.

  1. Create a file, such as nemo-guardrail.yaml, with contents similar to the following example. If you have a NIM endpoint you would like to use, update spec.nimEndpoint.baseURL with your NIM Service URL and port (an example for a locally deployed NIM Service is sketched after this procedure):

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoGuardrail
    metadata:
      name: nemoguardrails-sample
      namespace: nemo
    spec:
      configStore:
        pvc:
          name: "pvc-guardrail-config"
          create: true
          storageClass: "<storage-class>"
          volumeAccessMode: ReadWriteMany
          size: "1Gi"
      nimEndpoint:
        baseURL: "<https://integrate.api.nvidia.com/v1>"
        #Required if you are using a hosted NIM endpoint. Create a secret with your API key.
        apiKeySecret: "<nim-api-key>"
      expose:
        service:
          type: ClusterIP
          port: 8000
      image:
        repository: nvcr.io/nvidia/nemo-microservices/guardrails
        tag: "25.10"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      metrics:
        serviceMonitor: {}
      replicas: 1
      resources:
        limits:
          cpu: "1"
          ephemeral-storage: 10Gi
      # # Optional: OpenTelemetry tracing configuration
      # otel:  
      #   enabled: true
      #   exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      #   exporterConfig:
      #     tracesExporter: otlp
      #     metricsExporter: otlp
      #     logsExporter: otlp
      #   logLevel: INFO
      #   excludedUrls:
      #     - health
      # # Required environment variables for OTEL
      # env:
      #   - name: OTEL_EXPORTER_OTLP_PROTOCOL
      #     value: grpc
      #   - name: OTEL_EXPORTER_OTLP_INSECURE
      #     value: "true"
    
  2. Apply the manifest:

    $ kubectl apply -n nemo -f nemo-guardrail.yaml
    
  3. Optional: View information about the NeMo Guardrails services:

    $ kubectl describe nemoguardrails.apps.nvidia.com -n nemo
    

    Partial Output

    ...
    Conditions:
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:               Deployment is ready
     Reason:                Ready
     Status:                True
     Type:                  Ready
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:
     Reason:                Ready
     Status:                False
     Type:                  Failed
    State:                  Ready
    

You now have a NeMo Guardrails microservice deployed to your cluster.
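
The sample above points at the NVIDIA-hosted endpoint. If your model is served by a NIM Service running in your cluster, set spec.nimEndpoint.baseURL to that Service instead and omit apiKeySecret. A minimal sketch, assuming a hypothetical NIM Service named meta-llama3-8b-instruct in the nemo namespace that listens on port 8000:

    spec:
      nimEndpoint:
        # Assumed in-cluster NIM Service; no API key secret is needed for a local endpoint.
        baseURL: "http://meta-llama3-8b-instruct.nemo:8000"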

This sample NemoGuardrail resource deploys the microservice with an empty configuration store. Before running guardrails against a model, you must create a guardrail configuration. Refer to the configuration guide in the NeMo Guardrails documentation to learn how to create a configuration and update the configuration store.
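
As an illustration only, the request below sketches creating a simple configuration through the microservice API from a pod inside the cluster. The endpoint path matches the listing endpoint used in the verification steps below, but the configuration name, namespace, and body fields are assumptions; follow the NeMo Guardrails documentation for the authoritative schema.

    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs" \
        -H "Content-Type: application/json" \
        -d '{
          "name": "demo-self-check",
          "namespace": "default",
          "description": "Example configuration with a self-check input rail",
          "data": {
            "models": [],
            "rails": {
              "input": {
                "flows": ["self check input"]
              }
            }
          }
        }'

A working configuration typically also defines the prompts that rails such as self check input rely on; see the NeMo Guardrails configuration guide for complete examples.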

Also refer to Configure Observability with OpenTelemetry for more details on NeMo Guardrails observability.

Verify NeMo Guardrails#

After NeMo Guardrails is deployed on your cluster, use the following steps to verify that the service is up and running.

  1. Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    After the pod starts, you are connected to the ash shell in the pod.

  2. Connect to the NeMo Guardrails service.

    $ curl -X GET "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
    

    Example Output

    {"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}~ 
    
  3. Press Ctrl+D to exit and delete the pod.
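
As an alternative to running a pod in the cluster, you can port-forward the Guardrails Service to your workstation and query it locally. This assumes the Service created for the sample resource is named nemoguardrails-sample, as in the request above.

    $ kubectl port-forward -n nemo service/nemoguardrails-sample 8000:8000

    # In a second terminal:
    $ curl -X GET "http://localhost:8000/v1/guardrail/configs"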

Configure Observability with OpenTelemetry#

NeMo Guardrails supports using OpenTelemetry for observability. Refer to the NeMo Guardrails Observability documentation for more details.

To enable event tracing:

  1. Deploy the OpenTelemetry Collector to your cluster. Refer to the Prerequisites section for more details.

  2. Enable OpenTelemetry in your NeMo Guardrail deployment.

    # Optional: OpenTelemetry tracing configuration
    otel:  
      enabled: true
      exporterOtlpEndpoint: http://<guardrail-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      exporterConfig:
        tracesExporter: otlp
        metricsExporter: otlp
        logsExporter: otlp
      logLevel: INFO
      excludedUrls:
        - health
    # Required environment variables for OTEL
    env:
      - name: OTEL_EXPORTER_OTLP_PROTOCOL
        value: grpc
      - name: OTEL_EXPORTER_OTLP_INSECURE
        value: "true"
    
  3. Enable tracing in your guardrail configuration.

    "tracing": {
        "enabled": "True",
        "adapters": [
            {
                "name": "OpenTelemetry"
            }
        ]
    }
    
  4. Optional: Verify tracing using the example in the NeMo Guardrails documentation on verifying the tracing integration.
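
For a quick smoke test, you can send a guarded chat completion through the microservice and then look for the resulting spans in your OpenTelemetry backend. The endpoint path, guardrails parameter, model name, and config ID below are assumptions for illustration; check the NeMo Guardrails API reference for the exact request format.

    $ curl -X POST "http://nemoguardrails-sample.nemo:8000/v1/guardrail/chat/completions" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "meta/llama-3.1-8b-instruct",
          "messages": [{"role": "user", "content": "Hello, are guardrails active?"}],
          "guardrails": {"config_id": "demo-self-check"},
          "max_tokens": 128
        }'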

Configuration Reference#

The following reference describes the commonly modified fields for the NeMo Guardrails custom resource. The default value for each field is listed after its description.

spec.annotations
    Specifies user-supplied annotations to add to the pod.
    Default: None

spec.configStore.configMap
    Specifies a ConfigMap as the NeMo Guardrails configuration location. Before deploying the NeMo Guardrails service, create a ConfigMap with your guardrail configurations and pass its name in spec.configStore.configMap.name. The name field is required when you use spec.configStore.configMap.
    Default: false

spec.configStore.pvc.create
    When set to true, the Operator creates the PVC for you. If you delete a NemoGuardrail resource and this field was set to true, the Operator also deletes the PVC.
    Refer to the NeMo Guardrails Configuration Store documentation for more details. If you deploy the microservice with an empty configuration store, you must add a valid configuration before you start running guardrails.
    Default: false

spec.configStore.pvc.name
    Specifies the name for the PVC.
    Default: None

spec.configStore.pvc.size
    Specifies the size, in Gi, for the PVC to create. This field is required if you specify create: true.
    Default: None

spec.configStore.pvc.storageClass
    Specifies the StorageClass for the PVC to create. Leave this field empty if create is set to false and you already created the PVC.
    Default: None

spec.configStore.pvc.subPath
    Specifies a subpath to create on the PVC for storing the guardrail configurations.
    Default: guardrails-config-store

spec.configStore.pvc.volumeAccessMode
    Specifies the access mode for the PVC to create.
    Default: None

spec.env
    Specifies additional environment variables to add to the container. When using OpenTelemetry, you must include OTEL_EXPORTER_OTLP_PROTOCOL set to grpc and OTEL_EXPORTER_OTLP_INSECURE set to "true".
    Default: None

spec.expose.ingress.enabled
    When set to true, the Operator creates a Kubernetes Ingress resource for NeMo Guardrails. Specify the ingress specification in the spec.expose.ingress.spec field.
    If you have an ingress controller, values like the following sample configure an ingress for the v1/chat/completions endpoint:

    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        host: demo.nvidia.example.com
        paths:
          - path: /v1/chat/completions
            pathType: Prefix

    Default: false

spec.expose.service.port
    Specifies the network port number for the NeMo Guardrails microservice.
    Default: 8000

spec.expose.service.type
    Specifies the Kubernetes service type to create for the NeMo Guardrails microservice.
    Default: ClusterIP

spec.groupID
    Specifies the group for the pods. This value is used to set the security context of the pod in the runAsGroup and fsGroup fields.
    Default: 2000

spec.image (required)
    Specifies the repository, tag, pull policy, and pull secrets for the container image.
    Default: None

spec.labels
    Specifies user-supplied labels to add to the pod.
    Default: None

spec.metrics.enabled
    When set to true, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the spec.metrics.serviceMonitor field. Refer to the Observability page for more details.
    Default: false

spec.nimEndpoint.apiKeyKey
    Specifies the key in the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com.
    Default: NIM_ENDPOINT_API_KEY

spec.nimEndpoint.apiKeySecret
    Specifies the name of the secret that contains the API key for accessing NVIDIA-hosted models from https://build.nvidia.com. This field is required when the base URL is a hosted endpoint. Generate your API key from the Settings > API Keys page on https://build.nvidia.com.
    Default: None

spec.nimEndpoint.baseURL (required)
    Specifies the base URL of the service where your NIM is hosted. This field is required if you include spec.nimEndpoint. NIM endpoints must support the OpenAI spec and can be deployed:

      • locally on your cluster, as a NIM Cache and NIM Service,
      • as a NIM Proxy, or
      • as a hosted model from an LLM provider, for example, an NVIDIA-hosted model from https://integrate.api.nvidia.com/v1. Hosted models require an API key and secret to access the model, as described in the Kubernetes prerequisites section.

    The default base URL for NVIDIA-hosted models is https://integrate.api.nvidia.com/v1. You can view a list of available models at https://integrate.api.nvidia.com/v1/models. The base URL for a locally hosted NIM is typically http://<NIM_SERVICE>.<NAMESPACE>:<PORT> or your configured endpoint URL. When using a hosted model, you must set spec.nimEndpoint.apiKeySecret.
    Default: None

spec.otel.disableLogging
    When set to true, Python logging auto-instrumentation is disabled.
    Default: None

spec.otel.enabled
    When set to true, the OpenTelemetry collector and tracing are enabled.
    Default: None

spec.otel.excludedUrls
    Specifies URLs to exclude from tracing.
    Default: health

spec.otel.exporterConfig.logsExporter
    Specifies the logs exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterConfig.metricsExporter
    Specifies the metrics exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterConfig.tracesExporter
    Specifies the traces exporter. Values include otlp, console, and none.
    Default: otlp

spec.otel.exporterOtlpEndpoint
    Specifies the OpenTelemetry Protocol (OTLP) endpoint.
    Default: None

spec.otel.logLevel
    Specifies the log level for OpenTelemetry. Values include INFO and DEBUG.
    Default: INFO

spec.replicas
    Specifies the number of replicas to run on the cluster.
    Default: None

spec.resources.requests
    Specifies the memory and CPU requests.
    Default: None

spec.resources.limits
    Specifies the memory and CPU limits.
    Default: None

spec.tolerations
    Specifies the tolerations for the pods.
    Default: None

spec.userID
    Specifies the user ID for the pod. This value is used to set the security context of the pod in the runAsUser field.
    Default: 1000

Next Steps#