PolicyAI Integration

NeMo Guardrails supports the PolicyAI content moderation API as an input and output rail out of the box. The POLICYAI_API_KEY environment variable must be set.

PolicyAI provides flexible policy-based content moderation, allowing you to define custom policies for your specific use cases and manage them through tags.

Setup

Sign up for a PolicyAI account at musubilabs.ai
Create your policies and organize them with tags
Set the required environment variables:

export POLICYAI_API_KEY="your-api-key"
export POLICYAI_BASE_URL="https://api.musubilabs.ai"  # Optional, this is the default
export POLICYAI_TAG_NAME="prod"  # Optional, defaults to "prod"

Usage

Basic Input Moderation

rails:
  input:
    flows:
      - policyai moderation on input

Basic Output Moderation

rails:
  output:
    flows:
      - policyai moderation on output

Using Different Tags

To use different policy tags for different environments, set the POLICYAI_TAG_NAME environment variable:

# For staging environment
export POLICYAI_TAG_NAME="staging"

# For production environment
export POLICYAI_TAG_NAME="prod"

Complete Example

models:
  - type: main
    engine: openai
    model: gpt-4

rails:
  input:
    flows:
      - policyai moderation on input

  output:
    flows:
      - policyai moderation on output

How It Works

Input Rails: When a user sends a message, PolicyAI evaluates it against all policies attached to the configured tag. If any policy returns UNSAFE, the message is blocked.
Output Rails: Before the bot’s response is sent to the user, PolicyAI evaluates it. If the content violates any policy, the response is replaced with a refusal message.
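The blocking rule described above can be sketched in a few lines of Python. This is an illustration only, not the integration's actual code: the function names, the refusal text, and the idea that per-policy assessments arrive as a list of strings are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical refusal text; the real message comes from your flow definitions.
REFUSAL_MESSAGE = "I'm sorry, but I cannot process that request."


def should_block(assessments: Iterable[str]) -> bool:
    """Return True if any policy assessment is "UNSAFE".

    `assessments` is an assumed representation: one "SAFE"/"UNSAFE"
    string per policy attached to the configured tag.
    """
    return any(assessment == "UNSAFE" for assessment in assessments)


def apply_output_rail(bot_response: str, assessments: Iterable[str]) -> str:
    """Replace the bot response with a refusal when any policy flags it."""
    if should_block(assessments):
        return REFUSAL_MESSAGE
    return bot_response
```

With this sketch, a single UNSAFE result is enough to block, which mirrors the "any policy" wording above; an empty list of assessments lets the content through.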

Response Format

PolicyAI returns the following information for each evaluation:

assessment: "SAFE" or "UNSAFE"
category: The category of violation (if UNSAFE)
severity: Severity level from 0 (safe) to 3 (high severity)
reason: Human-readable explanation
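For code that consumes these fields, a small typed wrapper can make the contract explicit. The sketch below assumes the evaluation is available as a plain dictionary whose keys match the field names listed above; the class name, the helper name, and that dictionary shape are hypothetical and may not match what the action actually returns.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PolicyAIResult:
    """Hypothetical container for one PolicyAI evaluation."""

    assessment: str          # "SAFE" or "UNSAFE"
    category: Optional[str]  # violation category, expected only when UNSAFE
    severity: int            # 0 (safe) to 3 (high severity)
    reason: str              # human-readable explanation

    @property
    def is_unsafe(self) -> bool:
        return self.assessment == "UNSAFE"


def parse_result(payload: Mapping[str, Any]) -> PolicyAIResult:
    """Validate an assumed dict payload and convert it to a PolicyAIResult."""
    assessment = payload["assessment"]
    if assessment not in ("SAFE", "UNSAFE"):
        raise ValueError(f"unexpected assessment: {assessment!r}")

    severity = int(payload.get("severity", 0))
    if not 0 <= severity <= 3:
        raise ValueError(f"severity out of range: {severity}")

    return PolicyAIResult(
        assessment=assessment,
        category=payload.get("category"),
        severity=severity,
        reason=payload.get("reason", ""),
    )
```

Validating the assessment and severity at the boundary means downstream logic can rely on the documented value ranges rather than re-checking them.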

Customizing Behavior

To customize the behavior when content is flagged, you can override the default flows in your config:

define subflow policyai moderation on input
  """Custom PolicyAI input moderation."""
  $result = execute call_policyai_api(text=$user_message)

  if $result.assessment == "UNSAFE"
    bot inform content policy violation
    stop

define bot inform content policy violation
  "I'm sorry, but I cannot process that request. Please rephrase your message."

Environment Variables

Variable	Required	Default	Description
`POLICYAI_API_KEY`	Yes	-	Your PolicyAI API key
`POLICYAI_BASE_URL`	No	`https://api.musubilabs.ai`	PolicyAI API base URL
`POLICYAI_TAG_NAME`	No	`prod`	Default policy tag to use
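The table above can be read as a small configuration loader. The following sketch shows one way to resolve the three variables with their documented defaults; `PolicyAISettings` and `load_settings` are hypothetical names introduced for illustration and are not part of NeMo Guardrails.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_BASE_URL = "https://api.musubilabs.ai"
DEFAULT_TAG_NAME = "prod"


@dataclass(frozen=True)
class PolicyAISettings:
    """Hypothetical settings object mirroring the environment variable table."""

    api_key: str
    base_url: str = DEFAULT_BASE_URL
    tag_name: str = DEFAULT_TAG_NAME


def load_settings(env: Mapping[str, str] = os.environ) -> PolicyAISettings:
    """Resolve settings from an environment mapping, applying documented defaults."""
    api_key = env.get("POLICYAI_API_KEY")
    if not api_key:
        # The API key is the only required variable.
        raise RuntimeError("POLICYAI_API_KEY is required")

    return PolicyAISettings(
        api_key=api_key,
        base_url=env.get("POLICYAI_BASE_URL", DEFAULT_BASE_URL),
        tag_name=env.get("POLICYAI_TAG_NAME", DEFAULT_TAG_NAME),
    )
```

Passing the environment as a parameter keeps the function easy to test: a plain dictionary can stand in for `os.environ`.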

Error Handling

If the PolicyAI API is unavailable or returns an error, the action raises an exception. To implement fail-open behavior (allow content through when moderation cannot run) or fail-closed behavior (block content when moderation cannot run), you can wrap the action in a try-catch block in your custom flows.
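The difference between the two failure policies can be sketched in plain Python. This is a generic illustration of the pattern, not the integration's code: `moderate_with_fallback` and the `check` callable (standing in for whatever performs the PolicyAI call and returns True when content is UNSAFE) are hypothetical.

```python
from typing import Callable


def moderate_with_fallback(
    check: Callable[[str], bool],
    text: str,
    fail_open: bool,
) -> bool:
    """Return True if `text` should be blocked.

    `check` is a hypothetical callable that returns True for UNSAFE
    content and raises an exception when the moderation call fails.
    """
    try:
        return check(text)
    except Exception:
        # Fail-open: moderation errors let the content through (not blocked).
        # Fail-closed: moderation errors block the content.
        return not fail_open
```

Fail-closed is the more conservative choice for user-facing deployments, since an outage cannot be used to bypass moderation; fail-open favors availability at the cost of unmoderated content during errors.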