Configuration Structure#

A guardrail configuration contains several properties that customize how the service interacts with models and applies safety checks. This page describes the core components (models, prompts, and rails) as well as advanced options for tracing, passthrough, and other behaviors.


Models#

A model configuration defines the LLM to use for a specific task. It consists of the following fields:

  • type: The task the model is used for.

  • engine: The model provider. For most cases, use nim.

  • model: The name of the model to use for the task. For the main model, an incoming Guardrails request can override the model.

  • mode: The completion mode. Allowed values are "chat" (default) or "text".

  • cache: Cache configuration for this model. Primarily used for content safety models to cache repeated checks. Contains enabled (default: false), maxsize (default: 50000), and stats sub-fields.

  • parameters: Additional properties that configure how requests are sent to the model. The following fields are supported for all model types:

    • base_url: The URL to use for inference with this model. For models deployed through the Inference Gateway, NeMo Guardrails routes requests through the gateway automatically, so you do not need to set a base URL explicitly.

    • default_headers: Custom HTTP headers to include in requests to this model. Each key-value pair represents a header name (key) and its default value (value). At inference time, you can override a header's default value by including it in the request headers.
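As a sketch, a main model entry can combine both parameters. The model name, URL, and header below are placeholders, not values shipped with the service:

```python
# Hypothetical main model configuration; the model name, base URL,
# and header value are placeholders for your own deployment.
model_config = {
    "type": "main",
    "engine": "nim",
    "model": "meta/llama-3.1-8b-instruct",
    "parameters": {
        "base_url": "http://my-nim:8000/v1",
        "default_headers": {
            # Can be overridden per request at inference time.
            "X-Request-Source": "guardrails"
        }
    }
}
```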

The main model is the model that an end user uses for chat and chat-like interactions.

You can also configure task-specific models for any task that occurs during the guardrail process. The following are common tasks that can be configured:

  • content_safety: Content Safety check for detecting harmful content.

  • topic_control: Topic Control check for keeping conversations on-topic.

  • jailbreak: Jailbreak detection check.

  • self_check_input: Safety check that automatically checks the user input using the main model for inference.

  • self_check_output: Safety check that automatically checks the final LLM output using the main model for inference.

models = [
    {
        "type": "content_safety",
        "engine": "nim",
        "model": "default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3"
    },
    {
        "type": "topic_control",
        "engine": "nim",
        "model": "default/nvidia-llama-3-1-nemoguard-8b-topic-control"
    }
]

Tip

Model Entities are automatically created when you:

  • Deploy a NIM through the Inference Gateway (refer to Deploy Models)

  • Create a Model Provider pointing to an external API (refer to Deploy Models)

Use sdk.models.list() to retrieve available Model Entities in your workspace.

Using Direct URLs#

Using Inference Gateway is the recommended approach for interacting with models. If you require a direct connection to specific endpoints, you can explicitly set parameters.base_url:

models = [
    {
        "type": "main",
        "engine": "nim",
        "parameters": {
            "base_url": "http://my-local-nim:8000/v1"
        }
    },
    {
        "type": "content_safety",
        "engine": "nim",
        "model": "nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "parameters": {
            "base_url": "http://my-content-safety-nim:8000/v1"
        }
    }
]

Prompts#

A prompt is used by the model during a task to evaluate a message. It consists of the following fields:

  • task: The task to apply the prompt to.

  • content: The content of the prompt. Mutually exclusive with messages.

    • Prompts that require dynamic input variables use Jinja2 templating. For example, the {{ user_input }} variable is replaced with the end user’s input at runtime.

  • messages: A list of messages for chat-model prompts. Mutually exclusive with content. Each message has type (such as "system" or "user") and content fields.

  • output_parser: Name of output parser to process the model’s response.

  • max_tokens: Maximum number of tokens the model can generate.

  • max_length: Maximum prompt length in characters. When the maximum length is exceeded, the prompt is truncated by removing older turns from the conversation history until the length of the prompt is less than or equal to the maximum length. The default is 16,000 characters.

  • models: Restricts this prompt to specific LLM engines or models. Format: a list of strings such as "<engine>" or "<engine>/<model>".

  • mode: The prompting mode for this prompt. Defaults to the top-level prompting_mode value (typically "standard").

  • stop: A list of stop tokens for models that support this feature.

For Content Safety and Topic Control checks, prompts must include the model reference in the task name:

prompts = [
    {
        "task": "content_safety_check_input $model=content_safety",
        "content": "Task: Check for unsafe content...",
        "output_parser": "nemoguard_parse_prompt_safety",
        "max_tokens": 50
    },
    {
        "task": "topic_safety_check_input $model=topic_control",
        "content": "Ensure the user messages meet the following guidelines: ...",
        "max_tokens": 50
    }
]
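Because content and messages are mutually exclusive, a chat-style prompt uses a messages list instead. The following sketch is illustrative; the task name is a documented self-check task, but the message wording is an example only:

```python
# Illustrative chat-style prompt using `messages` instead of `content`.
# The message text is an example, not a shipped default.
prompts = [
    {
        "task": "self_check_input",
        "messages": [
            {
                "type": "system",
                "content": "You are a strict policy checker. Answer yes or no."
            },
            {
                "type": "user",
                # Jinja2 variable replaced with the end user's input at runtime.
                "content": "Does this message violate policy?\n\n{{ user_input }}"
            }
        ],
        "max_tokens": 10
    }
]
```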

Prompt Template Variables#

The following dynamic variables can be used in the prompt content:

| Variable | Description |
| --- | --- |
| {{ user_input }} | Current user message |
| {{ bot_response }} | Current bot response (for output rails) |
| {{ history }} | Conversation history |
| {{ relevant_chunks }} | Retrieved knowledge base chunks (for retrieval rails) |
| {{ context }} | Additional context variables |
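As a sketch, an output-rail prompt can combine several of these variables in its content. The wording below is illustrative, not a shipped prompt:

```python
# Illustrative output-rail prompt; the instruction wording is an
# example only. Jinja2 variables are filled in at runtime.
prompts = [
    {
        "task": "self_check_output",
        "content": (
            "User asked: {{ user_input }}\n"
            "Bot replied: {{ bot_response }}\n"
            "Is the bot reply safe and on-topic? Answer yes or no."
        ),
        "max_tokens": 5
    }
]
```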


Rails#

Rails specify which flows to apply to the user input and LLM output. Rails are organized into five categories, based on when they trigger during the guardrails process.

| Category | Trigger Point | Purpose |
| --- | --- | --- |
| Input Rails | When user input is received | Validate, filter, or modify user input |
| Retrieval Rails | After RAG retrieval completes | Process retrieved chunks |
| Dialog Rails | After canonical form is computed | Control conversation flow |
| Execution Rails | Before/after action execution | Control tool and action calls |
| Output Rails | When LLM generates output | Validate, filter, or modify bot responses |

Each rail category supports a flows list and, optionally, a parallel flag to execute those flows concurrently.

The following example defines the flow to run as an input and output rail.

rails = {
    "input": {
        "flows": ["self check input"],
        "parallel": False
    },
    "output": {
        "flows": ["self check output"],
        "parallel": False,
        "streaming": {
            "enabled": False,
            "chunk_size": 200,
            "context_size": 50,
            "stream_first": True
        },
        "apply_to_reasoning_traces": False
    }
}

In addition to input and output, the rails object supports the following rail types:

  • retrieval: Applied to retrieved chunks in RAG deployments. Supports flows.

  • dialog: Applied after canonical form is computed. Supports flows.

  • actions: Controls tool and action execution. Supports instant_actions (list of actions that finish instantly).

  • tool_input: Applied to tool calls before they are executed. Supports flows and parallel.

  • tool_output: Applied to tool results before they are processed. Supports flows and parallel.
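A rails object using these additional rail types could be sketched as follows. The flow and action names are placeholders for flows defined in your own configuration:

```python
# Sketch of a rails object exercising the additional rail types.
# All flow and action names below are placeholders.
rails = {
    "input": {"flows": ["self check input"]},
    "retrieval": {"flows": ["check retrieved chunks"]},
    "dialog": {"flows": ["greeting flow"]},
    "actions": {
        # Actions that finish instantly.
        "instant_actions": ["retrieve_relevant_chunks"]
    },
    "tool_input": {"flows": ["check tool input"], "parallel": False},
    "tool_output": {"flows": ["check tool output"], "parallel": False},
    "output": {"flows": ["self check output"]}
}
```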

Output Rails Streaming#

Output rails support a streaming configuration for processing LLM tokens in chunks:

  • enabled: Enables streaming mode (default: false).

  • chunk_size: Number of tokens per processing chunk (default: 200).

  • context_size: Number of tokens carried from the previous chunk for continuity (default: 50).

  • stream_first: If true, token chunks are streamed before output rails are applied (default: true).

Rails-specific Configuration#

Some rails require additional configuration, which you specify under the config key. The following integrations are supported:

  • jailbreak_detection – Threshold-based jailbreak detection.

  • injection_detection – Prompt injection detection.

rails = {
    "input": {
        "flows": ["self check input"],
    },
    "output": {
        "flows": ["self check output"],
    },
    "config": {
        # Configures jailbreak detection settings
        "jailbreak_detection": {
            "length_per_perplexity_threshold": 89.79,
            "prefix_suffix_perplexity_threshold": 1845.65
        }
    }
}
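An injection_detection entry can be configured the same way. The field names below are modeled on the open-source NeMo Guardrails toolkit and are an assumption here; verify them against your service version:

```python
# Assumed injection_detection fields (injections list and action),
# modeled on the open-source NeMo Guardrails toolkit. Verify these
# names against your deployed service version before relying on them.
rails = {
    "output": {
        "flows": ["injection detection"]
    },
    "config": {
        "injection_detection": {
            "injections": ["code", "sqli", "template", "xss"],
            "action": "reject"
        }
    }
}
```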

General Instructions#

Instructions provide context to the model about expected behavior. They are prepended to every prompt (similar to a system prompt).

instructions = [
    {
        "type": "general",
        "content": """You are a customer service bot for ABC Company.
You answer questions about products and policies.
If you don't know an answer, say so honestly.
Always be polite and professional."""
    }
]

Sample Conversation#

The sample conversation sets the tone for conversations between the user and the bot. It helps the LLM learn the format, tone, and verbosity of responses. Include a minimum of two turns. The sample conversation is appended to every prompt; keep it short and relevant.

sample_conversation = """user "Hi there. Can you help me with some questions I have about the company?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about the ABC Company. What would you like to know?"
  user "What's the company policy on paid time off?"
    ask question about benefits
  bot respond to question about benefits
    "The ABC Company provides eligible employees with up to two weeks of paid vacation time per year, as well as five paid sick days per year. Please refer to the employee handbook for more information."
"""

Advanced Options#

The guardrail configuration supports additional top-level fields for fine-tuning behavior. All fields below are optional and have sensible defaults.

| Field | Default | Description |
| --- | --- | --- |
| actions_server_url | null | URL of the actions server used by the rails engine to execute custom actions. |
| prompting_mode | "standard" | Selects the prompting strategy. Custom prompts can target a specific mode using the prompt-level mode field. |
| lowest_temperature | 0.001 | The lowest temperature used for tasks that require deterministic output. Some models do not support 0.0, so this value provides a near-zero alternative. |
| enable_multi_step_generation | false | Enables multi-step generation for highly capable LLMs. Use with caution; recommended only for models comparable to GPT-3.5-turbo-instruct or newer. |
| colang_version | "1.0" | The version of the Colang dialogue language to use. |
| custom_data | {} | A dictionary for arbitrary configuration data that custom actions or integrations can reference at runtime. |
| enable_rails_exceptions | false | When true, rails raise exceptions instead of returning predefined refusal messages. Useful for programmatic error handling. |
| passthrough | null | When true, the original prompt passes through the guardrails configuration without alteration. No rails are applied. |
| tracing | Refer to section below | Configuration object for OpenTelemetry-compatible tracing. Contains enabled (default: false), span_format (default: "opentelemetry"; options: "opentelemetry" or "legacy"), and enable_content_capture (default: false). |

Warning

Enabling tracing.enable_content_capture causes prompts and responses (including user, assistant, and tool message content) to be included in telemetry events. This can expose PII and sensitive data in your telemetry backend.
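Putting the documented tracing fields together, a configuration that enables tracing while keeping prompts and responses out of telemetry could look like this:

```python
# Tracing configuration built from the documented fields. Content
# capture stays disabled so prompts and responses are excluded
# from telemetry events.
tracing = {
    "enabled": True,
    "span_format": "opentelemetry",
    "enable_content_capture": False
}
```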


Next Steps#