Configuration YAML Schema Reference | NVIDIA NeMo Guardrails Library Developer Guide

This reference documents all configuration options for config.yml, derived from the authoritative Pydantic schema in nemoguardrails/rails/llm/config.py.

Models Configuration

The models key defines LLM providers and models used by the NVIDIA NeMo Guardrails library.

Model Schema

```yaml
models:
    - type: main # Required: Model type
      engine: openai # Required: LLM provider
      model: gpt-4 # Required: Model name
      mode: chat # Optional: "chat" or "text" (default: "chat")
      api_key_env_var: OPENAI_KEY # Optional: Environment variable for API key
      parameters: # Optional: Provider-specific parameters
          temperature: 0.7
          max_tokens: 1000
      cache: # Optional: Caching configuration
          enabled: false
          maxsize: 50000
```

Model Attributes

Attribute	Type	Required	Description
`models.api_key_env_var`	string		Environment variable containing API key
`models.cache`	object		Cache configuration for this model
`models.engine`	string	✓	LLM provider (see Engines)
`models.mode`	string		Completion mode: `chat` or `text` (default: `chat`)
`models.model`	string	✓	Model name (can also be in `parameters.model_name`)
`models.parameters`	object		Provider-specific parameters. For engines served by the built-in client, such as any OpenAI-compatible endpoint, the runtime forwards `parameters` to the OpenAI-compatible HTTP request. Examples include `temperature`, `max_tokens`, `base_url`, `api_key`, `default_query`, and `default_headers`. For engines served by LangChain, opt in with `NEMOGUARDRAILS_LLM_FRAMEWORK=langchain`; the runtime forwards `parameters` to the underlying LangChain class. For the engine-by-engine matrix, refer to Inference Providers.
`models.type`	string	✓	Model identifier (see Model Types)
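The model name can come from either of two places. The following is a minimal sketch of how a config loader might resolve it from an already-parsed `models` entry (a plain Python dict). It assumes that the top-level `model` key takes precedence over `parameters.model_name`. The helper names `resolve_model_name` and `resolve_mode` are hypothetical and are not part of the library.

```python
from typing import Any, Dict


def resolve_model_name(model_cfg: Dict[str, Any]) -> str:
    """Return the model name for a parsed `models` entry.

    Assumption: a top-level `model` key takes precedence over
    `parameters.model_name`; the library's actual precedence may differ.
    """
    name = model_cfg.get("model") or (model_cfg.get("parameters") or {}).get("model_name")
    if not name:
        raise ValueError(f"models entry of type {model_cfg.get('type')!r} has no model name")
    return name


def resolve_mode(model_cfg: Dict[str, Any]) -> str:
    """Return the completion mode, applying the documented "chat" default."""
    mode = model_cfg.get("mode", "chat")
    if mode not in ("chat", "text"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return mode


main = {"type": "main", "engine": "openai", "model": "gpt-4"}
safety = {
    "type": "content_safety",
    "engine": "nim",
    "parameters": {"model_name": "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"},
}
print(resolve_model_name(main), resolve_mode(main))  # gpt-4 chat
print(resolve_model_name(safety))  # nvidia/llama-3.1-nemotron-safety-guard-8b-v3
```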

Model Types

The type field is a free-form string identifier. Certain types have special handling in the runtime, while custom types can be defined and referenced in flows via $model=<type>.

Reserved Types

These types have special handling in the runtime:

Type	Description
`embeddings`	Embedding model for knowledge base and similarity search
`jailbreak_detection`	Jailbreak detection model (used with NIM)
`main`	Primary application LLM for conversation

Commonly-Used Types

The following types are commonly used with guardrails:

Type	Description	Usage Example in Flows
`content_safety`	Content safety model	`content safety check input $model=content_safety`
`llama_guard`	Llama Guard content moderation	`llama guard check input $model=llama_guard`
`topic_control`	Topic control model	`topic safety check input $model=topic_control`

Custom Types

You can define any custom type and reference it in flows. For example:

```yaml
models:
    - type: my_safety_model
      engine: self_hosted
      model: my-org/custom-safety-model

rails:
    input:
        flows:
            - content safety check input $model=my_safety_model
```

The runtime validates that any $model=<type> reference in flows has a matching model defined in the configuration.
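The following is a minimal sketch of that reference check. It works on an already-parsed config dict rather than on the library's own objects. The regular expression and the function name `find_missing_model_refs` are assumptions for illustration and do not reflect the runtime's implementation.

```python
import re
from typing import Any, Dict, List

# Matches the `$model=<type>` suffix used in flow names (pattern is an assumption).
MODEL_REF = re.compile(r"\$model=([\w-]+)")


def find_missing_model_refs(config: Dict[str, Any]) -> List[str]:
    """Return `$model=<type>` references with no matching `models` entry."""
    defined = {m.get("type") for m in config.get("models") or []}
    missing: List[str] = []
    for section in (config.get("rails") or {}).values():
        # Skip sections left empty (None) in the YAML; dict sections without
        # a `flows` key, such as `dialog`, simply contribute no flows.
        if not isinstance(section, dict):
            continue
        for flow in section.get("flows") or []:
            for ref in MODEL_REF.findall(flow):
                if ref not in defined and ref not in missing:
                    missing.append(ref)
    return missing


config = {
    "models": [
        {"type": "main", "model": "gpt-4"},
        {"type": "my_safety_model", "model": "my-org/custom-safety-model"},
    ],
    "rails": {
        "input": {"flows": ["content safety check input $model=my_safety_model"]},
        "output": {"flows": ["content safety check output $model=content_safety"]},
    },
}
print(find_missing_model_refs(config))  # ['content_safety']
```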

Engines

Starting with v0.22, the library serves engines through either the built-in OpenAI-compatible client or LangChain. Use the built-in client whenever the underlying wire protocol is OpenAI-compatible. Opt into LangChain only for engines whose API is not OpenAI-compatible, such as Vertex AI, Anthropic, Cohere, and the in-process Hugging Face pipeline. For the full mapping, refer to Inference Providers. For migration recipes, refer to Migrating to 0.22.

Built-in Engines

These engines work with pip install nemoguardrails and do not require extra provider packages. Pass parameters.base_url to point at a self-hosted or alternative endpoint.

Engine	Description
`azure`, `azure_openai`	Azure OpenAI models with key-based authentication (`azure_endpoint` or `base_url`, `azure_deployment`, and `api_version`)
`nim`	NVIDIA NIM microservices
`nvidia_ai_endpoints`	Alias for `nim`
`ollama`	Ollama OpenAI-compatible endpoint at `http://localhost:11434/v1`
`openai`	OpenAI public API or any OpenAI-compatible endpoint using `parameters.base_url`

For OpenAI-compatible providers without a dedicated engine entry (vLLM, TGI, OpenRouter, Together.ai, Fireworks.ai, Groq, DeepSeek, llama.cpp server, and similar), use engine: openai with parameters.base_url and parameters.api_key.

LangChain Engines

To use one of these engines, set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install the matching langchain-* provider package.

Engine	Description
`anthropic`	Anthropic Claude models
`cohere`	Cohere models
`google_genai`	Google Generative AI through LangChain (requires `langchain-google-genai`)
`huggingface_endpoint`	Hugging Face Inference Endpoints (default text-generation schema; if your endpoint exposes `/v1/chat/completions`, prefer `engine: openai` with `parameters.base_url` instead)
`huggingface_hub`	Hugging Face Hub models
`huggingface_pipeline`	In-process Hugging Face pipeline
`self_hosted`	Generic self-hosted LangChain wrapper
`trt_llm`	TensorRT-LLM in-process
`vertexai`	Google Vertex AI through LangChain (requires `langchain-google-vertexai`)
`vllm_openai`	Legacy LangChain wrapper for vLLM. For new configurations, prefer `engine: openai` with `parameters.base_url`
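The two tables amount to a routing rule. Built-in engines need nothing extra, while LangChain engines also need the `NEMOGUARDRAILS_LLM_FRAMEWORK=langchain` opt-in. The sketch below encodes that rule with a hypothetical `select_client` helper. The engine sets are copied from the tables above. Raising an error when the opt-in is missing is this sketch's own choice, not documented library behavior.

```python
import os
from typing import Mapping

# Engine names copied from the Built-in Engines and LangChain Engines tables.
BUILTIN_ENGINES = {"azure", "azure_openai", "nim", "nvidia_ai_endpoints", "ollama", "openai"}
LANGCHAIN_ENGINES = {
    "anthropic", "cohere", "google_genai", "huggingface_endpoint",
    "huggingface_hub", "huggingface_pipeline", "self_hosted", "trt_llm",
    "vertexai", "vllm_openai",
}


def select_client(engine: str, env: Mapping[str, str] = os.environ) -> str:
    """Return "builtin" or "langchain" for an engine name (hypothetical helper)."""
    if engine in BUILTIN_ENGINES:
        return "builtin"
    if engine in LANGCHAIN_ENGINES:
        # Assumption: fail loudly when the LangChain opt-in is missing.
        if env.get("NEMOGUARDRAILS_LLM_FRAMEWORK") != "langchain":
            raise RuntimeError(f"engine {engine!r} needs NEMOGUARDRAILS_LLM_FRAMEWORK=langchain")
        return "langchain"
    raise ValueError(f"engine not listed in this reference: {engine!r}")


print(select_client("openai", env={}))  # builtin
print(select_client("anthropic", env={"NEMOGUARDRAILS_LLM_FRAMEWORK": "langchain"}))  # langchain
```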

Embedding Engines

Engine	Description
`FastEmbed`	FastEmbed (default)
`nim`	NVIDIA NIM embeddings
`openai`	OpenAI embeddings

Model Cache Configuration

```yaml
models:
    - type: content_safety
      engine: nim
      model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
      cache:
          enabled: true
          maxsize: 50000
          stats:
              enabled: false
              log_interval: null
```

Attribute	Type	Default	Description
`models.cache.enabled`	boolean	`false`	Enable caching for this model
`models.cache.maxsize`	integer	`50000`	Maximum cache entries
`models.cache.stats.enabled`	boolean	`false`	Enable cache statistics tracking
`models.cache.stats.log_interval`	float	`null`	Seconds between stats logging
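The following in-memory cache sketch, with hit and miss counters, makes `maxsize` and the `stats` settings concrete. Least-recently-used eviction, the string keys, and the class name `ModelResponseCache` are assumptions for illustration. The library's cache may key and evict entries differently.

```python
import time
from collections import OrderedDict
from typing import Optional


class ModelResponseCache:
    """Hypothetical LRU cache mirroring `maxsize`, `stats.enabled`, `stats.log_interval`."""

    def __init__(self, maxsize: int = 50000, stats_enabled: bool = False,
                 log_interval: Optional[float] = None) -> None:
        self.maxsize = maxsize
        self.stats_enabled = stats_enabled
        self.log_interval = log_interval
        self._data: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self._last_log = time.monotonic()

    def get(self, key: str) -> Optional[str]:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            value: Optional[str] = self._data[key]
        else:
            self.misses += 1
            value = None
        self._maybe_log()
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def _maybe_log(self) -> None:
        # Log only when stats are enabled and a log interval (seconds) is set.
        if not (self.stats_enabled and self.log_interval):
            return
        now = time.monotonic()
        if now - self._last_log >= self.log_interval:
            print(f"cache hits={self.hits} misses={self.misses} size={len(self._data)}")
            self._last_log = now


cache = ModelResponseCache(maxsize=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")       # "a" becomes most recently used
cache.put("c", "3")  # exceeds maxsize, so "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```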

Rails Configuration

The rails key configures guardrails that control LLM behavior.

Rails Schema

```yaml
rails:
    input:
        parallel: false
        flows:
            - self check input
            - check jailbreak

    output:
        parallel: false
        flows:
            - self check output
        streaming:
            enabled: false
            chunk_size: 200
            context_size: 50
            stream_first: true

    retrieval:
        flows:
            - check retrieval sensitive data

    dialog:
        single_call:
            enabled: false
            fallback_to_multiple_calls: true
        user_messages:
            embeddings_only: false

    actions:
        instant_actions: []

    tool_output:
        flows: []
        parallel: false

    tool_input:
        flows: []
        parallel: false

    config:
        # Rail-specific configurations
```

Rail Types

The following table summarizes the available rail types and their trigger points.

Rail Type	Trigger Point	Purpose
Dialog rails	After canonical form is computed	Control conversation flow
Execution rails	Before/after action execution	Control tool and action calls
Input rails	When user input is received	Validate, filter, or modify user input
Output rails	When LLM generates output	Validate, filter, or modify bot responses
Retrieval rails	After RAG retrieval completes	Process retrieved chunks

The following diagram shows the guardrails process in more detail.

Diagram showing the programmable guardrails flow

Input Rails

Process user messages before they reach the LLM.

```yaml
rails:
    input:
        parallel: false # Set to true to execute input rails in parallel
        flows:
            - self check input
            - check jailbreak
            - mask sensitive data on input
```

Attribute	Type	Default	Description
`rails.input.flows`	list	`[]`	Names of flows that implement input rails
`rails.input.parallel`	boolean	`false`	Execute input rails in parallel
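The `parallel` flag changes how the listed flows are scheduled. The asyncio sketch below contrasts the two modes. It assumes each rail is an async function that returns `True` when the message is allowed. The names `run_rails`, `no_secrets`, and `not_empty` are hypothetical stand-ins for flows, not library APIs.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical rail signature: returns True when the message is allowed.
Rail = Callable[[str], Awaitable[bool]]


async def run_rails(message: str, rails: Sequence[Rail], parallel: bool = False) -> bool:
    if parallel:
        # All rails start at once; latency is roughly that of the slowest rail.
        results = await asyncio.gather(*(rail(message) for rail in rails))
        return all(results)
    # Sequential: stop at the first rail that blocks the message.
    for rail in rails:
        if not await rail(message):
            return False
    return True


async def no_secrets(message: str) -> bool:
    await asyncio.sleep(0.01)  # stand-in for a model or service call
    return "password" not in message.lower()


async def not_empty(message: str) -> bool:
    await asyncio.sleep(0.01)
    return bool(message.strip())


print(asyncio.run(run_rails("hello", [no_secrets, not_empty])))  # True
print(asyncio.run(run_rails("my password", [no_secrets, not_empty], parallel=True)))  # False
```

Sequential execution can skip the remaining rails once one blocks. Parallel execution runs every rail, but waits only about as long as the slowest one.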

Built-in Input Flows

Flow	Description
`content safety check input`	NVIDIA content safety model
`detect sensitive data on input`	Detect and block PII
`jailbreak detection heuristics`	Heuristics-based jailbreak detection using perplexity thresholds
`jailbreak detection model`	NIM-based jailbreak detection
`llama guard check input`	LlamaGuard content moderation
`mask sensitive data on input`	Mask PII in user input
`self check input`	LLM-based policy compliance check
`topic safety check input`	Topic control model

Output Rails

Process LLM responses before they are returned to users.

```yaml
rails:
    output:
        parallel: false
        flows:
            - self check output
            - self check facts
        streaming:
            enabled: false
            chunk_size: 200
            context_size: 50
            stream_first: true
```

Attribute	Type	Default	Description
`rails.output.flows`	list	`[]`	Names of flows that implement output rails
`rails.output.parallel`	boolean	`false`	Execute output rails in parallel
`rails.output.streaming`	object		Streaming output configuration

Output Streaming Configuration

Attribute	Type	Default	Description
`rails.output.streaming.chunk_size`	integer	`200`	Tokens per processing chunk
`rails.output.streaming.context_size`	integer	`50`	Tokens carried from previous chunk
`rails.output.streaming.enabled`	boolean	`false`	Enable streaming mode
`rails.output.streaming.stream_first`	boolean	`true`	Stream before applying output rails
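One way to picture `chunk_size` and `context_size` is that output rails check overlapping windows of the token stream. Each window repeats the last `context_size` tokens of the previous chunk, so text split across a boundary is still seen in one piece. The generator below is a minimal sketch of that windowing over a sequence of string tokens. The name `stream_windows` is hypothetical, and the library's buffering may differ in detail.

```python
from typing import Iterable, Iterator, List


def stream_windows(tokens: Iterable[str], chunk_size: int = 200,
                   context_size: int = 50) -> Iterator[List[str]]:
    """Yield windows of `chunk_size` new tokens, each prefixed by up to
    `context_size` tokens carried over from the previous chunk."""
    context: List[str] = []
    pending: List[str] = []
    for token in tokens:
        pending.append(token)
        if len(pending) == chunk_size:
            yield context + pending
            # Guard needed because pending[-0:] would return the whole list.
            context = pending[-context_size:] if context_size else []
            pending = []
    if pending:
        # Flush the tail so the end of the response is also checked.
        yield context + pending


print(list(stream_windows("abcdefg", chunk_size=3, context_size=1)))
# [['a', 'b', 'c'], ['c', 'd', 'e', 'f'], ['f', 'g']]
```

With `stream_first: true`, the tokens in each window would already have reached the client before the window is checked. With `false`, they would be held back until the rails pass.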

Built-in Output Flows

Flow	Description
`content safety check output`	NVIDIA content safety model
`injection detection`	Injection detection (SQL, XSS, code, template)
`llama guard check output`	LlamaGuard content moderation
`mask sensitive data on output`	Mask PII in output
`self check facts`	Fact verification
`self check hallucination`	Hallucination detection
`self check output`	LLM-based policy compliance check

Retrieval Rails

Process chunks retrieved from the knowledge base.

```yaml
rails:
    retrieval:
        flows:
            - check retrieval sensitive data
```

Dialog Rails

Control conversation flow after user intent is determined.

```yaml
rails:
    dialog:
        single_call:
            enabled: false
            fallback_to_multiple_calls: true
        user_messages:
            embeddings_only: false
            embeddings_only_similarity_threshold: null
            embeddings_only_fallback_intent: null
```

Attribute	Type	Default	Description
`rails.dialog.single_call.enabled`	boolean	`false`	Use single LLM call for intent + response
`rails.dialog.single_call.fallback_to_multiple_calls`	boolean	`true`	Fall back if single call fails
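`rails.dialog.user_messages.embeddings_only`	boolean	`false`	Use only embeddings for intent matching
`rails.dialog.user_messages.embeddings_only_fallback_intent`	string	`null`	Intent to use when the best match falls below the similarity threshold; when `null`, the LLM computes the intent
`rails.dialog.user_messages.embeddings_only_similarity_threshold`	float	`null`	Minimum similarity score for an embeddings-only match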
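With `embeddings_only: true`, intent matching becomes a nearest-neighbor lookup over example utterances. The following is a minimal sketch under several assumptions:

* Embedding vectors are pre-computed.
* Similarity is cosine similarity.
* A best match below `embeddings_only_similarity_threshold` yields `embeddings_only_fallback_intent`.
* A fallback of `None` means the LLM generates the intent instead.

The name `match_intent` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_intent(query: Sequence[float], examples: Dict[str, List[float]],
                 threshold: Optional[float] = None,
                 fallback_intent: Optional[str] = None) -> Optional[str]:
    """Return the closest intent, the fallback, or None (meaning: ask the LLM)."""
    best_intent: Optional[str] = None
    best_score = -1.0
    for intent, vector in examples.items():
        score = cosine(query, vector)
        if score > best_score:
            best_intent, best_score = intent, score
    if threshold is not None and best_score < threshold:
        return fallback_intent  # None here means "let the LLM generate the intent"
    return best_intent


examples = {"express greeting": [1.0, 0.0], "ask about capabilities": [0.0, 1.0]}
print(match_intent([0.9, 0.1], examples, threshold=0.8))  # express greeting
print(match_intent([0.7, 0.7], examples, threshold=0.8, fallback_intent="unknown"))  # unknown
```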

Execution Rails

Control tool and action invocations.

Action Rails

Control custom action and tool invocations.

```yaml
rails:
    actions:
        instant_actions:
            - action_name_1
            - action_name_2
```

Tool Rails

Validate tool calls and tool results when you use the IORails engine. The tool_output rails check the tool calls a model emits, and the tool_input rails check the tool results returned to the model.

```yaml
rails:
    tool_output:
        flows:
            - tool call validation
    tool_input:
        flows:
            - tool result validation
```

Each section accepts only its own flow name: tool_output accepts tool call validation, and tool_input accepts tool result validation. These rails run only on the IORails engine, which accepts the parallel field for symmetry with other rails but does not honor it for tool rails. For configuration details and behavior, see Tool Calling.
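Because each tool rail section accepts exactly one flow name, a configuration linter can check flow placement with a lookup table. `check_tool_rails` below is a hypothetical helper that operates on a parsed `rails` dict. It is not part of the library.

```python
from typing import Any, Dict, List

# Each tool rail section accepts only its own flow name.
ALLOWED_TOOL_FLOWS = {
    "tool_output": {"tool call validation"},
    "tool_input": {"tool result validation"},
}


def check_tool_rails(rails: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for misplaced tool rail flows (hypothetical helper)."""
    errors: List[str] = []
    for section, allowed in ALLOWED_TOOL_FLOWS.items():
        for flow in (rails.get(section) or {}).get("flows") or []:
            if flow not in allowed:
                errors.append(f"rails.{section}: unsupported flow {flow!r}")
    return errors


rails = {
    "tool_output": {"flows": ["tool result validation"]},  # wrong section
    "tool_input": {"flows": ["tool result validation"]},
}
print(check_tool_rails(rails))
# ["rails.tool_output: unsupported flow 'tool result validation'"]
```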

Rails Config Section

The rails.config section contains configuration for specific built-in rails.

Jailbreak Detection

```yaml
rails:
    config:
        jailbreak_detection:
            # Heuristics-based detection
            server_endpoint: null
            length_per_perplexity_threshold: 89.79
            prefix_suffix_perplexity_threshold: 1845.65

            # NIM-based detection
            nim_base_url: "http://localhost:8000/v1/"
            nim_server_endpoint: "classify"
            api_key_env_var: "JAILBREAK_KEY"
```

Attribute	Type	Default	Description
`rails.config.jailbreak_detection.api_key`	string	`null`	API key set directly in the config (not recommended; prefer `api_key_env_var`)
`rails.config.jailbreak_detection.api_key_env_var`	string	`null`	Environment variable for API key
`rails.config.jailbreak_detection.length_per_perplexity_threshold`	float	`89.79`	Length/perplexity threshold
`rails.config.jailbreak_detection.nim_base_url`	string	`null`	NIM base URL (e.g., `http://localhost:8000/v1`)
`rails.config.jailbreak_detection.nim_server_endpoint`	string	`"classify"`	NIM endpoint path
`rails.config.jailbreak_detection.prefix_suffix_perplexity_threshold`	float	`1845.65`	Prefix/suffix perplexity threshold
`rails.config.jailbreak_detection.server_endpoint`	string	`null`	Heuristics model endpoint
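The two heuristic thresholds compare simple statistics against fixed cut-offs. The sketch below makes three assumptions:

* The perplexity values have already been computed by a language model. That step is out of scope here.
* Prompt length is measured in characters.
* A prompt is flagged when a statistic exceeds its threshold.

`heuristics_flag` is a hypothetical name, and this is not the library's detector.

```python
from typing import Dict


def heuristics_flag(prompt: str, perplexity: float, prefix_suffix_perplexity: float,
                    length_per_perplexity_threshold: float = 89.79,
                    prefix_suffix_perplexity_threshold: float = 1845.65) -> Dict[str, bool]:
    """Report which heuristic checks flag a prompt (illustrative sketch only).

    Assumptions: `perplexity` is the whole prompt's perplexity and
    `prefix_suffix_perplexity` the perplexity of its prefix/suffix, both computed
    elsewhere; length is counted in characters; exceeding a threshold means flagged.
    """
    length_per_perplexity = len(prompt) / perplexity
    return {
        "length_per_perplexity": length_per_perplexity > length_per_perplexity_threshold,
        "prefix_suffix": prefix_suffix_perplexity > prefix_suffix_perplexity_threshold,
    }


# 2,000 characters at perplexity 10 gives a ratio of 200, above the 89.79 default.
print(heuristics_flag("x" * 2000, perplexity=10.0, prefix_suffix_perplexity=50.0))
# {'length_per_perplexity': True, 'prefix_suffix': False}
```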

Sensitive Data Detection (Presidio)

```yaml
rails:
    config:
        sensitive_data_detection:
            recognizers: []
            input:
                entities:
                    - PERSON
                    - EMAIL_ADDRESS
                    - PHONE_NUMBER
                    - CREDIT_CARD
                mask_token: "*"
                score_threshold: 0.2
            output:
                entities:
                    - PERSON
                    - EMAIL_ADDRESS
            retrieval:
                entities: []
```

Attribute	Type	Default	Description
`rails.config.sensitive_data_detection.input/output/retrieval.entities`	list	`[]`	Entity types to detect
`rails.config.sensitive_data_detection.input/output/retrieval.mask_token`	string	`"*"`	Token for masking
`rails.config.sensitive_data_detection.input/output/retrieval.score_threshold`	float	`0.2`	Detection confidence threshold
`rails.config.sensitive_data_detection.recognizers`	list	`[]`	Custom Presidio recognizers
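Detection and masking are separate steps. The sketch below covers only masking. It assumes detections arrive as hypothetical `(entity_type, start, end, score)` tuples, in the way a Presidio-style analyzer might report them. It keeps detections whose type is listed in `entities` and whose score reaches `score_threshold`, then replaces each character of the span with `mask_token`. Per-character replacement is an assumption, not documented library behavior.

```python
from typing import Iterable, List, Tuple

# Hypothetical detection shape: (entity_type, start, end, score).
Detection = Tuple[str, int, int, float]


def mask_text(text: str, detections: Iterable[Detection], entities: List[str],
              mask_token: str = "*", score_threshold: float = 0.2) -> str:
    chars = list(text)
    for entity_type, start, end, score in detections:
        # Mask only configured entity types that meet the confidence threshold.
        if entity_type in entities and score >= score_threshold:
            for i in range(start, end):
                chars[i] = mask_token
    return "".join(chars)


text = "Email jane@example.com or call Jane"
detections = [("EMAIL_ADDRESS", 6, 22, 0.95), ("PERSON", 31, 35, 0.1)]
print(mask_text(text, detections, entities=["PERSON", "EMAIL_ADDRESS"]))
# Email **************** or call Jane   ("Jane" scored 0.1, below 0.2)
```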

Injection Detection

```yaml
rails:
    config:
        injection_detection:
            injections:
                - sqli
                - template
                - code
                - xss
            action: reject # "reject" or "omit"
            yara_path: ""
            yara_rules: {}
```

Attribute	Type	Default	Description
`rails.config.injection_detection.action`	string	`"reject"`	Action: `reject` or `omit`
`rails.config.injection_detection.injections`	list	`[]`	Injection types: `sqli`, `template`, `code`, `xss`
`rails.config.injection_detection.yara_path`	string	`""`	Custom YARA rules path
`rails.config.injection_detection.yara_rules`	object	`{}`	Inline YARA rules
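The `action` setting decides what happens after a match. The sketch below assumes match spans come from the YARA rules as hypothetical `(start, end)` character offsets. With `reject`, the function returns `None` to signal a refusal. With `omit`, it removes the matched spans and keeps the rest. Both the return convention and the name `apply_injection_action` are assumptions.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) offsets of a match


def apply_injection_action(text: str, matches: List[Span],
                           action: str = "reject") -> Optional[str]:
    """Return the text to emit, or None when the output is rejected."""
    if not matches:
        return text
    if action == "reject":
        return None
    if action == "omit":
        kept: List[str] = []
        cursor = 0
        for start, end in sorted(matches):
            kept.append(text[cursor:start])  # empty when spans overlap
            cursor = max(cursor, end)
        kept.append(text[cursor:])
        return "".join(kept)
    raise ValueError(f"unsupported action: {action!r}")


text = "Result: <script>alert(1)</script> done"
print(apply_injection_action(text, [(8, 33)], action="omit"))  # Result:  done
print(apply_injection_action(text, [(8, 33)]))  # None
```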

Fact Checking

```yaml
rails:
    config:
        fact_checking:
            parameters:
                endpoint: "http://localhost:5000"
            fallback_to_self_check: false
```

Content Safety

```yaml
rails:
    config:
        content_safety:
            multilingual:
                enabled: false
                refusal_messages:
                    en: "Sorry, I cannot help with that."
                    es: "Lo siento, no puedo ayudar con eso."
```

The multilingual feature supports the following languages:

Language	Code
Arabic	`ar`
Chinese	`zh`
English	`en`
French	`fr`
German	`de`
Hindi	`hi`
Japanese	`ja`
Spanish	`es`
Thai	`th`

If the detected language is not in this list, English is used as the fallback. For more information, refer to Multilingual Content Safety.
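The fallback rule is essentially a dictionary lookup. The following is a minimal sketch that assumes the language code has already been detected and that `refusal_messages` is keyed by the codes in the table above. Two behaviors are this sketch's own assumptions: a supported language without a configured message falls back to the English message, and a built-in default string is used when no English message is configured either. `refusal_for` is a hypothetical name.

```python
from typing import Dict, Optional

# Language codes from the table above.
SUPPORTED_LANGUAGES = {"ar", "zh", "en", "fr", "de", "hi", "ja", "es", "th"}
# Hypothetical last-resort message, used only when no English message is configured.
DEFAULT_REFUSAL = "Sorry, I cannot help with that."


def refusal_for(language: Optional[str], refusal_messages: Dict[str, str]) -> str:
    """Pick a refusal message for a detected language code (illustrative sketch)."""
    # Unsupported or undetected languages fall back to English, as documented.
    lang = language if language in SUPPORTED_LANGUAGES else "en"
    # Assumption: a supported language without a configured message also uses English.
    return refusal_messages.get(lang) or refusal_messages.get("en") or DEFAULT_REFUSAL


messages = {
    "en": "Sorry, I cannot help with that.",
    "es": "Lo siento, no puedo ayudar con eso.",
}
print(refusal_for("es", messages))  # Lo siento, no puedo ayudar con eso.
print(refusal_for("pt", messages))  # Sorry, I cannot help with that.
```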

Third-Party Integrations

AutoAlign

```yaml
rails:
    config:
        autoalign:
            parameters: {}
            input:
                guardrails_config: {}
            output:
                guardrails_config: {}
```

For more information, refer to AutoAlign Integration.

Patronus

```yaml
rails:
    config:
        patronus:
            input:
                evaluate_config:
                    success_strategy: all_pass # or any_pass
                    params: {}
            output:
                evaluate_config:
                    success_strategy: all_pass
                    params: {}
```

For more information, refer to Patronus Evaluate API Integration.
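`success_strategy` decides how multiple evaluation results combine. Assuming each evaluation yields a pass or a fail, `all_pass` requires every result to pass and `any_pass` requires at least one to pass. The helper below is a hypothetical illustration of that rule, not the integration's code.

```python
from typing import List


def evaluation_passes(results: List[bool], success_strategy: str = "all_pass") -> bool:
    """Combine per-evaluation pass/fail results (hypothetical helper)."""
    if success_strategy == "all_pass":
        return all(results)
    if success_strategy == "any_pass":
        return any(results)
    raise ValueError(f"unknown success_strategy: {success_strategy!r}")


results = [True, False]
print(evaluation_passes(results, "all_pass"))  # False
print(evaluation_passes(results, "any_pass"))  # True
```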

Clavata

```yaml
rails:
    config:
        clavata:
            server_endpoint: "https://gateway.app.clavata.ai:8443"
            policies: {}
            label_match_logic: ANY # or ALL
            input:
                policy: "policy_alias"
                labels: []
            output:
                policy: "policy_alias"
                labels: []
```

For more information, refer to Clavata Integration.

Pangea AI Guard

```yaml
rails:
    config:
        pangea:
            input:
                recipe: "recipe_key"
            output:
                recipe: "recipe_key"
```

For more information, refer to Pangea AI Guard Integration.

Trend Micro

```yaml
rails:
    config:
        trend_micro:
            v1_url: "https://api.xdr.trendmicro.com/beta/aiSecurity/guard"
            api_key_env_var: "TREND_MICRO_API_KEY"
```

For more information, refer to Trend Micro Integration.

Cisco AI Defense

```yaml
rails:
    config:
        ai_defense:
            timeout: 30.0
            fail_open: false
```

For more information, refer to Cisco AI Defense Integration.

Private AI

```yaml
rails:
    config:
        private_ai_detection:
            server_endpoint: "http://localhost:8080/process/text"
            input:
                entities: []
            output:
                entities: []
            retrieval:
                entities: []
```

For more information, refer to Private AI Integration.

Fiddler Guardrails

```yaml
rails:
    config:
        fiddler:
            fiddler_endpoint: "http://localhost:8080/process/text"
            safety_threshold: 0.1
            faithfulness_threshold: 0.05
```

For more information, refer to Fiddler Guardrails Integration.

Guardrails AI

```yaml
rails:
    config:
        guardrails_ai:
            input:
                validators:
                    - name: toxic_language
                      parameters:
                          threshold: 0.5
                      metadata: {}
            output:
                validators:
                    - name: pii
                      parameters: {}
```

For more information, refer to Guardrails AI Integration.

Prompts Configuration

Define prompts for LLM tasks.

```yaml
prompts:
    - task: self_check_input
      content: |
          Your task is to check if the user input is safe.
          User input: {{ user_input }}
          Answer [Yes/No]:
      output_parser: null
      max_length: 16000
      max_tokens: null
      mode: standard
      stop: null
      models: null # Restrict to specific engines/models
```

Attribute	Type	Default	Description
`prompts.content`	string		Prompt template (mutually exclusive with `messages`)
`prompts.max_length`	integer	`16000`	Maximum prompt length (characters)
`prompts.max_tokens`	integer	`null`	Maximum response tokens
`prompts.messages`	list		Chat messages (mutually exclusive with `content`)
`prompts.mode`	string	`"standard"`	Prompting mode
`prompts.models`	list	`null`	Restrict to engines/models (e.g., `["openai", "nim/llama-3.1"]`)
`prompts.output_parser`	string	`null`	Output parser name
`prompts.stop`	list	`null`	Stop tokens
`prompts.task`	string		Task identifier (required)
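Two rules in this table interact when a prompt is chosen for a task: `content` and `messages` are mutually exclusive, and `models` restricts a prompt to particular engines or `engine/model` pairs. The sketch below shows one plausible selection order over parsed prompt dicts. It assumes an exact `engine/model` match beats an engine-only match, which in turn beats an unrestricted prompt. The library's real precedence and matching rules may differ, and both helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def validate_prompt(prompt: Dict[str, Any]) -> None:
    """Enforce the documented `content`/`messages` exclusivity."""
    if ("content" in prompt) == ("messages" in prompt):
        raise ValueError(f"prompt {prompt.get('task')!r} needs exactly one of content or messages")


def select_prompt(prompts: List[Dict[str, Any]], task: str, engine: str,
                  model: str) -> Optional[Dict[str, Any]]:
    """Pick the most specific prompt for a task (assumed precedence, see above)."""
    best: Optional[Dict[str, Any]] = None
    best_score = -1
    for prompt in prompts:
        if prompt["task"] != task:
            continue
        validate_prompt(prompt)
        allowed = prompt.get("models")
        if not allowed:
            score = 0  # unrestricted prompt
        elif f"{engine}/{model}" in allowed:
            score = 2  # exact engine/model match
        elif engine in allowed:
            score = 1  # engine-only match
        else:
            continue  # restricted to other engines or models
        if score > best_score:
            best, best_score = prompt, score
    return best


prompts = [
    {"task": "self_check_input", "content": "generic"},
    {"task": "self_check_input", "content": "openai-specific", "models": ["openai"]},
]
print(select_prompt(prompts, "self_check_input", "openai", "gpt-4")["content"])  # openai-specific
print(select_prompt(prompts, "self_check_input", "nim", "meta/llama-3.1-70b-instruct")["content"])  # generic
```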

Available Tasks

The following table lists commonly used tasks you can set in prompts.task.

Task	Description
`general`	General response generation (no dialog rails)
`generate_bot_message`	Generate bot response
`generate_next_steps`	Determine next conversation step
`generate_user_intent`	Generate canonical user intent
`self_check_facts`	Verify factual accuracy of responses
`self_check_hallucination`	Detect hallucinations in responses
`self_check_input`	Check if user input complies with policy
`self_check_output`	Check if bot output complies with policy

Available Prompt Message Types

The following table lists the message types you can use in prompts.messages.type.

Type	Description
`assistant`	Assistant/bot message content
`bot`	Alias for `assistant`
`system`	System-level instructions
`user`	User message content

Other Configuration Options

Instructions

```yaml
instructions:
    - type: general
      content: |
          You are a helpful assistant.
```

Sample Conversation

```yaml
sample_conversation: |
    user "Hello there!"
      express greeting
    bot express greeting
      "Hello! How can I assist you today?"
    user "What can you do for me?"
      ask about capabilities
    bot respond about capabilities
      "As an AI assistant, I can help you with a wide range of tasks."
```

Knowledge Base

```yaml
knowledge_base:
    folder: kb
    embedding_search_provider:
        name: default
        parameters: {}
        cache:
            enabled: false
```

Core Settings

```yaml
core:
    embedding_search_provider:
        name: default
        parameters: {}
```

Tracing

```yaml
tracing:
    enabled: false
    adapters:
        - name: FileSystem
    span_format: opentelemetry
    enable_content_capture: false
```

Streaming

v0.20.0

The top-level `streaming` field is a boolean that is no longer required. Use the `stream_async()` method directly instead. For output rail streaming configuration, see [Output Streaming Configuration](#output-streaming-configuration).

```yaml
streaming: false
```

Import Paths

```yaml
import_paths:
    - path/to/shared/config
```

Complete Example

The following example shows a complete config.yml file that wires together a main LLM, a dedicated content safety model, and an embeddings model. It does the following:

* Runs content safety checks on both input and output.
* Sets the local NIM endpoint for jailbreak detection.
* Defines a custom content safety prompt.
* Provides general instructions for the assistant.
* Enables streaming for output rails.

```yaml
models:
    # Main application LLM
    - type: main
      engine: nim
      model: meta/llama-3.1-70b-instruct
      parameters:
          temperature: 0.7

    # Content safety model
    - type: content_safety
      engine: nim
      parameters:
          base_url: "http://localhost:8000/v1"
          model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

    # Embeddings
    - type: embeddings
      engine: FastEmbed
      model: all-MiniLM-L6-v2

rails:
    input:
        flows:
            - content safety check input $model=content_safety

    output:
        flows:
            - content safety check output $model=content_safety
        streaming:
            enabled: true

    config:
        jailbreak_detection:
            nim_base_url: "http://localhost:8001/v1/"

prompts:
    - task: content_safety_check_input $model=content_safety
      content: |
          Check if this content is safe: {{ user_input }}
      output_parser: nemoguard_parse_prompt_safety
      max_tokens: 50

instructions:
    - type: general
      content: |
          You are a helpful, harmless, and honest assistant.
```