Configuration YAML Schema Reference

This reference documents all configuration options for config.yml, derived from the authoritative Pydantic schema in nemoguardrails/rails/llm/config.py.


Models Configuration

The models key defines LLM providers and models used by the NVIDIA NeMo Guardrails library.

Model Schema

    models:
      - type: main                    # Required: Model type
        engine: openai                # Required: LLM provider
        model: gpt-4                  # Required: Model name
        mode: chat                    # Optional: "chat" or "text" (default: "chat")
        api_key_env_var: OPENAI_KEY   # Optional: Environment variable for API key
        parameters:                   # Optional: Provider-specific parameters
          temperature: 0.7
          max_tokens: 1000
        cache:                        # Optional: Caching configuration
          enabled: false
          maxsize: 50000

Model Attributes

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| models.api_key_env_var | string | No | Environment variable containing API key |
| models.cache | object | No | Cache configuration for this model |
| models.engine | string | Yes | LLM provider (see Engines) |
| models.mode | string | No | Completion mode: chat or text (default: chat) |
| models.model | string | Yes | Model name (can also be in parameters.model_name) |
| models.parameters | object | No | Provider-specific parameters. For engines served by the built-in client, such as any OpenAI-compatible endpoint, the runtime forwards parameters to the OpenAI-compatible HTTP request. Examples include temperature, max_tokens, base_url, api_key, default_query, and default_headers. For engines served by LangChain, opt in with NEMOGUARDRAILS_LLM_FRAMEWORK=langchain; the runtime forwards parameters to the underlying LangChain class. For the engine-by-engine matrix, refer to Inference Providers. |
| models.type | string | Yes | Model identifier (see Model Types) |

Model Types

The type field is a free-form string identifier. Certain types have special handling in the runtime, while custom types can be defined and referenced in flows via $model=<type>.

Reserved Types

These types have special handling in the runtime:

| Type | Description |
| --- | --- |
| embeddings | Embedding model for knowledge base and similarity search |
| jailbreak_detection | Jailbreak detection model (used with NIM) |
| main | Primary application LLM for conversation |

Commonly-Used Types

The following types are commonly used with guardrails:

| Type | Description | Usage Example in Flows |
| --- | --- | --- |
| content_safety | Content safety model | content safety check input $model=content_safety |
| llama_guard | Llama Guard content moderation | llama guard check input $model=llama_guard |
| topic_control | Topic control model | topic safety check input $model=topic_control |

Custom Types

You can define any custom type and reference it in flows. For example:

    models:
      - type: my_safety_model
        engine: self-hosted
        model: my-org/custom-safety-model

    rails:
      input:
        flows:
          - content safety check input $model=my_safety_model

The runtime validates that any $model=<type> reference in flows has a matching model defined in the configuration.

Engines

Starting with v0.22, the library serves engines through either the built-in OpenAI-compatible client or LangChain. Use the built-in client whenever the underlying wire protocol is OpenAI-compatible. Opt into LangChain only for engines whose API is not OpenAI-compatible, such as Vertex AI, Anthropic, Cohere, and the in-process Hugging Face pipeline. For the full mapping, refer to Inference Providers. For migration recipes, refer to Migrating to 0.22.

Built-in Engines

These engines work with pip install nemoguardrails and do not require extra provider packages. Pass parameters.base_url to point at a self-hosted or alternative endpoint.

| Engine | Description |
| --- | --- |
| azure, azure_openai | Azure OpenAI models with key-based authentication (azure_endpoint or base_url, azure_deployment, and api_version) |
| nim | NVIDIA NIM microservices |
| nvidia_ai_endpoints | Alias for nim |
| ollama | Ollama OpenAI-compatible endpoint at http://localhost:11434/v1 |
| openai | OpenAI public API or any OpenAI-compatible endpoint using parameters.base_url |

For OpenAI-compatible providers without a dedicated engine entry (vLLM, TGI, OpenRouter, Together.ai, Fireworks.ai, Groq, DeepSeek, llama.cpp server, and similar), use engine: openai with parameters.base_url and parameters.api_key.
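
As a minimal sketch of the pattern described above, the following entry points engine: openai at a self-hosted OpenAI-compatible server. The base URL, API key value, and model name are placeholders chosen for illustration, not values from this reference; substitute the ones your provider documents.

```yaml
models:
  - type: main
    engine: openai
    model: my-org/my-served-model            # placeholder model name
    parameters:
      base_url: "http://localhost:8000/v1"   # placeholder endpoint, for example a local server
      api_key: "EMPTY"                       # placeholder; use the key your endpoint expects
```

To keep the key out of the file, the api_key_env_var attribute listed under Model Attributes may be a better fit than parameters.api_key.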

LangChain Engines

To use one of these engines, set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install the matching langchain-* provider package.

| Engine | Description |
| --- | --- |
| anthropic | Anthropic Claude models |
| cohere | Cohere models |
| google_genai | Google Generative AI through LangChain (requires langchain-google-genai) |
| huggingface_endpoint | Hugging Face Inference Endpoints (default text-generation schema; if your endpoint exposes /v1/chat/completions, prefer engine: openai with parameters.base_url instead) |
| huggingface_hub | Hugging Face Hub models |
| huggingface_pipeline | In-process Hugging Face pipeline |
| self_hosted | Generic self-hosted LangChain wrapper |
| trt_llm | TensorRT-LLM in-process |
| vertexai | Google Vertex AI through LangChain (requires langchain-google-vertexai) |
| vllm_openai | Legacy LangChain wrapper for vLLM. For new configurations, prefer engine: openai with parameters.base_url |
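
A hedged sketch of a LangChain-served model entry, assuming NEMOGUARDRAILS_LLM_FRAMEWORK=langchain is set in the process environment and the matching langchain-* provider package is installed. The model name and the environment variable name for the key are placeholders, not values taken from this reference.

```yaml
# Assumes NEMOGUARDRAILS_LLM_FRAMEWORK=langchain is exported before the application starts.
models:
  - type: main
    engine: anthropic
    model: your-claude-model-name        # placeholder; use a model name your account can access
    api_key_env_var: ANTHROPIC_API_KEY   # assumed variable name for illustration
    parameters:
      temperature: 0.2                   # forwarded to the underlying LangChain class
```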

Embedding Engines

| Engine | Description |
| --- | --- |
| FastEmbed | FastEmbed (default) |
| nim | NVIDIA NIM embeddings |
| openai | OpenAI embeddings |
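
For illustration, an embeddings entry might pair the reserved embeddings type with one of the engines above. The model name below is an assumption for the sketch; pick one that your chosen engine actually serves.

```yaml
models:
  - type: embeddings
    engine: openai
    model: text-embedding-3-small   # assumed model name for illustration
```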

Model Cache Configuration

    models:
      - type: content_safety
        engine: nim
        model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
        cache:
          enabled: true
          maxsize: 50000
          stats:
            enabled: false
            log_interval: null

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| models.cache.enabled | boolean | false | Enable caching for this model |
| models.cache.maxsize | integer | 50000 | Maximum cache entries |
| models.cache.stats.enabled | boolean | false | Enable cache statistics tracking |
| models.cache.stats.log_interval | float | null | Seconds between stats logging |

Rails Configuration

The rails key configures guardrails that control LLM behavior.

Rails Schema

    rails:
      input:
        parallel: false
        flows:
          - self check input
          - check jailbreak

      output:
        parallel: false
        flows:
          - self check output
        streaming:
          enabled: false
          chunk_size: 200
          context_size: 50
          stream_first: true

      retrieval:
        flows:
          - check retrieval sensitive data

      dialog:
        single_call:
          enabled: false
          fallback_to_multiple_calls: true
        user_messages:
          embeddings_only: false

      actions:
        instant_actions: []

      tool_output:
        flows: []
        parallel: false

      tool_input:
        flows: []
        parallel: false

      config:
        # Rail-specific configurations

Rail Types

The following table summarizes the available rail types and their trigger points.

| Rail Type | Trigger Point | Purpose |
| --- | --- | --- |
| Dialog rails | After canonical form is computed | Control conversation flow |
| Execution rails | Before/after action execution | Control tool and action calls |
| Input rails | When user input is received | Validate, filter, or modify user input |
| Output rails | When LLM generates output | Validate, filter, or modify bot responses |
| Retrieval rails | After RAG retrieval completes | Process retrieved chunks |

The following diagram shows the guardrails process described in the table above in detail.

[Figure: Diagram showing the programmable guardrails flow]

Input Rails

Process user messages before they reach the LLM.

    rails:
      input:
        parallel: false  # Execute flows in parallel
        flows:
          - self check input
          - check jailbreak
          - mask sensitive data on input

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.input.flows | list | [] | Names of flows that implement input rails |
| rails.input.parallel | boolean | false | Execute input rails in parallel |
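
A short sketch of turning on parallel execution for input rails. It assumes a content_safety model is defined under models and that the listed flows are independent of one another, which this reference does not state.

```yaml
rails:
  input:
    parallel: true   # run the flows below in parallel rather than one after another
    flows:
      - self check input
      - content safety check input $model=content_safety
```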

Built-in Input Flows

| Flow | Description |
| --- | --- |
| content safety check input | NVIDIA content safety model |
| detect sensitive data on input | Detect and block PII |
| jailbreak detection heuristics | Jailbreak detection heuristics |
| jailbreak detection model | NIM-based jailbreak detection |
| llama guard check input | LlamaGuard content moderation |
| mask sensitive data on input | Mask PII in user input |
| self check input | LLM-based policy compliance check |
| topic safety check input | Topic control model |

Output Rails

Process LLM responses before returning to users.

    rails:
      output:
        parallel: false
        flows:
          - self check output
          - self check facts
        streaming:
          enabled: false
          chunk_size: 200
          context_size: 50
          stream_first: true

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.output.flows | list | [] | Names of flows that implement output rails |
| rails.output.parallel | boolean | false | Execute output rails in parallel |
| rails.output.streaming | object | | Streaming output configuration |

Output Streaming Configuration

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.output.streaming.chunk_size | integer | 200 | Tokens per processing chunk |
| rails.output.streaming.context_size | integer | 50 | Tokens carried from previous chunk |
| rails.output.streaming.enabled | boolean | false | Enable streaming mode |
| rails.output.streaming.stream_first | boolean | true | Stream before applying output rails |
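
The attributes above can be combined as in the following sketch, which enables output rail streaming and overrides the defaults. The specific values are arbitrary illustrations, and the comment on stream_first is an interpretation of the table description rather than documented behavior.

```yaml
rails:
  output:
    flows:
      - self check output
    streaming:
      enabled: true
      chunk_size: 100      # tokens per processing chunk (default: 200)
      context_size: 25     # tokens carried from the previous chunk (default: 50)
      stream_first: false  # presumably applies output rails before a chunk is streamed
```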

Built-in Output Flows

| Flow | Description |
| --- | --- |
| content safety check output | NVIDIA content safety model |
| injection detection | Injection detection (SQL, XSS, code, template) |
| llama guard check output | LlamaGuard content moderation |
| mask sensitive data on output | Mask PII in output |
| self check facts | Fact verification |
| self check hallucination | Hallucination detection |
| self check output | LLM-based policy compliance check |

Retrieval Rails

Process chunks retrieved from knowledge base.

    rails:
      retrieval:
        flows:
          - check retrieval sensitive data

Dialog Rails

Control conversation flow after user intent is determined.

    rails:
      dialog:
        single_call:
          enabled: false
          fallback_to_multiple_calls: true
        user_messages:
          embeddings_only: false
          embeddings_only_similarity_threshold: null
          embeddings_only_fallback_intent: null

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.dialog.single_call.enabled | boolean | false | Use single LLM call for intent + response |
| rails.dialog.single_call.fallback_to_multiple_calls | boolean | true | Fall back if single call fails |
| rails.dialog.user_messages.embeddings_only | boolean | false | Use only embeddings for intent matching |
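
The example above also lists embeddings_only_similarity_threshold and embeddings_only_fallback_intent, which the table does not describe. The sketch below shows how they might be set; the values and the behavior suggested in the comments are assumptions inferred from the key names.

```yaml
rails:
  dialog:
    user_messages:
      embeddings_only: true
      # Assumed value; likely the minimum similarity score for a match.
      embeddings_only_similarity_threshold: 0.75
      # Assumed value; likely the intent used when no match clears the threshold.
      embeddings_only_fallback_intent: "unknown intent"
```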

Execution Rails

Control tool and action invocations.

Action Rails

Control custom action and tool invocations.

    rails:
      actions:
        instant_actions:
          - action_name_1
          - action_name_2

Tool Rails

Control tool input/output processing.

    rails:
      tool_output:
        flows:
          - validate tool parameters
        parallel: false

      tool_input:
        flows:
          - filter tool results
        parallel: false

Rails Config Section

The rails.config section contains configuration for specific built-in rails.

Jailbreak Detection

    rails:
      config:
        jailbreak_detection:
          # Heuristics-based detection
          server_endpoint: null
          length_per_perplexity_threshold: 89.79
          prefix_suffix_perplexity_threshold: 1845.65

          # NIM-based detection
          nim_base_url: "http://localhost:8000/v1/"
          nim_server_endpoint: "classify"
          api_key_env_var: "JAILBREAK_KEY"

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.config.jailbreak_detection.api_key | string | null | API key (not recommended) |
| rails.config.jailbreak_detection.api_key_env_var | string | null | Environment variable for API key |
| rails.config.jailbreak_detection.length_per_perplexity_threshold | float | 89.79 | Length/perplexity threshold |
| rails.config.jailbreak_detection.nim_base_url | string | null | NIM base URL (e.g., http://localhost:8000/v1) |
| rails.config.jailbreak_detection.nim_server_endpoint | string | "classify" | NIM endpoint path |
| rails.config.jailbreak_detection.prefix_suffix_perplexity_threshold | float | 1845.65 | Prefix/suffix perplexity threshold |
| rails.config.jailbreak_detection.server_endpoint | string | null | Heuristics model endpoint |

Sensitive Data Detection (Presidio)

    rails:
      config:
        sensitive_data_detection:
          recognizers: []
          input:
            entities:
              - PERSON
              - EMAIL_ADDRESS
              - PHONE_NUMBER
              - CREDIT_CARD
            mask_token: "*"
            score_threshold: 0.2
          output:
            entities:
              - PERSON
              - EMAIL_ADDRESS
          retrieval:
            entities: []

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.config.sensitive_data_detection.input/output/retrieval.entities | list | [] | Entity types to detect |
| rails.config.sensitive_data_detection.input/output/retrieval.mask_token | string | "*" | Token for masking |
| rails.config.sensitive_data_detection.input/output/retrieval.score_threshold | float | 0.2 | Detection confidence threshold |
| rails.config.sensitive_data_detection.recognizers | list | [] | Custom Presidio recognizers |
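
A sketch of pairing this configuration with the built-in masking flows listed under Input Rails and Output Rails. It assumes each flow reads the sensitive_data_detection subsection matching its direction, which is inferred from the key names rather than stated here; the threshold value is arbitrary.

```yaml
rails:
  config:
    sensitive_data_detection:
      input:
        entities:
          - EMAIL_ADDRESS
          - PHONE_NUMBER
        score_threshold: 0.4   # illustrative value; the default is 0.2
      output:
        entities:
          - EMAIL_ADDRESS
  input:
    flows:
      - mask sensitive data on input
  output:
    flows:
      - mask sensitive data on output
```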

Injection Detection

    rails:
      config:
        injection_detection:
          injections:
            - sqli
            - template
            - code
            - xss
          action: reject  # "reject" or "omit"
          yara_path: ""
          yara_rules: {}

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| rails.config.injection_detection.action | string | "reject" | Action: reject or omit |
| rails.config.injection_detection.injections | list | [] | Injection types: sqli, template, code, xss |
| rails.config.injection_detection.yara_path | string | "" | Custom YARA rules path |
| rails.config.injection_detection.yara_rules | object | {} | Inline YARA rules |
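
A sketch of enabling injection detection end to end, assuming the injection detection flow from Built-in Output Flows reads this configuration section. The choice of omit over reject is only for illustration.

```yaml
rails:
  config:
    injection_detection:
      injections:
        - sqli
        - xss
      action: omit   # alternative to the default "reject"
  output:
    flows:
      - injection detection
```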

Fact Checking

    rails:
      config:
        fact_checking:
          parameters:
            endpoint: "http://localhost:5000"
          fallback_to_self_check: false

Content Safety

    rails:
      config:
        content_safety:
          multilingual:
            enabled: false
            refusal_messages:
              en: "Sorry, I cannot help with that."
              es: "Lo siento, no puedo ayudar con eso."

The multilingual feature supports the following languages:

| Language | Code |
| --- | --- |
| Arabic | ar |
| Chinese | zh |
| English | en |
| French | fr |
| German | de |
| Hindi | hi |
| Japanese | ja |
| Spanish | es |
| Thai | th |

If the detected language is not in this list, English is used as the fallback. For more information, refer to Multilingual Content Safety.

Third-Party Integrations

AutoAlign

    rails:
      config:
        autoalign:
          parameters: {}
          input:
            guardrails_config: {}
          output:
            guardrails_config: {}

For more information, refer to AutoAlign Integration.

Patronus

    rails:
      config:
        patronus:
          input:
            evaluate_config:
              success_strategy: all_pass  # or any_pass
              params: {}
          output:
            evaluate_config:
              success_strategy: all_pass
              params: {}

For more information, refer to Patronus Evaluate API Integration.

Clavata

    rails:
      config:
        clavata:
          server_endpoint: "https://gateway.app.clavata.ai:8443"
          policies: {}
          label_match_logic: ANY  # or ALL
          input:
            policy: "policy_alias"
            labels: []
          output:
            policy: "policy_alias"
            labels: []

For more information, refer to Clavata Integration.

Pangea AI Guard

    rails:
      config:
        pangea:
          input:
            recipe: "recipe_key"
          output:
            recipe: "recipe_key"

For more information, refer to Pangea AI Guard Integration.

Trend Micro

    rails:
      config:
        trend_micro:
          v1_url: "https://api.xdr.trendmicro.com/beta/aiSecurity/guard"
          api_key_env_var: "TREND_MICRO_API_KEY"

For more information, refer to Trend Micro Integration.

Cisco AI Defense

    rails:
      config:
        ai_defense:
          timeout: 30.0
          fail_open: false

For more information, refer to Cisco AI Defense Integration.

Private AI

    rails:
      config:
        private_ai_detection:
          server_endpoint: "http://localhost:8080/process/text"
          input:
            entities: []
          output:
            entities: []
          retrieval:
            entities: []

For more information, refer to Private AI Integration.

Fiddler Guardrails

    rails:
      config:
        fiddler:
          fiddler_endpoint: "http://localhost:8080/process/text"
          safety_threshold: 0.1
          faithfulness_threshold: 0.05

For more information, refer to Fiddler Guardrails Integration.

Guardrails AI

    rails:
      config:
        guardrails_ai:
          input:
            validators:
              - name: toxic_language
                parameters:
                  threshold: 0.5
                metadata: {}
          output:
            validators:
              - name: pii
                parameters: {}

For more information, refer to Guardrails AI Integration.


Prompts Configuration

Define prompts for LLM tasks.

    prompts:
      - task: self_check_input
        content: |
          Your task is to check if the user input is safe.
          User input: {{ user_input }}
          Answer [Yes/No]:
        output_parser: null
        max_length: 16000
        max_tokens: null
        mode: standard
        stop: null
        models: null  # Restrict to specific engines/models

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| prompts.content | string | | Prompt template (mutually exclusive with messages) |
| prompts.max_length | integer | 16000 | Maximum prompt length (characters) |
| prompts.max_tokens | integer | null | Maximum response tokens |
| prompts.messages | list | | Chat messages (mutually exclusive with content) |
| prompts.mode | string | "standard" | Prompting mode |
| prompts.models | list | null | Restrict to engines/models (e.g., ["openai", "nim/llama-3.1"]) |
| prompts.output_parser | string | null | Output parser name |
| prompts.stop | list | null | Stop tokens |
| prompts.task | string | | Task identifier |
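
As an illustration of prompts.models, the sketch below restricts a prompt to the engine and model patterns given in the table's example. The prompt text is a placeholder.

```yaml
prompts:
  - task: self_check_input
    models:            # this prompt is restricted to the entries listed here
      - openai
      - nim/llama-3.1
    content: |
      Your task is to check if the user input is safe.
      User input: {{ user_input }}
      Answer [Yes/No]:
```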

Available Tasks

The following table lists the tasks you can specify in prompts.task.

| Task | Description |
| --- | --- |
| general | General response generation (no dialog rails) |
| generate_bot_message | Generate bot response |
| generate_next_steps | Determine next conversation step |
| generate_user_intent | Generate canonical user intent |
| self_check_facts | Verify factual accuracy of responses |
| self_check_hallucination | Detect hallucinations in responses |
| self_check_input | Check if user input complies with policy |
| self_check_output | Check if bot output complies with policy |

Available Prompt Message Types

The following table lists the message types you can specify in prompts.messages.type.

| Type | Description |
| --- | --- |
| assistant | Assistant/bot message content |
| bot | Alias for assistant |
| system | System-level instructions |
| user | User message content |
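
A hedged sketch of a prompt defined with messages instead of content. Only the type values come from the table above; the content key on each message is an assumption about the message schema, and the prompt text is a placeholder.

```yaml
prompts:
  - task: self_check_input
    messages:
      - type: system
        content: |   # "content" per message is assumed, not listed in this reference
          Your task is to check if the user input is safe. Answer [Yes/No].
      - type: user
        content: "{{ user_input }}"
```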

Other Configuration Options

Instructions

    instructions:
      - type: general
        content: |
          You are a helpful assistant.

Sample Conversation

    sample_conversation: |
      user "Hello there!"
        express greeting
      bot express greeting
        "Hello! How can I assist you today?"
      user "What can you do for me?"
        ask about capabilities
      bot respond about capabilities
        "As an AI assistant, I can help you with a wide range of tasks."

Knowledge Base

    knowledge_base:
      folder: kb
      embedding_search_provider:
        name: default
        parameters: {}
        cache:
          enabled: false

Core Settings

    core:
      embedding_search_provider:
        name: default
        parameters: {}

Tracing

    tracing:
      enabled: false
      adapters:
        - name: FileSystem
      span_format: opentelemetry
      enable_content_capture: false

Streaming

v0.20.0: The top-level streaming field is a boolean that is no longer required. Use the stream_async() method directly instead. For output rail streaming configuration, see Output Streaming Configuration.

    streaming: false

Import Paths

    import_paths:
      - path/to/shared/config

Complete Example

The following YAML example demonstrates a complete config.yml file that wires together a main language model, a dedicated content safety model, and an embeddings model. It configures rails for input and output content safety checks, points to a local NIM service for jailbreak detection, defines a content safety prompt, provides general instructions for the assistant, and enables response streaming from both the main and content safety models.

    models:
      # Main application LLM
      - type: main
        engine: nim
        model: meta/llama-3.1-70b-instruct
        parameters:
          temperature: 0.7

      # Content safety model
      - type: content_safety
        engine: nim
        parameters:
          base_url: "http://localhost:8000/v1"
          model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

      # Embeddings
      - type: embeddings
        engine: FastEmbed
        model: all-MiniLM-L6-v2

    rails:
      input:
        flows:
          - content safety check input $model=content_safety

      output:
        flows:
          - content safety check output $model=content_safety
        streaming:
          enabled: true

      config:
        jailbreak_detection:
          nim_base_url: "http://localhost:8001/v1/"

    prompts:
      - task: content_safety_check_input $model=content_safety
        content: |
          Check if this content is safe: {{ user_input }}
        output_parser: nemoguard_parse_prompt_safety
        max_tokens: 50

    instructions:
      - type: general
        content: |
          You are a helpful, harmless, and honest assistant.

    # The top-level streaming field is a boolean (see Streaming).
    streaming: true