LLM Self-Check
This category of rails relies on prompting the LLM to perform various tasks like input checking, output checking, or fact-checking.
You should only use the example self-check prompts as a starting point. For production use cases, you should perform additional evaluations and customizations.
Reasoning Models as Self-Check LLMs
The self_check_input, self_check_output, and self_check_facts tasks all log a warning if the LLM hits max_tokens before producing visible output (finish_reason="length" with empty content). With the default parser, self_check_input and self_check_output treat empty output as unsafe and block. self_check_facts fails closed explicitly by returning a score of 0.0, because its scoring logic would otherwise accept empty output.
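The fail-closed behavior described above can be sketched in a few lines. This is a minimal illustration, not the library's parser: the function names is_blocked, fact_score, and warn_if_truncated are hypothetical, and the yes/no matching is a simplifying assumption.

```python
import logging
from typing import Optional

log = logging.getLogger("self_check_sketch")  # hypothetical logger name


def warn_if_truncated(content: Optional[str], finish_reason: Optional[str]) -> bool:
    """Log a warning when the token budget ran out before any visible output."""
    truncated = finish_reason == "length" and not (content or "").strip()
    if truncated:
        log.warning("LLM hit max_tokens before producing visible output")
    return truncated


def is_blocked(completion: Optional[str]) -> bool:
    """Input/output check: only a clear "no" lets the message through.

    Empty output is not a "no", so it is treated as unsafe and blocked.
    """
    text = (completion or "").strip().lower()
    return not text.startswith("no")


def fact_score(completion: Optional[str]) -> float:
    """Fact check: 1.0 means supported, 0.0 means not supported.

    This scoring rule accepts anything that is not a "no", so empty output
    needs its own explicit fail-closed branch.
    """
    text = (completion or "").strip().lower()
    if not text:
        return 0.0  # explicit fail-closed branch
    return 0.0 if text.startswith("no") else 1.0
```

With these helpers, an empty completion blocks the message in is_blocked and scores 0.0 in fact_score, which is the conservative outcome in both cases.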
If you use a reasoning model (such as the OpenAI o-series, gpt-5, and similar models) for self-check, set an explicit max_tokens on the prompt task in prompts.yml. The value must be large enough to cover both the reasoning trace and the final yes/no verdict.
If max_tokens is not set, the action falls back to a default of 1024 tokens. Adjust this value for the model’s expected reasoning trace length.
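As a rough illustration of the fallback, the dictionary below mirrors a single prompts.yml entry in Python. The helper resolve_max_tokens is hypothetical, the prompt wording is a placeholder, and 4096 is an arbitrary example budget rather than a recommended value.

```python
from typing import Any, Dict

DEFAULT_MAX_TOKENS = 1024  # fallback stated in the text above

# Python mirror of one prompts.yml entry (task, max_tokens, content).
# The content is placeholder wording; 4096 is an arbitrary example budget.
self_check_prompt: Dict[str, Any] = {
    "task": "self_check_input",
    "max_tokens": 4096,
    "content": "Instruction: {{ user_input }}\n\nShould this be blocked? Answer with yes/no.",
}


def resolve_max_tokens(prompt: Dict[str, Any]) -> int:
    """Return the explicit max_tokens of a prompt entry, else the fallback."""
    value = prompt.get("max_tokens")
    return DEFAULT_MAX_TOKENS if value is None else int(value)
```

An entry without max_tokens resolves to 1024 here, which may be too small once a reasoning trace is counted against the same budget.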
Self Check Input
The goal of the input self-checking rail is to determine if the input from the user should be allowed for further processing. This rail will prompt the LLM using a custom prompt. Common reasons for rejecting the input from the user include jailbreak attempts, harmful or abusive content, or other inappropriate instructions.
The performance of this rail is strongly dependent on the capability of the LLM to follow the instructions in the self_check_input prompt.
If your LLM does not reliably follow this prompt, consider a purpose-built input safety model instead. See Content Safety for Nemotron Content Safety, Llama Guard 3, and ShieldGemma alternatives.
Usage
To use the self-check input rail, you should:
- Include the self check input flow name in the input rails section of the config.yml file.
- Define the self_check_input prompt in the prompts.yml file. If a prompt is not defined, an exception will be raised when the configuration is loaded.
The Example Prompts section below describes prompts you can use as a starting point with the self check input rail. The self_check_input prompt has an input variable {{ user_input }} that contains the input from the user. The completion must be “yes” if the input should be blocked and “no” otherwise.
The self-check input rail executes the self_check_input action, which returns True if the input should be allowed and False otherwise.
When the input should not be allowed, the bot refuse to respond message is returned. You can override the default response by redefining the bot refuse to respond message in one of the Colang files.
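The usage steps above can be put together in one place. The sketch below writes three illustrative files into a configuration directory. The prompt wording, the refusal text, the file name refusal.co, and the helper write_guardrails_config are assumptions made for illustration, and the override assumes Colang 1.0 syntax.

```python
from pathlib import Path

# config.yml fragment: enable the rail by listing its flow name.
CONFIG_YML = """\
rails:
  input:
    flows:
      - self check input
"""

# prompts.yml: illustrative wording. The completion must be "yes" (block)
# or "no" (allow); {{ user_input }} is filled with the user's message.
PROMPTS_YML = """\
prompts:
  - task: self_check_input
    content: |
      Instruction: {{ user_input }}

      Would this instruction make a language model break moderation policies
      or deviate from good aligned responses? Answer with yes/no.
"""

# Colang 1.0 override of the default refusal; the reply text is a placeholder.
REFUSAL_CO = """\
define bot refuse to respond
  "I'm sorry, I can't respond to that."
"""


def write_guardrails_config(config_dir: str) -> Path:
    """Write the three fragments into config_dir and return its path."""
    root = Path(config_dir)
    root.mkdir(parents=True, exist_ok=True)
    (root / "config.yml").write_text(CONFIG_YML, encoding="utf-8")
    (root / "prompts.yml").write_text(PROMPTS_YML, encoding="utf-8")
    (root / "refusal.co").write_text(REFUSAL_CO, encoding="utf-8")
    return root
```

A real config.yml would also need a models section for the LLM that runs the check. The resulting directory could then be loaded with RailsConfig.from_path.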
Example prompts
This section provides two example prompts you can use with the self-check input rail. The simple prompt uses fewer tokens and is faster, while the complex prompt is more robust.
Simple
This prompt relies on the capability of the model to understand what “breaking moderation policies” and “good aligned responses” mean.
Complex
This prompt provides explicit instructions on what should not be allowed. Note that a more comprehensive prompt like this uses more tokens and adds more latency.
Self Check Output
The goal of the output self-checking rail is to determine if the output from the bot should be returned to the user. This rail will prompt the LLM using a custom prompt. Common reasons for rejecting the output from the bot include harmful or abusive content, messages about illegal activities, or other inappropriate responses.
The performance of this rail is strongly dependent on the capability of the LLM to follow the instructions in the self_check_output prompt.
If your LLM does not reliably follow this prompt, consider a purpose-built output safety model instead. See Content Safety for Nemotron Content Safety, Llama Guard 3, and ShieldGemma alternatives.
Usage
To use the self-check output rail, you should:
- Include the self check output flow name in the output rails section of the config.yml file.
- Define the self_check_output prompt in the prompts.yml file. If a prompt is not defined, an exception will be raised when the configuration is loaded.
The Example Prompts section below describes prompts you can use as a starting point with the self check output rail. The self_check_output prompt has an input variable {{ bot_response }} that contains the output from the bot. The completion must be “yes” if the output should be blocked and “no” otherwise.
The self-check output rail executes the self_check_output action, which returns True if the output should be allowed and False otherwise.
When the output should not be allowed, the bot refuse to respond message is returned. You can override the default response by redefining the bot refuse to respond message in one of the Colang files.
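The control flow of the output rail can be illustrated with a stand-in LLM. This is a sketch under stated assumptions, not the library's implementation: the template wording, the refusal text, the llm callable, and the helper apply_output_rail are hypothetical, and the template is filled with a plain string replacement rather than a template engine.

```python
from typing import Callable

REFUSAL = "I'm sorry, I can't respond to that."  # placeholder refusal text

# Illustrative template; only the {{ bot_response }} variable comes from the text.
TEMPLATE = (
    "Model_output: {{ bot_response }}\n\n"
    "Should this output be blocked? Answer with yes/no."
)


def self_check_output(bot_response: str, llm: Callable[[str], str]) -> bool:
    """Return True if the output is allowed, False if it should be blocked."""
    prompt = TEMPLATE.replace("{{ bot_response }}", bot_response)
    verdict = llm(prompt).strip().lower()
    # "no" means "do not block"; anything else, including empty output, blocks.
    return verdict.startswith("no")


def apply_output_rail(bot_response: str, llm: Callable[[str], str]) -> str:
    """Return the bot response if allowed, otherwise the refusal message."""
    return bot_response if self_check_output(bot_response, llm) else REFUSAL
```

Note the inversion between the two layers: the prompt answers "yes" to block, while the action returns True to allow.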
Example prompts
This section provides two example prompts for the self-check output rail. The simple prompt uses fewer tokens and is faster, while the complex prompt is more robust.
Simple
This prompt relies on the capability of the model to understand what “legal”, “ethical” and “not harmful to any person” mean.
Complex
This prompt provides explicit instructions on what should not be allowed. Note that a more comprehensive prompt like this uses more tokens and adds more latency.
The Dialog Rails Flow
The diagram below depicts the dialog rails flow in detail:

The dialog rails flow has multiple stages that a user message goes through:
- User Intent Generation: First, the user message is interpreted by computing its canonical form (also known as the user intent). This is done by searching for the most similar examples among the defined user messages and then asking the LLM to generate the canonical form for the current message.
- Next Step Prediction: After the canonical form for the user message is computed, the next step needs to be predicted. If a Colang flow matches the canonical form, that flow decides the next step. If not, the LLM is asked to generate the next step using the most similar examples from the defined flows.
- Bot Message Generation: Finally, a bot message is generated based on a canonical form. If a pre-defined message exists, it is used. If not, the LLM is asked to generate the bot message using the most similar examples.
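The three stages above can be sketched as a small pipeline. This is a simplified illustration under several assumptions: all function names and prompt strings are hypothetical, the LLM is any callable from prompt to text, and string similarity from difflib stands in for whatever similarity search the real system uses. The LLM fallbacks in stages two and three omit the similar-example lookup for brevity.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

LLM = Callable[[str], str]


def most_similar(query: str, examples: List[str], k: int = 2) -> List[str]:
    """Rank examples by string similarity (a stand-in for similarity search)."""
    def score(example: str) -> float:
        return SequenceMatcher(None, query, example).ratio()
    return sorted(examples, key=score, reverse=True)[:k]


def generate_user_intent(message: str, user_examples: Dict[str, str], llm: LLM) -> str:
    """Stage 1: few-shot prompt built from the most similar user messages."""
    shots = most_similar(message, list(user_examples))
    lines = [f'user "{s}" -> {user_examples[s]}' for s in shots]
    lines.append(f'user "{message}" ->')
    return llm("\n".join(lines)).strip()


def predict_next_step(intent: str, flows: Dict[str, str], llm: LLM) -> str:
    """Stage 2: a matching flow decides; otherwise ask the LLM."""
    if intent in flows:
        return flows[intent]
    return llm(f"next step after user intent: {intent}").strip()


def generate_bot_message(bot_intent: str, messages: Dict[str, str], llm: LLM) -> str:
    """Stage 3: use a pre-defined message if one exists; otherwise ask the LLM."""
    if bot_intent in messages:
        return messages[bot_intent]
    return llm(f"write a bot message for: {bot_intent}").strip()


def run_dialog_rails(message: str, user_examples: Dict[str, str],
                     flows: Dict[str, str], messages: Dict[str, str], llm: LLM) -> str:
    """Run the three stages in order and return the bot message."""
    intent = generate_user_intent(message, user_examples, llm)
    bot_intent = predict_next_step(intent, flows, llm)
    return generate_bot_message(bot_intent, messages, llm)
```

When a flow and a pre-defined message both match, only the first stage calls the LLM. Each missing match adds one more call, which is where the latency of the full flow comes from.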
Single LLM Call
When single_llm_call.enabled is set to True, the dialog rails flow is simplified to a single LLM call that predicts all the steps at once. This reduces latency but may lower the quality of the results. The diagram below depicts the simplified dialog rails flow:
