Hallucinations & Fact-Checking
Fact-checking guardrails help ensure that LLM output is well grounded in evidence and reduce so-called hallucinations or false claims.
Self-Check Fact-Checking
The goal of the self-check fact-checking output rail is to ensure that the answer to a RAG (Retrieval Augmented Generation) query is grounded in the provided evidence extracted from the knowledge base (KB).
The NeMo Guardrails library uses the concept of relevant chunks (which are stored in the $relevant_chunks context variable) as the evidence against which fact-checking should be performed. The relevant chunks can be extracted automatically, if the built-in knowledge base support is used, or provided directly alongside the query.
The performance of this rail is strongly dependent on the capability of the LLM to follow the instructions in the self_check_facts prompt.
If your LLM does not reliably follow this prompt, consider a model purpose-built for hallucination detection instead. See AlignScore-based Fact-Checking or Patronus Lynx-based RAG Hallucination Detection on this page.
Usage
To use the self-check fact-checking rail, you should:
- Include the self check facts flow name in the output rails section of the config.yml file.
- Define the self_check_facts prompt in the prompts.yml file. If a prompt is not defined, an exception will be raised when the configuration is loaded.
The above is an example prompt that you can use with the self check facts rail. The self_check_facts prompt has two input variables: {{ evidence }}, which includes the relevant chunks, and {{ response }}, which includes the bot response that should be fact-checked. The completion must be “yes” if the response is factually correct and “no” otherwise.
The self-check fact-checking rail executes the self_check_facts action, which returns a score between 0.0 (response is not accurate) and 1.0 (response is accurate). The reason a number is returned, instead of a boolean, is to keep a consistent API with other methods that return a score, e.g., the AlignScore method below.
If the LLM hits its max_tokens budget before producing a verdict (finish_reason="length" with empty content), the action fail-closes and returns 0.0 so the response is blocked. This typically happens with reasoning models that consume output tokens on internal reasoning. Set max_tokens on the self_check_facts prompt task to fit both the reasoning trace and the yes/no verdict; if unset, the action falls back to 1024 tokens. See Reasoning Models as Self-Check LLMs.
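The verdict-to-score behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function name verdict_to_score and its parameters are hypothetical, and treating any completion that does not start with "yes" as 0.0 is an assumption.

```python
from typing import Optional


def verdict_to_score(completion: Optional[str], finish_reason: str = "stop") -> float:
    """Map a yes/no completion to a score (hypothetical helper, not the library API)."""
    text = (completion or "").strip().lower()
    # Fail closed: a generation cut off by max_tokens with no content
    # is scored 0.0, so the response ends up blocked.
    if finish_reason == "length" and not text:
        return 0.0
    # Assumption: only a completion starting with "yes" counts as accurate.
    return 1.0 if text.startswith("yes") else 0.0


print(verdict_to_score("Yes"))
print(verdict_to_score("no"))
print(verdict_to_score("", finish_reason="length"))
```

Returning a float rather than a boolean mirrors the point made above: callers can treat this score the same way as a score from a model-based method.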
To trigger the self-check fact-checking rail for a bot message, you must set the $check_facts context variable to True before a bot message requiring fact-checking. This lets you enable fact-checking only when needed (e.g., when answering an important question rather than chitchat).
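The gating idea can be pictured with a small Python sketch. It is not how the rail is defined in the library (rails are written as Colang flows); the function run_fact_check_rail, the 0.5 threshold, the refusal text, and the toy scoring function are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Placeholder refusal text; the real message comes from the guardrails configuration.
REFUSAL = "I'm sorry, I can't respond to that."


def run_fact_check_rail(
    context: Dict[str, Any],
    bot_message: str,
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> str:
    """Return the bot message, or a refusal if the gated fact check fails."""
    # The check runs only when the flag was set before the bot message.
    if not context.get("check_facts"):
        return bot_message
    score = score_fn(context.get("relevant_chunks", ""), bot_message)
    return bot_message if score >= threshold else REFUSAL


def toy_score(evidence: str, response: str) -> float:
    """Toy stand-in for the self_check_facts action: exact substring match."""
    return 1.0 if response.lower() in evidence.lower() else 0.0


chunks = "The report states that revenue grew 12% in 2023."
print(run_fact_check_rail({"check_facts": True, "relevant_chunks": chunks},
                          "revenue grew 12% in 2023", toy_score))
print(run_fact_check_rail({"check_facts": True, "relevant_chunks": chunks},
                          "revenue fell 5% in 2023", toy_score))
print(run_fact_check_rail({"relevant_chunks": chunks},
                          "revenue fell 5% in 2023", toy_score))
```

The third call shows the point of the flag: without $check_facts set, the message passes through unchecked.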
The example below will trigger the fact-checking output rail every time the bot responds to a question about the report.
Usage in combination with a custom RAG
Fact-checking also works in a custom RAG implementation based on a custom action:
Please refer to the Custom RAG Output Rails example.
Hallucination Detection
The goal of the hallucination detection output rail is to protect against false claims (also called “hallucinations”) in the generated bot message. While similar to the fact-checking rail, hallucination detection can be used when there are no supporting documents (i.e., $relevant_chunks).
Usage
To use the hallucination rail, you should:
- Include the self check hallucination flow name in the output rails section of the config.yml file.
- Define a self_check_hallucination prompt in the prompts.yml file. If a prompt is not defined, an exception will be raised when the configuration is loaded.
The above is an example prompt you can use with the self check hallucination rail. The self_check_hallucination prompt has two input variables: {{ paragraph }}, which represents alternative generations for the same user query, and {{ statement }}, which represents the current bot response. The completion must be “yes” if the statement is not a hallucination (i.e., agrees with alternative generations) and “no” otherwise.
You can use the self-check hallucination detection in two modes:
- Blocking: block the message if a hallucination is detected.
- Warning: warn the user if the response is prone to hallucinations.
Blocking Mode
Similar to self-check fact-checking, to trigger the self-check hallucination rail in blocking mode, you have to set the $check_hallucination context variable to True to verify that a bot message is not prone to hallucination:
The above example will trigger the hallucination rail for every people-related question (matching the canonical form user ask about people); answers to such questions are usually more prone to contain incorrect statements. If the bot message contains hallucinations, the default bot inform answer unknown message is used. To override it, include the following in one of your Colang files:
Warning Mode
As in blocking mode, if you want to send the response back to the user, but with a warning, you have to set the $hallucination_warning context variable to True.
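The difference between the two modes can be summarized in a short Python sketch. It only illustrates the decision logic described in this section; the function name, the message texts, and the choice to let blocking take precedence when both flags are set are assumptions, not library behavior.

```python
from typing import Any, Dict, List

# Placeholder texts; the actual defaults are defined by the library's bot messages.
BLOCK_MESSAGE = "I don't know the answer to that."
WARNING_MESSAGE = "The previous answer is prone to hallucination and may not be accurate."


def apply_hallucination_rail(
    context: Dict[str, Any], bot_message: str, is_hallucination: bool
) -> List[str]:
    """Return the messages the user would see (hypothetical helper)."""
    if not is_hallucination:
        return [bot_message]
    # Blocking mode: replace the response entirely.
    if context.get("check_hallucination"):
        return [BLOCK_MESSAGE]
    # Warning mode: keep the response and append a warning.
    if context.get("hallucination_warning"):
        return [bot_message, WARNING_MESSAGE]
    # Neither flag set: the rail does not intervene.
    return [bot_message]


print(apply_hallucination_rail({"check_hallucination": True}, "Some answer", True))
print(apply_hallucination_rail({"hallucination_warning": True}, "Some answer", True))
```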
To override the default message, include the following in one of your Colang files:
Usage in combination with a custom RAG
Hallucination-checking also works in a custom RAG implementation based on a custom action:
Please refer to the Custom RAG Output Rails example.
Implementation Details
The implementation of the self-check hallucination rail uses a slight variation of the method described in the SelfCheckGPT paper:
- First, sample several extra responses from the LLM (by default, two extra responses).
- Use the LLM to check if the original and extra responses are consistent.
As with self-check fact-checking, the consistency check is formulated like an NLI (natural language inference) task, with the original bot response as the hypothesis ({{ statement }}) and the extra generated responses as the context or evidence ({{ paragraph }}).
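The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: generate and judge are hypothetical callables standing in for LLM calls, joining the extra responses with ". " is an assumption, and the substring-based judge is a toy replacement for the yes/no prompt.

```python
import itertools
from typing import Callable


def self_check_hallucination(
    user_prompt: str,
    bot_response: str,
    generate: Callable[[str], str],
    judge: Callable[[str, str], bool],
    num_extra: int = 2,  # two extra responses by default, as described above
) -> bool:
    """Return True if the bot response looks like a hallucination."""
    # Step 1: sample extra responses for the same user query.
    extra_responses = [generate(user_prompt) for _ in range(num_extra)]
    # Step 2: use the extra responses as evidence ({{ paragraph }}) and the
    # original response as the hypothesis ({{ statement }}).
    paragraph = ". ".join(extra_responses)
    agrees = judge(paragraph, bot_response)
    return not agrees


# Toy stand-ins so the sketch runs without an LLM.
_samples = itertools.cycle(
    ["Paris is the capital of France", "The capital of France is Paris"]
)


def fake_generate(prompt: str) -> str:
    return next(_samples)


def fake_judge(paragraph: str, statement: str) -> bool:
    # Real systems ask the LLM for a yes/no verdict; this only checks containment.
    return statement.lower() in paragraph.lower()


question = "What is the capital of France?"
print(self_check_hallucination(question, "Paris is the capital of France",
                               fake_generate, fake_judge))
print(self_check_hallucination(question, "Lyon is the capital of France",
                               fake_generate, fake_judge))
```

Passing the LLM calls in as functions keeps the sketch testable; in practice both would be backed by the same model that produced the original response.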
AlignScore-based Fact-Checking
The NeMo Guardrails library provides out-of-the-box support for the AlignScore metric (Zha et al.), which uses a RoBERTa-based model for scoring factual consistency in model responses with respect to the knowledge base.
Example usage
For more details, check out the AlignScore Integration page.
Patronus Lynx-based RAG Hallucination Detection
The NeMo Guardrails library supports hallucination detection in RAG systems using Patronus AI’s Lynx model. The model is hosted on Hugging Face and is available in two variants: 70B parameters and 8B parameters.
Example usage
For more details, check out the Patronus Lynx Integration page.