AlignScore Integration
NeMo Guardrails provides out-of-the-box support for the AlignScore metric (Zha et al.), which uses a RoBERTa-based model for scoring factual consistency in model responses with respect to the knowledge base.
In our testing, we observed an average latency of ~220ms when AlignScore is hosted as an HTTP service, and ~45ms for direct inference with the model loaded in memory. This makes it much faster than the self-check method. However, this method requires an on-prem deployment of the publicly available AlignScore model. See Deploy AlignScore for deployment options.
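To compare these figures against your own deployment, a small client can time a single request to the HTTP service. The following is a minimal sketch, not the library's own client: the JSON field names (evidence, claim), the endpoint path, and the helper names build_payload and timed_alignscore are assumptions for illustration and should be checked against the server you are running.

```python
import json
import time
import urllib.request


def build_payload(evidence: str, claim: str) -> bytes:
    """Encode one evidence/claim pair as a JSON body (assumed request schema)."""
    return json.dumps({"evidence": evidence, "claim": claim}).encode("utf-8")


def timed_alignscore(endpoint: str, evidence: str, claim: str, timeout: float = 10.0):
    """POST one pair to the server; return (parsed JSON response, latency in ms)."""
    request = urllib.request.Request(
        endpoint,
        data=build_payload(evidence, claim),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return body, elapsed_ms


# Example call (hypothetical local endpoint; adjust host, port, and path):
# result, latency = timed_alignscore(
#     "http://localhost:5000/alignscore_base",
#     evidence="The report was published in March.",
#     claim="The report came out in March.",
# )
```

A single request mostly measures network and serialization overhead plus one forward pass, so averaging over many calls gives a more representative number.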
Usage
To use AlignScore-based fact-checking, set the following configuration options in your config.yml:
The Colang flow for the AlignScore-based fact-checking rail is the same as that for the self-check fact-checking rail. To trigger the rail, set the $check_facts context variable to True before a bot message that requires fact-checking, e.g.:
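As a hedged sketch of how the configuration and the flow might fit together, the snippet below embeds a config.yml fragment and a Colang flow as strings and loads them with RailsConfig.from_content. The endpoint URL, the output flow name alignscore check facts, and the user and bot message names are assumptions based on a default local deployment; they may differ in your version of the library.

```python
# Sketch of a config.yml fragment; the endpoint URL and flow name are assumptions.
YAML_CONFIG = """
rails:
  config:
    fact_checking:
      parameters:
        # Assumed address of a locally running AlignScore server.
        endpoint: "http://localhost:5000/alignscore_large"
  output:
    flows:
      - alignscore check facts
"""

# Sketch of a Colang flow; the flow, user, and bot message names are hypothetical.
COLANG_FLOW = """
define flow answer report question
  user ask about report
  $check_facts = True
  bot provide report answer
"""


def load_config():
    """Build a RailsConfig from the two snippets (requires nemoguardrails)."""
    from nemoguardrails import RailsConfig

    return RailsConfig.from_content(
        colang_content=COLANG_FLOW, yaml_content=YAML_CONFIG
    )
```

Setting $check_facts immediately before the bot message limits fact-checking to that one response rather than to every bot turn.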
Deploy AlignScore
The recommended way to use AlignScore with the NeMo Guardrails library is to use the provided Dockerfile. For more details, see AlignScore Fact-Checking with Docker.
To deploy an AlignScore server from source, follow these steps:
Installing AlignScore is not supported on Python 3.11.
- Install the alignscore package from the GitHub repository:
- Install PyTorch version 2.0.1.
- Download the spaCy en_core_web_sm model:
- Download one or both of the AlignScore checkpoints:
- Set the ALIGN_SCORE_PATH environment variable to point to the path where the checkpoints were downloaded.
- Set the ALIGN_SCORE_DEVICE environment variable to "cpu" to run the AlignScore model on CPU, or to the corresponding GPU device, such as "cuda:0".
- Start the AlignScore server.
By default, the AlignScore server listens on port 5000. You can change the port using the --port option. By default, the AlignScore server loads only the base model. You can load only the large model using --models=large or both models using --models=base --models=large.
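The environment variables and server options described above can be assembled programmatically before launching the server. This is a minimal sketch under stated assumptions: the module path in SERVER_MODULE is assumed rather than taken from this page, and the helper names build_server_env and build_server_command are hypothetical. Only ALIGN_SCORE_PATH, ALIGN_SCORE_DEVICE, --port, and --models come from the text.

```python
import os
import sys

# Assumed module path for the server entry point; verify against your install.
SERVER_MODULE = "nemoguardrails.library.factchecking.align_score.server"


def build_server_env(checkpoint_dir, device="cpu", base_env=None):
    """Return a copy of the environment with the two AlignScore variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ALIGN_SCORE_PATH"] = checkpoint_dir  # directory holding the checkpoints
    env["ALIGN_SCORE_DEVICE"] = device        # "cpu" or a GPU such as "cuda:0"
    return env


def build_server_command(port=5000, models=("base",)):
    """Return the argv list; --models is repeated once per model to load."""
    command = [sys.executable, "-m", SERVER_MODULE, "--port", str(port)]
    command.extend(f"--models={name}" for name in models)
    return command
```

The two results can be passed to subprocess.Popen(build_server_command(...), env=build_server_env(...)). For example, build_server_command(5001, ("base", "large")) produces arguments ending in --port 5001 --models=base --models=large.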