Llama-Guard Integration
Llama-Guard Integration
NeMo Guardrails provides out-of-the-box support for content moderation using Meta’s Llama Guard model.
In our testing, we observe significantly improved input and output content moderation performance compared to the self-check method. Please see the performance evaluation for benchmark numbers.
Usage
To configure your bot to use Llama Guard for input/output checking, follow the below steps:
-
Add a model of type
llama_guardto the models section of theconfig.ymlfile. The example below serves Llama Guard with vLLM. Because vLLM exposes an OpenAI-compatible API,engine: openaiplusparameters.base_urlreaches it through NeMo Guardrails’ built-in client with no LangChain dependency. For background, see Migrating to 0.22.:::{note} Set
api_key: EMPTY(or any non-empty placeholder) when self-hosted vLLM does not enforce auth. If your deployment requires a real token, replaceapi_key: EMPTYwith the literal token value, or omitapi_keyand setapi_key_env_varat the top level of the model entry (not insideparameters:)::::
-
Include the
llama guard check inputandllama guard check outputflow names in the rails section of theconfig.ymlfile: -
Define the
llama_guard_check_inputand thellama_guard_check_outputprompts in theprompts.ymlfile.
The rails execute the llama_guard_check_* actions, which return True if the user input or the bot message should be allowed, and False otherwise, along with a list of the unsafe content categories as defined in the Llama Guard prompt.
A complete example configuration that uses Llama Guard for input and output moderation is provided in this example folder.