nemoguardrails.evaluate.evaluate_moderation
nemoguardrails.evaluate.evaluate_moderation
Module Contents
Classes
API
Helper class for running the moderation rails (jailbreak, output) evaluation for a Guardrails app. It contains all the configuration parameters required to run the evaluation.
Evaluates moderation rails for the given dataset.
Returns:
Moderation check predictions, jailbreak results, check output results.
Gets the output moderation results for a given prompt. Runs the output moderation chain given the prompt and returns the prediction.
Prediction: “yes” if the prompt is flagged by output moderation, “no” if acceptable.
Parameters:
The user input prompt.
Dictionary to store output moderation results.
Returns:
Bot response, check output prediction, updated results dictionary.
Gets the jailbreak results for a given prompt. Runs the jailbreak chain given the prompt and returns the prediction.
Prediction: “yes” if the prompt is flagged as jailbreak, “no” if acceptable.
Parameters:
The user input prompt.
Dictionary to store jailbreak results.
Returns:
Jailbreak prediction, updated results dictionary.
Gets the evaluation results, prints them and writes them to file.