nemoguardrails.library.jailbreak_detection.heuristics.checks
Module Contents
Functions
Data
API
Check whether the ratio of the input string's length to its perplexity is greater than the threshold.
Args
input_string: The prompt to be sent to the model
lp_threshold: Threshold for determining whether input_string is a jailbreak (Default: 89.79)
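The comparison described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `exceeds_length_per_perplexity` is hypothetical, the length is assumed to be measured in characters, and the perplexity is passed in as a precomputed number rather than obtained from a language model.

```python
def exceeds_length_per_perplexity(
    input_string: str,
    perplexity: float,
    lp_threshold: float = 89.79,
) -> bool:
    """Hypothetical sketch: flag a prompt whose length/perplexity ratio is above the threshold."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    # Assumption: length is counted in characters.
    score = len(input_string) / perplexity
    return score > lp_threshold


# A long prompt with low perplexity yields a high ratio and is flagged.
flagged = exceeds_length_per_perplexity("x" * 1000, perplexity=10.0)
# A short prompt with the same perplexity stays well below the threshold.
not_flagged = exceeds_length_per_perplexity("hello", perplexity=10.0)
```

Taking the perplexity as an argument keeps the sketch independent of any particular model; in practice it would come from whatever perplexity routine the deployment uses.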
Check whether the perplexity of the input string's prefix or suffix is greater than the threshold.
Args
input_string: The prompt to be sent to the model
ps_ppl_threshold: Threshold for determining whether input_string is a jailbreak (Default: 1845.65)
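A rough sketch of a prefix/suffix check is shown below. Everything beyond the threshold default is an assumption made for illustration: the function name `exceeds_prefix_suffix_perplexity`, the 20-word window size, whitespace splitting, and the `perplexity_fn` callable that stands in for a model-based perplexity routine.

```python
from typing import Callable


def exceeds_prefix_suffix_perplexity(
    input_string: str,
    perplexity_fn: Callable[[str], float],
    ps_ppl_threshold: float = 1845.65,
    window_words: int = 20,  # assumed window size, not taken from the documentation
) -> bool:
    """Hypothetical sketch: flag a prompt whose first or last words have high perplexity."""
    words = input_string.strip().split()
    if not words:
        return False
    # For inputs shorter than the window, both slices cover the whole string.
    prefix = " ".join(words[:window_words])
    suffix = " ".join(words[-window_words:])
    return (
        perplexity_fn(prefix) > ps_ppl_threshold
        or perplexity_fn(suffix) > ps_ppl_threshold
    )


def stub_perplexity(text: str) -> float:
    """Stand-in scorer for the example only: gibberish tokens get a very high value."""
    return 5000.0 if "zx!!" in text else 30.0


benign = " ".join(["word"] * 40)
attack = benign + " " + " ".join(["zx!!"] * 20)
benign_flagged = exceeds_prefix_suffix_perplexity(benign, stub_perplexity)
attack_flagged = exceeds_prefix_suffix_perplexity(attack, stub_perplexity)
```

Scoring only the two ends of the prompt targets attacks that attach a block of unnatural text before or after an otherwise ordinary request, which a whole-string score could average away.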
Compute the sliding-window perplexity of input_string.
Args
input_string: The prompt to be sent to the model
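To illustrate what a sliding-window perplexity computation involves, here is a model-agnostic sketch. It is not the library's code: the name `sliding_window_perplexity`, the `token_nll` callable (which returns the negative log-likelihood of one token given its context), and the `max_length` and `stride` defaults are all assumptions. Perplexity is taken as the exponential of the mean per-token negative log-likelihood.

```python
import math
from typing import Callable, Sequence


def sliding_window_perplexity(
    tokens: Sequence[str],
    token_nll: Callable[[Sequence[str], str], float],
    max_length: int = 8,  # assumed context window size
    stride: int = 4,      # assumed step between window starts
) -> float:
    """Hypothetical sketch: exp(mean NLL), scoring each token once within a bounded window."""
    if stride <= 0 or stride > max_length:
        raise ValueError("stride must be in the range 1..max_length")
    total_nll = 0.0
    n_scored = 0
    prev_end = 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + max_length, len(tokens))
        # Score only tokens that no earlier window has scored; the tokens
        # from `begin` up to the target serve as context.
        for i in range(max(prev_end, begin), end):
            total_nll += token_nll(tokens[begin:i], tokens[i])
            n_scored += 1
        prev_end = end
        if end == len(tokens):
            break
    if n_scored == 0:
        raise ValueError("need at least one token")
    return math.exp(total_nll / n_scored)


def uniform_nll(context: Sequence[str], token: str) -> float:
    """Toy scorer: every token is equally likely in a 50-word vocabulary."""
    return math.log(50)


ppl = sliding_window_perplexity("the quick brown fox jumps over the lazy dog today".split(), uniform_nll)
```

With the uniform toy scorer the result is approximately the vocabulary size, which is a convenient sanity check. A real scorer would query a language model for each window.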