nemoguardrails.library.jailbreak_detection.heuristics.checks

Module Contents

Functions

Name                                       Description
check_jailbreak_length_per_perplexity      Check whether the input string has length/perplexity greater than the threshold.
check_jailbreak_prefix_suffix_perplexity   Check whether the input string has prefix or suffix perplexity greater than the threshold.
get_perplexity                             Function to compute sliding window perplexity of input_string

Data

device

model

model_id

tokenizer

API

nemoguardrails.library.jailbreak_detection.heuristics.checks.check_jailbreak_length_per_perplexity(
input_string: str,
threshold: float
) -> dict

Check whether the input string has length/perplexity greater than the threshold.

Args:
    input_string: The prompt to be sent to the model.
    threshold: Length-per-perplexity threshold (lp_threshold) for determining whether input_string is a jailbreak (Default: 89.79).
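
As a rough illustration of the heuristic described above, the following sketch divides the length of the input by its perplexity and compares the ratio with the threshold. It is not the library's implementation: the name `length_per_perplexity_check`, the injected `perplexity_fn` callable, the use of character length, the strict `>` comparison, and the `"jailbreak"` result key are all assumptions made for this example. A stub stands in for the GPT-2 perplexity computation so that the snippet runs without downloading a model.

```python
from typing import Callable, Dict


def length_per_perplexity_check(
    input_string: str,
    threshold: float,
    perplexity_fn: Callable[[str], float],
) -> Dict[str, bool]:
    """Flag the input when len(input_string) / perplexity exceeds threshold.

    Hypothetical helper: perplexity_fn is injected so the sketch does not
    depend on a language model being loaded.
    """
    perplexity = perplexity_fn(input_string)
    score = len(input_string) / perplexity
    # Assumption: the result is a dict with a single boolean "jailbreak" key.
    return {"jailbreak": score > threshold}


def stub_perplexity(_: str) -> float:
    """Stand-in for a real perplexity computation; always returns 2.0."""
    return 2.0


# 200 characters / perplexity 2.0 = 100.0, which exceeds 89.79.
print(length_per_perplexity_check("a" * 200, 89.79, stub_perplexity))
# prints {'jailbreak': True}
```

Long inputs that the language model finds very predictable produce a high ratio, which is why the score grows with length and shrinks with perplexity.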

nemoguardrails.library.jailbreak_detection.heuristics.checks.check_jailbreak_prefix_suffix_perplexity(
input_string: str,
threshold: float
) -> dict

Check whether the input string has prefix or suffix perplexity greater than the threshold.

Args:
    input_string: The prompt to be sent to the model.
    threshold: Prefix/suffix perplexity threshold (ps_ppl_threshold) for determining whether input_string is a jailbreak (Default: 1845.65).
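
The prefix/suffix check can be pictured with the sketch below, which scores the first and last few whitespace-separated words of the input separately and flags the input if either score exceeds the threshold. This is an illustrative sketch only: the name `prefix_suffix_perplexity_check`, the `window_words` parameter and its value of 20, the early exit for short inputs, the strict `>` comparison, and the `"jailbreak"` result key are assumptions, and `perplexity_fn` is a stub in place of the GPT-2 computation.

```python
from typing import Callable, Dict


def prefix_suffix_perplexity_check(
    input_string: str,
    threshold: float,
    perplexity_fn: Callable[[str], float],
    window_words: int = 20,  # assumed window size, not taken from the source
) -> Dict[str, bool]:
    """Flag the input when its leading or trailing words have high perplexity."""
    words = input_string.strip().split()
    # Assumption: inputs shorter than one window are not evaluated.
    if len(words) < window_words:
        return {"jailbreak": False}

    prefix = " ".join(words[:window_words])
    suffix = " ".join(words[-window_words:])
    flagged = (
        perplexity_fn(prefix) > threshold or perplexity_fn(suffix) > threshold
    )
    return {"jailbreak": flagged}


def stub_perplexity(text: str) -> float:
    """Toy scorer: text containing the marker token is treated as gibberish."""
    return 5000.0 if "zx9q" in text else 30.0


benign = " ".join(["word"] * 40)
with_suffix = benign + " " + " ".join(["zx9q"] * 20)

print(prefix_suffix_perplexity_check(benign, 1845.65, stub_perplexity))
print(prefix_suffix_perplexity_check(with_suffix, 1845.65, stub_perplexity))
```

Scoring the two ends separately means that a short run of unusual tokens attached to an otherwise ordinary prompt is not averaged away by the fluent text around it.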

nemoguardrails.library.jailbreak_detection.heuristics.checks.get_perplexity(
input_string: str
) -> bool

Compute the sliding-window perplexity of input_string.

Args:
    input_string: The prompt to be sent to the model.
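
For readers unfamiliar with sliding-window perplexity, the sketch below shows the general technique on a sequence of token IDs: overlapping windows are scored one at a time, only the tokens not covered by an earlier window contribute to the total, and the perplexity is the exponential of the mean negative log-likelihood. It is a generic illustration rather than this module's code. The name `sliding_window_perplexity`, the `window_nll` callable, and the `max_length` and `stride` defaults are assumptions, and a toy uniform model replaces GPT-2 so the example is self-contained.

```python
import math
from typing import Callable, Sequence


def sliding_window_perplexity(
    token_ids: Sequence[int],
    window_nll: Callable[[Sequence[int], int], float],
    max_length: int = 1024,  # assumed context size
    stride: int = 512,       # assumed step between windows
) -> float:
    """Perplexity from overlapping windows.

    window_nll(window, n_targets) must return the summed negative
    log-likelihood of the last n_targets tokens of window; earlier tokens
    in the window serve only as context.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    if not 0 < stride <= max_length:
        raise ValueError("stride must be in the range 1..max_length")

    seq_len = len(token_ids)
    total_nll = 0.0
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        n_targets = end - prev_end  # tokens not scored by an earlier window
        total_nll += window_nll(token_ids[begin:end], n_targets)
        prev_end = end
        if end == seq_len:
            break
    return math.exp(total_nll / seq_len)


VOCAB_SIZE = 50


def uniform_nll(window: Sequence[int], n_targets: int) -> float:
    """Toy model assigning probability 1 / VOCAB_SIZE to every token."""
    return n_targets * math.log(VOCAB_SIZE)


# For a uniform model the perplexity equals the vocabulary size (about 50).
print(sliding_window_perplexity(list(range(2500)), uniform_nll))
```

The overlap gives each scored token some preceding context, at the cost of running the model over part of the sequence more than once.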

nemoguardrails.library.jailbreak_detection.heuristics.checks.device = os.environ.get('JAILBREAK_CHECK_DEVICE', 'cpu')
nemoguardrails.library.jailbreak_detection.heuristics.checks.model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
nemoguardrails.library.jailbreak_detection.heuristics.checks.model_id = 'gpt2-large'
nemoguardrails.library.jailbreak_detection.heuristics.checks.tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
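
The `device` value above is read from the `JAILBREAK_CHECK_DEVICE` environment variable and falls back to `'cpu'`. The following sketch reproduces that lookup in isolation; the helper name `resolve_device` and its `env` parameter are hypothetical and exist only to make the behavior easy to try without importing the module or loading a model.

```python
import os
from typing import Mapping, Optional


def resolve_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured device string, defaulting to 'cpu'.

    Hypothetical helper mirroring os.environ.get('JAILBREAK_CHECK_DEVICE', 'cpu').
    """
    env = os.environ if env is None else env
    return env.get("JAILBREAK_CHECK_DEVICE", "cpu")


print(resolve_device({}))                                  # prints cpu
print(resolve_device({"JAILBREAK_CHECK_DEVICE": "cuda"}))  # prints cuda
```

Because `device`, `model`, and `tokenizer` are module-level values, they are evaluated when the module is first imported, so the environment variable needs to be set before that import for it to take effect.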