nemoguardrails.library.jailbreak_detection.actions

Module Contents

Functions

Name                              Description
jailbreak_detection_heuristics    Checks the user’s prompt to determine if it is an attempt to jailbreak the model.
jailbreak_detection_model         Uses a trained classifier to determine if a user input is a jailbreak attempt.

Data

log

API

nemoguardrails.library.jailbreak_detection.actions.jailbreak_detection_heuristics(
llm_task_manager: nemoguardrails.llm.taskmanager.LLMTaskManager,
context: typing.Optional[dict] = None,
**kwargs
) -> bool
async

Checks the user’s prompt to determine if it is an attempt to jailbreak the model.
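
Because the function is a coroutine that returns a bool, the caller has to await it to get the result. The following is a minimal sketch of that calling pattern only. `fake_jailbreak_check` is a hypothetical stand-in that mirrors the documented parameter shape; it is not the library's implementation. The `user_message` context key and the length threshold are assumptions made for illustration.

```python
import asyncio
from typing import Optional


async def fake_jailbreak_check(
    llm_task_manager: object,
    context: Optional[dict] = None,
    **kwargs,
) -> bool:
    """Hypothetical stand-in mirroring the documented parameter shape."""
    # Assumption: the prompt is read from the context under "user_message".
    prompt = (context or {}).get("user_message", "")
    # Toy rule for illustration only: flag unusually long prompts.
    return len(prompt) > 1000


async def main() -> bool:
    context = {"user_message": "What is the capital of France?"}
    # The coroutine must be awaited to obtain the bool result.
    return await fake_jailbreak_check(llm_task_manager=None, context=context)


if __name__ == "__main__":
    flagged = asyncio.run(main())
    print("flagged:", flagged)
```

In a real deployment the guardrails runtime would normally invoke the action itself, so calling it directly like this is mostly useful for experimentation.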

nemoguardrails.library.jailbreak_detection.actions.jailbreak_detection_model(
llm_task_manager: nemoguardrails.llm.taskmanager.LLMTaskManager,
context: typing.Optional[dict] = None,
model_caches: typing.Optional[typing.Dict[str, nemoguardrails.llm.cache.CacheInterface]] = None
) -> bool
async

Uses a trained classifier to determine if a user input is a jailbreak attempt.

nemoguardrails.library.jailbreak_detection.actions.log = logging.getLogger(__name__)
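
Since `log` is created with `logging.getLogger(__name__)`, its logger name is the module path, so its verbosity can be adjusted from application code with the standard `logging` module. A short sketch follows; which messages the module emits, and at what levels, is not stated here, so treat the chosen level as an assumption.

```python
import logging

# The logger name equals the module's __name__, i.e. its dotted import path.
LOGGER_NAME = "nemoguardrails.library.jailbreak_detection.actions"

# Keep the rest of the application at WARNING (no-op if already configured).
logging.basicConfig(level=logging.WARNING)

# Raise verbosity for this one module only.
jailbreak_log = logging.getLogger(LOGGER_NAME)
jailbreak_log.setLevel(logging.DEBUG)
```

`logging.getLogger` returns the same logger object for the same name, so this works whether it runs before or after the module is imported.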