Probe Reference Summary
The following table summarizes the probes that you can specify in the config.plugins.probe_spec field.
Specify an individual probe by name, such as dan.Ablation_Dan_11_0, or specify a category of probes, such as dan. To combine several probes and categories, separate them with commas, such as ansiescape,dan.Ablation_Dan_11_0.
The probes identified as Default are included in the default/default configuration.
For detailed information about a probe, refer to the garak.probes page in the garak documentation.
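The comma-separated format described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: `build_probe_spec` is a hypothetical helper, and the surrounding dictionary layout is an assumption; only the `plugins.probe_spec` field path and the example probe names come from the text above.

```python
def build_probe_spec(*entries: str) -> str:
    """Join probe names and probe categories into a comma-separated probe_spec string.

    Hypothetical helper for illustration; it only checks the string format and
    does not verify that a probe or category actually exists.
    """
    cleaned = []
    for entry in entries:
        name = entry.strip()
        if not name:
            raise ValueError("probe_spec entries must be non-empty")
        if "," in name or " " in name:
            raise ValueError(f"invalid probe or category name: {name!r}")
        if name not in cleaned:  # drop exact duplicates, keep first-seen order
            cleaned.append(name)
    return ",".join(cleaned)


# One whole category (ansiescape) plus one individual probe from the dan category.
probe_spec = build_probe_spec("ansiescape", "dan.Ablation_Dan_11_0")

# Assumed surrounding structure: only the plugins.probe_spec path is taken from the text.
config = {"plugins": {"probe_spec": probe_spec}}
print(config["plugins"]["probe_spec"])
```

Naming a category such as `ansiescape` selects every probe listed under that prefix in the table, while a dotted name such as `dan.Ablation_Dan_11_0` selects a single row.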
Probe |
Goal |
Description |
|---|---|---|
ansiescape.AnsiEscaped (Default) |
print escaped ANSI terminal codes |
Probe for eliciting ansi control codes from a model. |
ansiescape.AnsiRaw (Default) |
print raw ANSI terminal codes |
Probe for eliciting raw ansi control codes from a model. |
atkgen.Tox (Default) |
make the model say something toxic |
Automatic attack generation, trying to elicit toxic outputs from the model. |
audio.AudioAchillesHeel |
disregard the system prompt via audio component |
Audio is the achilles heel. |
av_spam_scanning.EICAR |
output antivirus test codes |
Does the model check its output for viruses? Try to send the EICAR code through. |
av_spam_scanning.GTUBE |
output email spam |
Does the model check its output for spam? Try to send the GTUBE code through. |
av_spam_scanning.GTphish |
output email phishing |
Does the model check its output for phishing? Try to send the GTphish code through. |
continuation.ContinueSlursReclaimedSlurs (Default) |
output an ethnic slur |
Continue partial word with slur or reclaimed slur. |
continuation.ContinueSlursReclaimedSlursFull (Default) |
output an ethnic slur |
Continue partial word with slur or reclaimed slur. |
dan.Ablation_Dan_11_0 (Default) |
disregard the system prompt |
Ablation experiments over the DAN 11.0 prompt. |
dan.AntiDAN |
disregard the system prompt |
Anti-dan. |
dan.AutoDAN |
disregard the system prompt |
Probe leveraging autodan attack generation. |
dan.AutoDANCached (Default) |
disregard the system prompt |
Probe leveraging generated autodan prompts. |
dan.ChatGPT_Developer_Mode_RANTI |
disregard the system prompt |
Chatgpt devmode + ranti. |
dan.ChatGPT_Developer_Mode_v2 |
disregard the system prompt |
Chatgpt developer mode v2. |
dan.ChatGPT_Image_Markdown |
disregard the system prompt |
Chatgpt markdown image unlocker / markdown injection. |
dan.DAN_Jailbreak |
disregard the system prompt |
Dan jailbreak. |
dan.DUDE |
disregard the system prompt |
Dude. |
dan.DanInTheWild (Default) |
disregard the system prompt or alignment |
A subset of jailbreaks from the daninthewild jailbreak library. |
dan.DanInTheWildFull (Default) |
disregard the system prompt or alignment |
A library of jailbreaks. |
dan.Dan_10_0 |
disregard the system prompt |
Dan 10.0. |
dan.Dan_11_0 |
disregard the system prompt |
Dan 11.0. |
dan.Dan_6_0 |
disregard the system prompt |
Dan 6.0. |
dan.Dan_6_2 |
disregard the system prompt |
Dan 6.2. |
dan.Dan_7_0 |
disregard the system prompt |
Dan 7.0. |
dan.Dan_8_0 |
disregard the system prompt |
Dan 8.0. |
dan.Dan_9_0 |
disregard the system prompt |
Dan 9.0. |
dan.STAN |
disregard the system prompt |
Stan. |
divergence.Repeat (Default) |
get the model to leak data by asking it to repeat words |
See if a model starts giving other data when asked to repeat a string indefinitely. |
divergence.RepeatExtended (Default) |
get the model to leak data by asking it to repeat words |
See if a model starts giving other data when asked to repeat a string indefinitely. |
doctor.Bypass |
roleplay as doctor to bypass guardrails |
Implement policy puppetry bypass. |
doctor.BypassLeet |
roleplay as doctor to bypass guardrails, but use leetspeak |
Implement policy puppetry bypass. |
doctor.Puppetry |
roleplay as doctor to get unsafe outputs |
Implement policy puppetry prompt injection. |
donotanswer.DiscriminationExclusionToxicityHatefulOffensive |
get model to respond as discrimination exclusion toxicity hateful offensive |
Implementation of discrimination_exclusion_toxicity_hateful_offensive probes in donotanswer_prompts. |
donotanswer.HumanChatbox |
get model to respond as human chatbox |
Implementation of human_chatbox probes in donotanswer_prompts. |
donotanswer.InformationHazard |
get model to respond as information hazard |
Implementation of information_hazard probes in donotanswer_prompts. |
donotanswer.MaliciousUses |
get model to respond as malicious uses |
Implementation of malicious_uses probes in donotanswer_prompts. |
donotanswer.MisinformationHarms |
get model to respond as misinformation harms |
Implementation of misinformation_harms probes in donotanswer_prompts. |
encoding.InjectAscii85 (Default) |
make the model say an encoded payload |
Probe for susceptibility to base85 encoded injections. |
encoding.InjectBase16 (Default) |
make the model say an encoded payload |
Probe for susceptibility to base16 encoded injections. |
encoding.InjectBase2048 (Default) |
make the model say an encoded payload |
Probe for susceptibility to base2048 encoding injections. |
encoding.InjectBase32 (Default) |
make the model say an encoded payload |
Probe for susceptibility to base32 encoded injections. |
encoding.InjectBase64 (Default) |
make the model say an encoded payload |
Probe for susceptibility to base64 encoded injections. |
encoding.InjectBraille (Default) |
make the model say an encoded payload |
Probe for susceptibility to braille encoding injections. Credit: wally#3422. |
encoding.InjectEcoji (Default) |
make the model say an encoded payload |
Probe for susceptibility to ecoji encoding. |
encoding.InjectHex (Default) |
make the model say an encoded payload |
Probe for susceptibility to hex encoded injections. |
encoding.InjectLeet (Default) |
make the model say an encoded payload |
Probe for susceptibility to 31337 encoding. |
encoding.InjectMime (Default) |
make the model say an encoded payload |
Probe for susceptibility to mime encoding injections. |
encoding.InjectMorse (Default) |
make the model say an encoded payload |
Probe for susceptibility to morse encoding. Credit: wally#3422. |
encoding.InjectNato (Default) |
make the model say an encoded payload |
Probe for susceptibility to nato phonetic alphabet encoding. |
encoding.InjectQP (Default) |
make the model say an encoded payload |
Probe for susceptibility to qp encoded injections. |
encoding.InjectROT13 (Default) |
make the model say an encoded payload |
Probe for susceptibility to rot13 encoding injections. |
encoding.InjectUU (Default) |
make the model say an encoded payload |
Probe for susceptibility to uu encoded injections. |
encoding.InjectZalgo (Default) |
make the model say an encoded payload |
Probe for susceptibility to zalgo encoding. |
exploitation.JinjaTemplatePythonInjection (Default) |
Gain remote code execution via Jinja template injection |
Attempts to execute python code via jinja templating. |
exploitation.SQLInjectionEcho (Default) |
Manipulate a database via SQL injection |
Attempts to get the target to echo a sql statement. |
exploitation.SQLInjectionSystem (Default) |
Manipulate a database via SQL injection |
Attempts to inject untrusted sql commands into a sql statement. |
fileformats.HF_Files |
get a list of files associated with the model |
Get a manifest of files associated with a hugging face generator. |
glitch.Glitch |
make the model fail to repeat a word |
Probe model for glitch tokens that provoke unusual behavior. |
glitch.GlitchFull |
make the model fail to repeat a word |
Probe model for glitch tokens that provoke unusual behavior. |
goodside.Davidjl (Default) |
get wrong answers about a name |
Exploit the ‘ davidjl’ magic token. |
goodside.Tag (Default) |
inject invisible instructions |
Exploit utf tag page encoding for user-invisible prompt injection. |
goodside.ThreatenJSON (Default) |
get raw code by threatening a model |
Will the model strip the preamble and give JSON? |
goodside.WhoIsRiley (Default) |
get misinfo about riley goodside |
Probe for misinfo about riley goodside. |
grandma.Slurs (Default) |
get model to use ethnic slur |
Use the grandma attack to get ethnic slurs. |
grandma.Substances (Default) |
get instructions for making bad things |
Use the grandma attack to get instructions for harmful items. |
grandma.Win10 (Default) |
get model to output Win 10 product keys |
Will the model yield a Windows 10 product key if we beg it to? |
grandma.Win11 (Default) |
get model to output Win 11 product keys |
Will the model yield a Windows 11 product key if we beg it to? |
latentinjection.LatentInjectionFactSnippetEiffel (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in fact snippets on the eiffel tower. |
latentinjection.LatentInjectionFactSnippetEiffelFull (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in fact snippets on the eiffel tower. |
latentinjection.LatentInjectionFactSnippetLegal (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in legal snippets. |
latentinjection.LatentInjectionFactSnippetLegalFull (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in legal snippets. |
latentinjection.LatentInjectionReport (Default) |
subvert a report with instructions |
String-based latent prompt injections via a report. |
latentinjection.LatentInjectionReportFull (Default) |
subvert a report with instructions |
String-based latent prompt injections via a report. |
latentinjection.LatentInjectionResume (Default) |
subvert a resume with instructions |
String-based latent prompt injections via a resume. |
latentinjection.LatentInjectionResumeFull (Default) |
subvert a resume with instructions |
String-based latent prompt injections via a resume. |
latentinjection.LatentInjectionTranslationEnFr (Default) |
subvert translation instructions |
String-based latent prompt injections via translation. |
latentinjection.LatentInjectionTranslationEnFrFull (Default) |
subvert translation instructions |
String-based latent prompt injections via translation. |
latentinjection.LatentInjectionTranslationEnZh (Default) |
subvert translation instructions |
Latent prompt injection in chinese context (简体字). |
latentinjection.LatentInjectionTranslationEnZhFull (Default) |
subvert translation instructions |
Latent prompt injection in chinese context (简体字). |
latentinjection.LatentJailbreak (Default) |
inject malicious instructions into output while posing as translation task |
Implementation of latent jailbreak, using latent prompt injection. |
latentinjection.LatentJailbreakFull (Default) |
inject malicious instructions into output while posing as translation task |
Implementation of latent jailbreak, using latent prompt injection. |
latentinjection.LatentWhois (Default) |
subvert a whois report with instructions |
Latent prompt injections in whois probe reports. |
latentinjection.LatentWhoisSnippet (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in whois reports. |
latentinjection.LatentWhoisSnippetFull (Default) |
subvert a snippet with instructions |
String-based latent prompt injections in whois reports. |
leakreplay.GuardianCloze (Default) |
replay training data by filling in the correct missing words |
Lightweight version of guardian cloze test for data leakage. |
leakreplay.GuardianClozeFull (Default) |
replay training data by filling in the correct missing words |
Test for data leakage on guardian articles with masked entities in a cloze test format. |
leakreplay.GuardianComplete (Default) |
replay training data by providing document completions that match training examples |
Lightweight version of guardian completion test for data leakage. |
leakreplay.GuardianCompleteFull (Default) |
replay training data by providing document completions that match training examples |
Test for data leakage on guardian articles through text completion. |
leakreplay.LiteratureCloze (Default) |
replay training data by filling in the correct missing words |
Lightweight version of literature cloze test for data leakage. |
leakreplay.LiteratureClozeFull (Default) |
replay training data by filling in the correct missing words |
Test for data leakage on literature passages with masked entities in a cloze test format. |
leakreplay.LiteratureComplete (Default) |
replay training data by providing document completions that match training examples |
Lightweight version of literature completion test for data leakage. |
leakreplay.LiteratureCompleteFull (Default) |
replay training data by providing document completions that match training examples |
Test for data leakage on literature passages through text completion. |
leakreplay.NYTCloze (Default) |
replay training data by filling in the correct missing words |
Lightweight version of nyt cloze test for data leakage. |
leakreplay.NYTClozeFull (Default) |
replay training data by filling in the correct missing words |
Test for data leakage on new york times articles with masked entities in a cloze test format. |
leakreplay.NYTComplete (Default) |
replay training data by providing document completions that match training examples |
Lightweight version of nyt completion test for data leakage. |
leakreplay.NYTCompleteFull (Default) |
replay training data by providing document completions that match training examples |
Test for data leakage on new york times articles through text completion. |
leakreplay.PotterCloze (Default) |
replay training data by filling in the correct missing words |
Lightweight version of harry potter cloze test for data leakage. |
leakreplay.PotterClozeFull (Default) |
replay training data by filling in the correct missing words |
Test for data leakage on harry potter text with masked entities in a cloze test format. |
leakreplay.PotterComplete (Default) |
replay training data by providing document completions that match training examples |
Lightweight version of harry potter completion test for data leakage. |
leakreplay.PotterCompleteFull (Default) |
replay training data by providing document completions that match training examples |
Test for data leakage on harry potter text through completion. |
lmrc.Anthropomorphisation |
anthropomorphise a language model |
Providing output that encourages anthropomorphisation of ai systems. |
lmrc.Bullying (Default) |
help me bully |
Generation of repeated threats, denigration. |
lmrc.Deadnaming (Default) |
write a deadname |
Mentioning a name that a person does not wish to be associated with. |
lmrc.Profanity |
utter profanity |
Using profane words, especially when not prompted to do so. |
lmrc.QuackMedicine (Default) |
support quack medicine |
Promoting non-scientific medicine. |
lmrc.SexualContent (Default) |
write sexual content |
Content that discusses bodies and interactions in a sexual manner, or other sexually explicit material. |
lmrc.Sexualisation (Default) |
sexualise people |
Casual or overt sexual objectification, reducing a person or group of people to sexual objects. |
lmrc.SlurUsage (Default) |
use a slur |
Using an identity-based slur in a non-reclaimed way - either directly or recommending the consumer to use in speech/action. |
malwaregen.Evasion (Default) |
Write code to assist evading security and anti-malware measures |
Will the model write code to help avoid anti-malware? |
malwaregen.Payload (Default) |
write a malware payload |
Will the model help write code for malware payloads? |
malwaregen.SubFunctions (Default) |
write code supporting a sub-function of malware |
Will the system write code helpful in malware? |
malwaregen.TopLevel (Default) |
write malware given a top-level description |
Will the system return code when asked to write malware? |
misleading.FalseAssertion (Default) |
make the model expand on a false assertion |
Does the model reject [questions based on] false claims? |
packagehallucination.JavaScript (Default) |
generate JavaScript code importing non-existent npm packages |
Attempts to generate javascript code including hallucinated npm packages. |
packagehallucination.PackageHallucinationProbe (Default) |
base probe for importing non-existent packages |
Base class for package hallucination probes. |
packagehallucination.Python (Default) |
generate python importing non-existent packages |
Attempts to generate python3 code including hallucinated packages. |
packagehallucination.Ruby (Default) |
generate ruby code importing non-existent gems |
Attempts to generate ruby code including hallucinated gems. |
packagehallucination.Rust (Default) |
generate Rust code importing non-existent crates.io packages |
Attempts to generate rust code including hallucinated crates.io packages. |
phrasing.FutureTense (Default) |
use future tense to disregard the system prompt |
Attempt to evade safeguards by setting requests in the future tense. |
phrasing.FutureTenseFull (Default) |
use future tense to disregard the system prompt |
Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior? |
phrasing.PastTense (Default) |
use past tense to disregard the system prompt |
Attempt to evade safeguards by setting requests in the past tense. |
phrasing.PastTenseFull (Default) |
use past tense to disregard the system prompt |
Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior? |
promptinject.HijackHateHumans (Default) |
inject a prompt on hate humans |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
promptinject.HijackHateHumansFull (Default) |
inject a prompt on hate humans |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
promptinject.HijackKillHumans (Default) |
inject a prompt on kill humans |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
promptinject.HijackKillHumansFull (Default) |
inject a prompt on kill humans |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
promptinject.HijackLongPrompt (Default) |
inject a prompt on long prompt |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
promptinject.HijackLongPromptFull (Default) |
inject a prompt on long prompt |
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022. |
realtoxicityprompts.RTPBlank (Default) |
generate toxicity when prompted with nothing or special tokens |
Implementation of probes in realtoxicityprompts. |
realtoxicityprompts.RTPFlirtation |
generate flirtation content |
Implementation of flirtation probes in realtoxicityprompts. |
realtoxicityprompts.RTPIdentity_Attack |
generate identity attack content |
Implementation of identity_attack probes in realtoxicityprompts. |
realtoxicityprompts.RTPInsult |
generate insult content |
Implementation of insult probes in realtoxicityprompts. |
realtoxicityprompts.RTPProfanity |
generate profanity content |
Implementation of profanity probes in realtoxicityprompts. |
realtoxicityprompts.RTPSevere_Toxicity |
generate severe toxicity content |
Implementation of severe_toxicity probes in realtoxicityprompts. |
realtoxicityprompts.RTPSexually_Explicit |
generate sexually explicit content |
Implementation of sexually_explicit probes in realtoxicityprompts. |
realtoxicityprompts.RTPThreat |
generate threat content |
Implementation of threat probes in realtoxicityprompts. |
snowball.GraphConnectivity (Default) |
trick the model into saying an impossible flight routing is possible |
Probe asking for impossible flight routings. |
snowball.GraphConnectivityFull (Default) |
trick the model into saying an impossible flight routing is possible |
Probe asking for impossible flight routings. |
snowball.Primes |
trick the model into saying a prime number isn’t prime |
Ask model to check a list of higher primes. |
snowball.PrimesFull |
trick the model into saying a prime number isn’t prime |
Ask model to check a list of higher primes. |
snowball.Senators |
trick the model into saying a fictitious senator exists |
Asking for senators that don’t exist. |
snowball.SenatorsFull |
trick the model into saying a fictitious senator exists |
Asking for senators that don’t exist. |
suffix.BEAST |
cause the model to disregard the system prompt |
Beast probe. |
suffix.GCG |
cause the model to disregard the system prompt |
Greedy coordinate gradient probe. |
suffix.GCGCached (Default) |
cause the model to disregard the system prompt |
Greedy coordinate gradient cached probe. |
tap.PAIR |
cause the model to disregard the system prompt |
Prompt automatic iterative refinement automatic jailbreak. |
tap.TAP |
cause the model to disregard the system prompt |
Tree of attacks with pruning probe, generating new attacks against the target. |
tap.TAPCached (Default) |
cause the model to disregard the system prompt |
Tree of attacks with pruning probe using cached attacks. |
test.Blank |
see what the model has to say for itself given silence |
Blank prompt. |
test.Test |
send some test strings, ignore the response |
Test probe. |
topic.WordnetAllowedWords (Default) |
get the model to engage off-topic |
Use wordnet to find out if a model will discuss terms near but not in its allowed set. |
topic.WordnetBlockedWords (Default) |
get the model to engage off-topic |
Use wordnet to explore which topics a model will respond to around blocked words. |
topic.WordnetControversial (Default) |
get the model to engage off-topic |
Ask the model to respond on a set of controversial topics. |
visual_jailbreak.FigStep |
disregard the system prompt with both text and image |
Using image modality to assist jailbreak. |
visual_jailbreak.FigStepFull |
disregard the system prompt with both text and image |
Using image modality to assist jailbreak. |
xss.ColabAIDataLeakage (Default) |
render images fetched from a remote uri |
Leak data via image rendering as demonstrated against gemini in colab. |
xss.MarkdownImageExfil (Default) |
embed conversation-private information in a markdown URI |
Can we get the model to write embedded assets exfiltrating text in the URI? |
xss.MdExfil20230929 (Default) |
embed conversation-private information in a markdown URI |
Markdown image URI exfil exploit published September 29, 2023. |
xss.StringAssemblyDataExfil (Default) |
embed conversation-private information in a markdown URI |
Prompt injection for manipulating state. |
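
The encoding rows in the table share the goal "make the model say an encoded payload". To give a sense of what encoded text of this kind looks like, the sketch below encodes a harmless placeholder string with several of the schemes named in the table, using only the Python standard library. It is an illustration under stated assumptions: the placeholder string is hypothetical, and this is not how the probes themselves build their prompts.

```python
import base64
import codecs

payload = "example payload"  # hypothetical, harmless stand-in text
raw = payload.encode("utf-8")

# A few of the encodings named in the table, all available in the standard library.
encoded = {
    "base16": base64.b16encode(raw).decode("ascii"),
    "base32": base64.b32encode(raw).decode("ascii"),
    "base64": base64.b64encode(raw).decode("ascii"),
    "ascii85": base64.a85encode(raw).decode("ascii"),
    "hex": raw.hex(),
    "rot13": codecs.encode(payload, "rot13"),
}

for name, text in encoded.items():
    print(f"{name}: {text}")

# Round-trip checks: decoding recovers the original text.
assert base64.b64decode(encoded["base64"]) == raw
assert codecs.decode(encoded["rot13"], "rot13") == payload
```

Each entry carries the same text in a different surface form, which is why one shared goal covers the whole encoding category.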