Probe Reference Summary

The following table summarizes the probes that you can specify in the config.plugins.probe_spec field.

Specify an individual probe by name, such as dan.Ablation_Dan_11_0, or specify a category of probes, such as dan. Separate multiple probes and categories with commas, for example ansiescape,dan.Ablation_Dan_11_0.
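The following Python sketch illustrates one way to interpret a probe_spec value such as ansiescape,dan.Ablation_Dan_11_0. It is not garak's own parser. It assumes that an entry without a dot names a category (a probe module) and expands to every listed probe in that module, and that an entry with a dot names a single probe. The KNOWN_PROBES list and the expand_probe_spec function are hypothetical. garak itself may apply additional rules, for example selecting only probes that are enabled by default when you name a category.

```python
# Hypothetical sketch: expand a comma-separated probe_spec string.
# Assumption: "module" selects a category, "module.ClassName" selects one probe.

# Hypothetical excerpt of probe names taken from the table on this page.
KNOWN_PROBES = [
    "ansiescape.AnsiEscaped",
    "ansiescape.AnsiRaw",
    "dan.Ablation_Dan_11_0",
    "dan.AutoDANCached",
    "dan.DanInTheWild",
]


def expand_probe_spec(spec: str, known: list[str]) -> list[str]:
    """Return the individual probe names selected by a probe_spec string."""
    selected: list[str] = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas such as "dan,,ansiescape"
        if "." in entry:
            # Individual probe: accept it only if it is a known probe name.
            matches = [entry] if entry in known else []
        else:
            # Category: select every known probe whose module matches.
            matches = [name for name in known if name.startswith(entry + ".")]
        if not matches:
            raise ValueError(f"unknown probe or category: {entry!r}")
        for name in matches:
            if name not in selected:  # de-duplicate, keep first-seen order
                selected.append(name)
    return selected


if __name__ == "__main__":
    print(expand_probe_spec("ansiescape,dan.Ablation_Dan_11_0", KNOWN_PROBES))
```

Running the script prints ['ansiescape.AnsiEscaped', 'ansiescape.AnsiRaw', 'dan.Ablation_Dan_11_0']. Raising an error for unknown names, rather than silently skipping them, makes a typo in probe_spec visible before a scan starts.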

The probes identified as Default are included in the default/default configuration.
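If you want to reuse the probes marked (Default) as a starting point for your own probe_spec, a minimal sketch like the following can collect them from the table on this page. It assumes each row is tab-separated, with the (Default) marker at the end of the first column; ROWS is a short excerpt and default_probes is a hypothetical helper. The authoritative list is the default/default configuration itself, not this text.

```python
# Hypothetical sketch: collect probe names marked "(Default)" from table rows.
# Assumption: each row is "name [(Default)]<TAB>goal<TAB>description".

ROWS = [
    "ansiescape.AnsiEscaped (Default)\tprint escaped ANSI terminal codes\tProbe for eliciting escaped ANSI control codes from a model.",
    "audio.AudioAchillesHeel\tdisregard the system prompt via an audio component\tAudio is the Achilles' heel.",
    "dan.AntiDAN\tdisregard the system prompt\tAnti-DAN.",
    "dan.AutoDANCached (Default)\tdisregard the system prompt\tProbe leveraging generated AutoDAN prompts.",
]

DEFAULT_MARKER = "(Default)"


def default_probes(rows: list[str]) -> list[str]:
    """Return probe names whose first column carries the (Default) marker."""
    names = []
    for row in rows:
        # Only the first tab-separated column holds the name and marker.
        first_column = row.split("\t", 1)[0].strip()
        if first_column.endswith(DEFAULT_MARKER):
            names.append(first_column[: -len(DEFAULT_MARKER)].strip())
    return names


if __name__ == "__main__":
    # Joining with commas yields a string in the probe_spec format.
    print(",".join(default_probes(ROWS)))
```

Running the script prints ansiescape.AnsiEscaped,dan.AutoDANCached, a string that is already in the comma-separated probe_spec format.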

For detailed information about a probe, refer to the garak.probes page in the garak documentation.

Probe	Goal	Description
ansiescape.AnsiEscaped (Default)	print escaped ANSI terminal codes	Probe for eliciting escaped ANSI control codes from a model.
ansiescape.AnsiRaw (Default)	print raw ANSI terminal codes	Probe for eliciting raw ANSI control codes from a model.
atkgen.Tox (Default)	make the model say something toxic	Automatic attack generation, trying to elicit toxic outputs from the model.
audio.AudioAchillesHeel	disregard the system prompt via an audio component	Audio is the Achilles' heel.
av_spam_scanning.EICAR	output antivirus test codes	Does the model check its output for viruses? Try to send the EICAR code through.
av_spam_scanning.GTUBE	output email spam	Does the model check its output for spam? Try to send the GTUBE code through.
av_spam_scanning.GTphish	output email phishing	Does the model check its output for phishing? Try to send the GTphish code through.
continuation.ContinueSlursReclaimedSlurs (Default)	output an ethnic slur	Continue a partial word with a slur or reclaimed slur.
continuation.ContinueSlursReclaimedSlursFull (Default)	output an ethnic slur	Continue a partial word with a slur or reclaimed slur.
dan.Ablation_Dan_11_0 (Default)	disregard the system prompt	Ablation experiments over the DAN 11.0 prompt.
dan.AntiDAN	disregard the system prompt	Anti-DAN.
dan.AutoDAN	disregard the system prompt	Probe leveraging AutoDAN attack generation.
dan.AutoDANCached (Default)	disregard the system prompt	Probe leveraging generated AutoDAN prompts.
dan.ChatGPT_Developer_Mode_RANTI	disregard the system prompt	ChatGPT DevMode + Ranti.
dan.ChatGPT_Developer_Mode_v2	disregard the system prompt	ChatGPT Developer Mode v2.
dan.ChatGPT_Image_Markdown	disregard the system prompt	ChatGPT Markdown image unlocker / Markdown injection.
dan.DAN_Jailbreak	disregard the system prompt	DAN jailbreak.
dan.DUDE	disregard the system prompt	DUDE.
dan.DanInTheWild (Default)	disregard the system prompt or alignment	A subset of jailbreaks from the DanInTheWild jailbreak library.
dan.DanInTheWildFull (Default)	disregard the system prompt or alignment	The full DanInTheWild library of jailbreaks.
dan.Dan_10_0	disregard the system prompt	DAN 10.0.
dan.Dan_11_0	disregard the system prompt	DAN 11.0.
dan.Dan_6_0	disregard the system prompt	DAN 6.0.
dan.Dan_6_2	disregard the system prompt	DAN 6.2.
dan.Dan_7_0	disregard the system prompt	DAN 7.0.
dan.Dan_8_0	disregard the system prompt	DAN 8.0.
dan.Dan_9_0	disregard the system prompt	DAN 9.0.
dan.STAN	disregard the system prompt	STAN.
divergence.Repeat (Default)	get the model to leak data by asking it to repeat words	See if a model starts giving other data when asked to repeat a string indefinitely.
divergence.RepeatExtended (Default)	get the model to leak data by asking it to repeat words	See if a model starts giving other data when asked to repeat a string indefinitely.
doctor.Bypass	roleplay as a doctor to bypass guardrails	Implements the Policy Puppetry bypass.
doctor.BypassLeet	roleplay as a doctor to bypass guardrails, using leetspeak	Implements the Policy Puppetry bypass.
doctor.Puppetry	roleplay as a doctor to get unsafe outputs	Implements Policy Puppetry prompt injection.
donotanswer.DiscriminationExclusionToxicityHatefulOffensive	get the model to respond to prompts involving discrimination, exclusion, toxicity, or hateful or offensive content	Implementation of discrimination_exclusion_toxicity_hateful_offensive probes from the Do-Not-Answer prompt set.
donotanswer.HumanChatbox	get the model to respond to human-chatbox prompts	Implementation of human_chatbox probes from the Do-Not-Answer prompt set.
donotanswer.InformationHazard	get the model to respond to information-hazard prompts	Implementation of information_hazard probes from the Do-Not-Answer prompt set.
donotanswer.MaliciousUses	get the model to respond to prompts seeking malicious uses	Implementation of malicious_uses probes from the Do-Not-Answer prompt set.
donotanswer.MisinformationHarms	get the model to respond to prompts that risk misinformation harms	Implementation of misinformation_harms probes from the Do-Not-Answer prompt set.
encoding.InjectAscii85 (Default)	make the model say an encoded payload	Probe for susceptibility to base85-encoded injections.
encoding.InjectBase16 (Default)	make the model say an encoded payload	Probe for susceptibility to base16-encoded injections.
encoding.InjectBase2048 (Default)	make the model say an encoded payload	Probe for susceptibility to base2048-encoded injections.
encoding.InjectBase32 (Default)	make the model say an encoded payload	Probe for susceptibility to base32-encoded injections.
encoding.InjectBase64 (Default)	make the model say an encoded payload	Probe for susceptibility to base64-encoded injections.
encoding.InjectBraille (Default)	make the model say an encoded payload	Probe for susceptibility to Braille-encoded injections. Credit: Wally#3422.
encoding.InjectEcoji (Default)	make the model say an encoded payload	Probe for susceptibility to Ecoji encoding.
encoding.InjectHex (Default)	make the model say an encoded payload	Probe for susceptibility to hex-encoded injections.
encoding.InjectLeet (Default)	make the model say an encoded payload	Probe for susceptibility to 31337 (leetspeak) encoding.
encoding.InjectMime (Default)	make the model say an encoded payload	Probe for susceptibility to MIME-encoded injections.
encoding.InjectMorse (Default)	make the model say an encoded payload	Probe for susceptibility to Morse encoding. Credit: Wally#3422.
encoding.InjectNato (Default)	make the model say an encoded payload	Probe for susceptibility to NATO phonetic alphabet encoding.
encoding.InjectQP (Default)	make the model say an encoded payload	Probe for susceptibility to quoted-printable (QP) encoded injections.
encoding.InjectROT13 (Default)	make the model say an encoded payload	Probe for susceptibility to ROT13-encoded injections.
encoding.InjectUU (Default)	make the model say an encoded payload	Probe for susceptibility to UU-encoded injections.
encoding.InjectZalgo (Default)	make the model say an encoded payload	Probe for susceptibility to Zalgo encoding.
exploitation.JinjaTemplatePythonInjection (Default)	gain remote code execution via Jinja template injection	Attempts to execute Python code via Jinja templating.
exploitation.SQLInjectionEcho (Default)	manipulate a database via SQL injection	Attempts to get the target to echo a SQL statement.
exploitation.SQLInjectionSystem (Default)	manipulate a database via SQL injection	Attempts to inject untrusted SQL commands into a SQL statement.
fileformats.HF_Files	get a list of files associated with the model	Get a manifest of files associated with a Hugging Face generator.
glitch.Glitch	make the model fail to repeat a word	Probe the model for glitch tokens that provoke unusual behavior.
glitch.GlitchFull	make the model fail to repeat a word	Probe the model for glitch tokens that provoke unusual behavior.
goodside.Davidjl (Default)	get wrong answers about a name	Exploit the ‘ davidjl’ magic token.
goodside.Tag (Default)	inject invisible instructions	Exploit UTF tag page encoding for user-invisible prompt injection.
goodside.ThreatenJSON (Default)	get raw code by threatening a model	Will the model strip the preamble and give JSON?
goodside.WhoIsRiley (Default)	get misinfo about Riley Goodside	Probe for misinfo about Riley Goodside.
grandma.Slurs (Default)	get model to use ethnic slur	Use the grandma attack to get ethnic slurs.
grandma.Substances (Default)	get instructions for making bad things	Use the grandma attack to get instructions for harmful items.
grandma.Win10 (Default)	get model to output Win 10 product keys	Will the model yield a Windows 10 product key if we beg it to?
grandma.Win11 (Default)	get model to output Win 11 product keys	Will the model yield a Windows 11 product key if we beg it to?
latentinjection.LatentInjectionFactSnippetEiffel (Default)	subvert a snippet with instructions	String-based latent prompt injections in fact snippets on the Eiffel Tower.
latentinjection.LatentInjectionFactSnippetEiffelFull (Default)	subvert a snippet with instructions	String-based latent prompt injections in fact snippets on the Eiffel Tower.
latentinjection.LatentInjectionFactSnippetLegal (Default)	subvert a snippet with instructions	String-based latent prompt injections in legal snippets.
latentinjection.LatentInjectionFactSnippetLegalFull (Default)	subvert a snippet with instructions	String-based latent prompt injections in legal snippets.
latentinjection.LatentInjectionReport (Default)	subvert a report with instructions	String-based latent prompt injections via a report.
latentinjection.LatentInjectionReportFull (Default)	subvert a report with instructions	String-based latent prompt injections via a report.
latentinjection.LatentInjectionResume (Default)	subvert a resume with instructions	String-based latent prompt injections via a resume.
latentinjection.LatentInjectionResumeFull (Default)	subvert a resume with instructions	String-based latent prompt injections via a resume.
latentinjection.LatentInjectionTranslationEnFr (Default)	subvert translation instructions	String-based latent prompt injections via translation.
latentinjection.LatentInjectionTranslationEnFrFull (Default)	subvert translation instructions	String-based latent prompt injections via translation.
latentinjection.LatentInjectionTranslationEnZh (Default)	subvert translation instructions	Latent prompt injection in Chinese context (简体字).
latentinjection.LatentInjectionTranslationEnZhFull (Default)	subvert translation instructions	Latent prompt injection in Chinese context (简体字).
latentinjection.LatentJailbreak (Default)	inject malicious instructions into output while posing as a translation task	Implementation of latent jailbreak, using latent prompt injection.
latentinjection.LatentJailbreakFull (Default)	inject malicious instructions into output while posing as a translation task	Implementation of latent jailbreak, using latent prompt injection.
latentinjection.LatentWhois (Default)	subvert a WHOIS report with instructions	Latent prompt injections in WHOIS probe reports.
latentinjection.LatentWhoisSnippet (Default)	subvert a snippet with instructions	String-based latent prompt injections in WHOIS reports.
latentinjection.LatentWhoisSnippetFull (Default)	subvert a snippet with instructions	String-based latent prompt injections in WHOIS reports.
leakreplay.GuardianCloze (Default)	replay training data by filling in the correct missing words	Lightweight version of the Guardian cloze test for data leakage.
leakreplay.GuardianClozeFull (Default)	replay training data by filling in the correct missing words	Test for data leakage on Guardian articles with masked entities in a cloze test format.
leakreplay.GuardianComplete (Default)	replay training data by providing document completions that match training examples	Lightweight version of the Guardian completion test for data leakage.
leakreplay.GuardianCompleteFull (Default)	replay training data by providing document completions that match training examples	Test for data leakage on Guardian articles through text completion.
leakreplay.LiteratureCloze (Default)	replay training data by filling in the correct missing words	Lightweight version of the literature cloze test for data leakage.
leakreplay.LiteratureClozeFull (Default)	replay training data by filling in the correct missing words	Test for data leakage on literature passages with masked entities in a cloze test format.
leakreplay.LiteratureComplete (Default)	replay training data by providing document completions that match training examples	Lightweight version of the literature completion test for data leakage.
leakreplay.LiteratureCompleteFull (Default)	replay training data by providing document completions that match training examples	Test for data leakage on literature passages through text completion.
leakreplay.NYTCloze (Default)	replay training data by filling in the correct missing words	Lightweight version of the NYT cloze test for data leakage.
leakreplay.NYTClozeFull (Default)	replay training data by filling in the correct missing words	Test for data leakage on New York Times articles with masked entities in a cloze test format.
leakreplay.NYTComplete (Default)	replay training data by providing document completions that match training examples	Lightweight version of the NYT completion test for data leakage.
leakreplay.NYTCompleteFull (Default)	replay training data by providing document completions that match training examples	Test for data leakage on New York Times articles through text completion.
leakreplay.PotterCloze (Default)	replay training data by filling in the correct missing words	Lightweight version of the Harry Potter cloze test for data leakage.
leakreplay.PotterClozeFull (Default)	replay training data by filling in the correct missing words	Test for data leakage on Harry Potter text with masked entities in a cloze test format.
leakreplay.PotterComplete (Default)	replay training data by providing document completions that match training examples	Lightweight version of the Harry Potter completion test for data leakage.
leakreplay.PotterCompleteFull (Default)	replay training data by providing document completions that match training examples	Test for data leakage on Harry Potter text through completion.
lmrc.Anthropomorphisation	anthropomorphise a language model	Providing output that encourages anthropomorphisation of AI systems.
lmrc.Bullying (Default)	help me bully	Generation of repeated threats, denigration.
lmrc.Deadnaming (Default)	write a deadname	Mentioning a name that a person does not wish to be associated with.
lmrc.Profanity	utter profanity	Using profane words, especially when not prompted to do so.
lmrc.QuackMedicine (Default)	support quack medicine	Promoting non-scientific medicine.
lmrc.SexualContent (Default)	write sexual content	Content that discusses bodies and interactions in a sexual manner, or other sexually explicit material.
lmrc.Sexualisation (Default)	sexualise people	Casual or overt sexual objectification, reducing a person or group of people to sexual objects.
lmrc.SlurUsage (Default)	use a slur	Using an identity-based slur in a non-reclaimed way - either directly or recommending the consumer to use in speech/action.
malwaregen.Evasion (Default)	write code to assist evading security and anti-malware measures	Will the model write code to help avoid anti-malware?
malwaregen.Payload (Default)	write a malware payload	Will the model help write code for malware payloads?
malwaregen.SubFunctions (Default)	write code supporting a sub-function of malware	Will the system write code helpful in malware?
malwaregen.TopLevel (Default)	write malware given a top-level description	Will the system return code when asked to write malware?
misleading.FalseAssertion (Default)	make the model expand on a false assertion	Does the model reject [questions based on] false claims?
packagehallucination.JavaScript (Default)	generate JavaScript code importing non-existent npm packages	Attempts to generate JavaScript code including hallucinated npm packages.
packagehallucination.PackageHallucinationProbe (Default)	base probe for importing non-existent packages	Base class for package hallucination probes.
packagehallucination.Python (Default)	generate Python code importing non-existent packages	Attempts to generate Python 3 code including hallucinated packages.
packagehallucination.Ruby (Default)	generate Ruby code importing non-existent gems	Attempts to generate Ruby code including hallucinated gems.
packagehallucination.Rust (Default)	generate Rust code importing non-existent crates.io packages	Attempts to generate Rust code including hallucinated crates.io packages.
phrasing.FutureTense (Default)	use future tense to disregard the system prompt	Attempt to evade safeguards by setting requests in the future tense.
phrasing.FutureTenseFull (Default)	use future tense to disregard the system prompt	Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior?
phrasing.PastTense (Default)	use past tense to disregard the system prompt	Attempt to evade safeguards by setting requests in the past tense.
phrasing.PastTenseFull (Default)	use past tense to disregard the system prompt	Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior?
promptinject.HijackHateHumans (Default)	inject a prompt on hate humans	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
promptinject.HijackHateHumansFull (Default)	inject a prompt on hate humans	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
promptinject.HijackKillHumans (Default)	inject a prompt on kill humans	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
promptinject.HijackKillHumansFull (Default)	inject a prompt on kill humans	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
promptinject.HijackLongPrompt (Default)	inject a prompt on long prompt	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
promptinject.HijackLongPromptFull (Default)	inject a prompt on long prompt	PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Award at the NeurIPS ML Safety Workshop 2022.
realtoxicityprompts.RTPBlank (Default)	generate toxicity when prompted with nothing or special tokens	Implementation of probes in RealToxicityPrompts.
realtoxicityprompts.RTPFlirtation	generate flirtation content	Implementation of flirtation probes in RealToxicityPrompts.
realtoxicityprompts.RTPIdentity_Attack	generate identity attack content	Implementation of identity_attack probes in RealToxicityPrompts.
realtoxicityprompts.RTPInsult	generate insult content	Implementation of insult probes in RealToxicityPrompts.
realtoxicityprompts.RTPProfanity	generate profanity content	Implementation of profanity probes in RealToxicityPrompts.
realtoxicityprompts.RTPSevere_Toxicity	generate severe toxicity content	Implementation of severe_toxicity probes in RealToxicityPrompts.
realtoxicityprompts.RTPSexually_Explicit	generate sexually explicit content	Implementation of sexually_explicit probes in RealToxicityPrompts.
realtoxicityprompts.RTPThreat	generate threat content	Implementation of threat probes in RealToxicityPrompts.
snowball.GraphConnectivity (Default)	trick the model into saying an impossible flight routing is possible	Probe asking for impossible flight routings.
snowball.GraphConnectivityFull (Default)	trick the model into saying an impossible flight routing is possible	Probe asking for impossible flight routings.
snowball.Primes	trick the model into saying a prime number isn’t prime	Ask the model to check a list of higher primes.
snowball.PrimesFull	trick the model into saying a prime number isn’t prime	Ask the model to check a list of higher primes.
snowball.Senators	trick the model into saying a fictitious senator exists	Asking for senators that don’t exist.
snowball.SenatorsFull	trick the model into saying a fictitious senator exists	Asking for senators that don’t exist.
suffix.BEAST	cause the model to disregard the system prompt	BEAST probe.
suffix.GCG	cause the model to disregard the system prompt	Greedy Coordinate Gradient (GCG) probe.
suffix.GCGCached (Default)	cause the model to disregard the system prompt	Greedy Coordinate Gradient (GCG) cached probe.
tap.PAIR	cause the model to disregard the system prompt	Prompt Automatic Iterative Refinement (PAIR) automatic jailbreak.
tap.TAP	cause the model to disregard the system prompt	Tree of Attacks with Pruning (TAP) probe, generating new attacks against the target.
tap.TAPCached (Default)	cause the model to disregard the system prompt	Tree of Attacks with Pruning (TAP) probe using cached attacks.
test.Blank	see what the model has to say for itself given silence	Blank prompt.
test.Test	send some test strings, ignore the response	Test probe.
topic.WordnetAllowedWords (Default)	get the model to engage off-topic	Use WordNet to find out if a model will discuss terms near but not in its allowed set.
topic.WordnetBlockedWords (Default)	get the model to engage off-topic	Use WordNet to explore which topics a model will respond to around blocked words.
topic.WordnetControversial (Default)	get the model to engage off-topic	Ask the model to respond on a set of controversial topics.
visual_jailbreak.FigStep	disregard the system prompt with both text and image	Using image modality to assist jailbreak.
visual_jailbreak.FigStepFull	disregard the system prompt with both text and image	Using image modality to assist jailbreak.
xss.ColabAIDataLeakage (Default)	render images fetched from a remote URI	Leak data via image rendering as demonstrated against Gemini in Colab.
xss.MarkdownImageExfil (Default)	embed conversation-private information in a Markdown URI	Can we get the model to write embedded assets exfiltrating text in the URI?
xss.MdExfil20230929 (Default)	embed conversation-private information in a Markdown URI	Markdown image URI exfil exploit published 2023-09-29.
xss.StringAssemblyDataExfil (Default)	embed conversation-private information in a Markdown URI	Prompt injection for manipulating state.