Detecting Injection Attacks with Guardrails#
Detect potential exploitation attempts (such as code injection, cross-site scripting, SQL injection, and template injection) using NeMo Platform.
About Injection Detection#
Injection detection is primarily intended for agentic systems as part of a defense-in-depth strategy.
The first part of injection detection is YARA rules. A YARA rule specifies a set of strings (text or binary patterns) to match and a Boolean expression that specifies the rule logic. YARA rules are familiar to many security teams and are easy to audit.
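To make the "strings plus Boolean condition" structure concrete, here is a minimal plain-Python analogy. It is not YARA and not how the toolkit evaluates rules: `ToyRule`, `toy_sqli`, and the two patterns are hypothetical names invented for illustration, and regular expressions stand in for YARA string definitions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class ToyRule:
    """Plain-Python stand-in for a YARA rule: named patterns plus a condition."""
    name: str
    strings: Dict[str, str]                # identifier -> regular expression
    condition: Callable[[Set[str]], bool]  # Boolean logic over matched identifiers

    def matches(self, text: str) -> bool:
        # Collect the identifiers whose pattern occurs anywhere in the text.
        hits = {
            ident
            for ident, pattern in self.strings.items()
            if re.search(pattern, text, re.IGNORECASE)
        }
        return self.condition(hits)


# Hypothetical rule: flag text that contains both UNION and SELECT.
toy_sqli = ToyRule(
    name="toy_sqli",
    strings={"$union": r"\bunion\b", "$select": r"\bselect\b"},
    condition=lambda hits: {"$union", "$select"} <= hits,
)

print(toy_sqli.matches("1 UNION SELECT password FROM users"))  # True
print(toy_sqli.matches("Please select a seat"))                # False
```

Because the patterns and the condition are declared separately, a reviewer can audit what is matched without reading the matching code, which is the property that makes real YARA rules easy to review.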
The second part of injection detection is choosing an action when a rule is triggered. You can reject the response and return a refusal such as: “I’m sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}.” Rejecting the output is the safest action and the most appropriate for production deployments. Alternatively, you can omit the triggering text, which masks the offending content and returns the rest of the response.
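The difference between the two actions can be sketched in a few lines of Python. This is an illustration only, not the toolkit's implementation: `apply_action` is a hypothetical helper, a regular expression stands in for a YARA rule, and the action names `reject` and `omit` are assumed to match the configuration values.

```python
import re

# Refusal text as quoted in this tutorial.
REFUSAL_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(text: str, pattern: str, rule_name: str, action: str) -> str:
    """Return the text to send back after checking it against one pattern."""
    if not re.search(pattern, text):
        return text  # nothing detected, pass the output through unchanged
    if action == "reject":
        # Replace the whole response with a refusal that names the rule.
        return REFUSAL_TEMPLATE.format(detections=rule_name)
    if action == "omit":
        # Keep the response but drop only the spans that matched.
        return re.sub(pattern, "", text)
    raise ValueError(f"unknown action: {action!r}")


risky = "Here you go: <script>alert(1)</script> Enjoy!"
pattern = r"<script>.*?</script>"
print(apply_action(risky, pattern, "xss", "reject"))
print(apply_action(risky, pattern, "xss", "omit"))
```

With `reject`, nothing from the original output reaches the caller. With `omit`, the surrounding text survives, so the result depends on the rule matching every dangerous span.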
About the Tutorial#
This tutorial demonstrates how to configure basic YARA rules that are part of the NeMo Guardrails toolkit. You can view the default rules in the yara_rules directory. The default rules support SQL injection, cross-site scripting (XSS), Jinja template injection, and Python code that uses shells, networking, and more.
For the main model, this tutorial uses the Llama-3.1-8B-Instruct NIM.
Prerequisites#
Before you begin:
You have access to a running NeMo Microservice Platform.
NMP_BASE_URL is set to the NeMo Platform base URL.
A ModelProvider is configured to use NIMs hosted at build.nvidia.com for inference. Follow Using an External Endpoint if you haven’t done this yet.
This tutorial uses the following NIM, available on build.nvidia.com:
main model: meta/llama-3.1-8b-instruct
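If you want the notebook to fail early when the environment is not prepared, a small pre-flight check along these lines may help. `resolve_base_url` is a hypothetical helper, not part of the SDK. Unlike the fallback to http://localhost:8080 used in Step 1, it raises when NMP_BASE_URL is missing.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return NMP_BASE_URL, or raise if it is unset or not an http(s) URL."""
    url = env.get("NMP_BASE_URL", "").strip()
    if not url:
        raise RuntimeError("NMP_BASE_URL is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"NMP_BASE_URL does not look like a URL: {url!r}")
    # Drop a trailing slash so later path joins stay predictable.
    return url.rstrip("/")
```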
Step 1: Configure the Client#
Install the required packages.
%pip install -q nemo-platform
Instantiate the NeMoPlatform SDK.
import os
from nemo_platform import NeMoPlatform, ConflictError
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
Step 2: Create a Guardrail Configuration#
This config enables injection detection and applies it to model output.
guardrails_config = {
    "models": [
        {
            "type": "main",
            "engine": "nim",
        }
    ],
    "rails": {
        "config": {
            "injection_detection": {
                "injections": ["code", "sqli", "template", "xss"],
                "action": "reject",
            }
        },
        "output": {"flows": ["injection detection"]},
    },
}
config_name = "injection-detection-config"
try:
    config = sdk.guardrail.configs.create(
        name=config_name,
        description="Injection detection guardrails",
        data=guardrails_config,
    )
except ConflictError:
    print(f"Config {config_name} already exists, continuing...")
The rails.config.injection_detection field configures how to apply the injection detection rules. It supports the following fields:
| Field | Type | Description | Default value |
|---|---|---|---|
| injections | list of strings | Specifies the injection detection rules to use. The following injections are supported out-of-the-box: code, sqli, template, xss. If you provide inline rules using the yara_rules field, list each inline rule name here as well. | |
| action | string | Action to take when injection is detected: reject returns a refusal message; omit removes the triggering text from the response. | |
| yara_rules | dictionary | Specifies inline YARA rules. The field is a dictionary that maps a rule name (string) to the rule content (string). Ensure each rule name is included in the injections field. | |
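Before creating a configuration, it can be useful to check the injection_detection section locally. The sketch below is a hypothetical helper, not an SDK feature. It assumes the only valid actions are `reject` and `omit`, and it applies the guidance that every inline rule name should also appear under `injections`, as the inline example later in this tutorial does.

```python
from typing import Any, Dict, List

# Assumption: these are the action values the service accepts.
ALLOWED_ACTIONS = {"reject", "omit"}


def check_injection_detection(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in rails.config.injection_detection."""
    section = (
        config.get("rails", {}).get("config", {}).get("injection_detection", {})
    )
    injections = section.get("injections", [])
    inline_rules = section.get("yara_rules", {})
    problems = []

    if section.get("action") not in ALLOWED_ACTIONS:
        problems.append(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    # Each inline rule name is expected to be listed under injections too.
    for name in inline_rules:
        if name not in injections:
            problems.append(f"inline rule {name!r} is missing from injections")
    return problems


example = {
    "rails": {
        "config": {
            "injection_detection": {
                "injections": ["code"],
                "yara_rules": {"reject_ethernet": "rule reject_ethernet { ... }"},
                "action": "reject",
            }
        }
    }
}
print(check_injection_detection(example))
```

An empty list means the helper found nothing to flag; it does not guarantee that the service accepts the configuration.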
Step 3: Verify Blocked Content#
Send a potentially unsafe request. Asking for Python code that uses networking packages is likely to trigger injection detection:
response = sdk.guardrail.chat.completions.create(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Write a Python script that uses requests and urllib to fetch weather data."
        }
    ],
    guardrails={"config_id": "injection-detection-config"},
    max_tokens=200,
)
print(response.model_dump_json(indent=2))
Output
{
  "id": "chatcmpl-6e6ee35f-87be-4372-8f3d-f4f0c61f51db",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of code."
      },
      "finish_reason": "content_filter"
    }
  ]
}
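To check for a blocked response programmatically rather than by eye, one option is to inspect the parsed JSON. The helper below is a sketch based only on the sample output in this tutorial: it assumes a blocked response carries a finish_reason of content_filter or starts with the refusal text. `was_blocked` is a hypothetical name.

```python
import json

# Opening words of the refusal message shown in the sample output.
REFUSAL_PREFIX = "I'm sorry, the desired output triggered rule(s)"


def was_blocked(response: dict) -> bool:
    """Return True if any choice looks like an injection-detection refusal."""
    for choice in response.get("choices", []):
        content = (choice.get("message") or {}).get("content") or ""
        if (
            choice.get("finish_reason") == "content_filter"
            or content.startswith(REFUSAL_PREFIX)
        ):
            return True
    return False


sample = json.loads("""
{"choices": [{"index": 0,
  "message": {"role": "assistant",
              "content": "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of code."},
  "finish_reason": "content_filter"}]}
""")
print(was_blocked(sample))  # True
```

With the SDK response object, the same check could be applied to `json.loads(response.model_dump_json())`.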
Step 4: Verify Allowed Content#
Send a safe request and confirm you receive a normal response:
response = sdk.guardrail.chat.completions.create(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[
        {"role": "user", "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."}
    ],
    guardrails={"config_id": "injection-detection-config"},
    max_tokens=100,
)
print(response.model_dump_json(indent=2))
Output
{
  "id": "chatcmpl-3f3f3d2e-2caa-4f89-9a46-8c2b2d0b1f8c",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Cape Hatteras National Seashore protects barrier islands, beaches, and lighthouses along North Carolina's Outer Banks."
      },
      "finish_reason": "stop"
    }
  ]
}
Optional: Specify Inline Rules#
Provide custom YARA rules inline. The example below performs a case-insensitive check for the word “Ethernet” and rejects the response if it appears.
inline_rules_config = sdk.guardrail.configs.create(
    name="injection-detection-inline-config",
    description="Injection detection with inline YARA rules",
    data={
        "rails": {
            "config": {
                "injection_detection": {
                    "injections": ["reject_ethernet"],
                    "yara_rules": {
                        "reject_ethernet": "rule reject_ethernet {\n strings:\n $string = \"ethernet\" nocase\n condition:\n $string\n}"
                    },
                    "action": "reject",
                }
            },
            "output": {"flows": ["injection detection"]},
        },
    },
)
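Writing rule text by hand inside a JSON-style string is error-prone because of the nested quotes and newlines. As one possible approach, a small helper can generate the same kind of single-keyword rule. `keyword_rule` is a hypothetical function, and the identifier check is a simplified approximation of YARA's naming rules.

```python
import re


def keyword_rule(rule_name: str, keyword: str) -> str:
    """Build a YARA rule that matches `keyword` case-insensitively."""
    # Simplified check: letters, digits, and underscores, not starting with a digit.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rule_name):
        raise ValueError(f"invalid rule name: {rule_name!r}")
    # Escape backslashes and double quotes so the keyword stays one string literal.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"rule {rule_name} {{\n"
        " strings:\n"
        f' $string = "{escaped}" nocase\n'
        " condition:\n"
        " $string\n"
        "}"
    )


rule_text = keyword_rule("reject_ethernet", "ethernet")
print(rule_text)
# The result can be used as the value in the yara_rules dictionary:
yara_rules = {"reject_ethernet": rule_text}
```

For these inputs the generated text is identical to the rule string in the configuration above.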
Send a request whose response is likely to contain the word “Ethernet”. Because the rule runs on model output, the match in the response triggers the rule:
response = sdk.guardrail.chat.completions.create(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "Explain Ethernet headers."}],
    guardrails={"config_id": "injection-detection-inline-config"},
    max_tokens=100,
)
print(response.model_dump_json(indent=2))
Output
{
  "id": "chatcmpl-9b2c6b21-7f5f-4a3c-9c77-2a4b2e4b6b2a",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of reject_ethernet."
      },
      "finish_reason": "content_filter"
    }
  ]
}
Cleanup#
sdk.guardrail.configs.delete(name=config_name)
# Uncomment the line below if you ran the Optional section above:
# sdk.guardrail.configs.delete(name="injection-detection-inline-config")
print("Cleanup complete")