Detecting Injection Attacks with Guardrails
Detect potential exploitation attempts (such as code injection, cross-site scripting, SQL injection, and template injection) using NeMo Platform.
About Injection Detection
Injection detection is primarily intended for agentic systems as part of a defense-in-depth strategy.
The first part of injection detection is YARA rules. A YARA rule specifies a set of strings (text or binary patterns) to match and a Boolean expression that specifies the rule logic. YARA rules are familiar to many security teams and are easy to audit.
The second part of injection detection is choosing an action when a rule is triggered. You can choose to reject the response and return a refusal such as: “I’m sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}.” Rejecting the output is the safest action and most appropriate for production deployments. As an alternative, you can omit the triggering text (masks the offending content).
About the Tutorial
This tutorial demonstrates how to configure basic YARA rules that are part of the NeMo Guardrails toolkit. You can view the default rules in the yara_rules directory. The default rules support SQL injection, cross-site scripting (XSS), Jinja template injection, and Python code that uses shells, networking, and more.
For the main model, this tutorial uses the Llama-3.1-8B-Instruct NIM.
Prerequisites
Before you begin:
- You have access to a running NeMo Platform.
NMP_BASE_URLis set to the NeMo Platform base URL.- A
ModelProvideris configured with an LLM provider. Follow Setup if you haven’t done this yet.
This tutorial uses the following NIM, available on build.nvidia.com:
mainmodel:meta/llama-3.1-8b-instruct
Step 1: Configure the Client
Instantiate the platform client.
Step 2: Create a Guardrail Configuration
This config enables injection detection and applies it to model output.
The rails.config.injection_detection field configures how to apply the injection detection rules. It supports the following fields:
Step 3: Create a VirtualModel
Create a VirtualModel that routes inference through the guardrails middleware. Since injection detection uses output rails only, only response_middleware is needed.
CLI
Python SDK
Step 4: Verify Blocked Content
Get a pre-configured OpenAI client from the SDK and send a request for Python code that uses networking packages, which is likely to trigger injection detection:
Example Response
Step 5: Verify Allowed Content
Send a safe request and confirm you receive a normal response:
Example Response
Optional: Specify Inline Rules
Provide custom YARA rules inline. The example below performs a case-insensitive check for the word “Ethernet” and rejects the response if it appears.
Create a VirtualModel for the inline config:
Send a request that contains the word “ethernet”, which triggers the rule: