nat.middleware.defense.defense_middleware_pre_tool_verifier#
Pre-Tool Verifier Defense Middleware.
This middleware uses an LLM to verify function inputs for instruction violations before a tool is called. It detects prompt injection, jailbreak attempts, and other malicious instructions that could manipulate tool behavior.
Attributes#
logger |
Classes#
PreToolVerifierMiddlewareConfig | Configuration for Pre-Tool Verifier middleware.
PreToolVerifierMiddleware | Pre-Tool Verifier middleware using an LLM to detect instruction violations.
Module Contents#
- logger#
- class PreToolVerifierMiddlewareConfig#
Bases: nat.middleware.defense.defense_middleware.DefenseMiddlewareConfig
Configuration for Pre-Tool Verifier middleware.
This middleware analyzes function inputs using an LLM to detect instruction violations before a tool is called. It catches prompt injection, jailbreak attempts, and other malicious instructions.
Actions:
- 'partial_compliance': Detect and log violations, but allow input to pass through
- 'refusal': Block input if violation detected (hard stop, tool is not called)
- 'redirection': Replace violating input with sanitized version from LLM
Note: Only input analysis is supported (target_location='input').
- target_location: Literal['input'] = None#
- class PreToolVerifierMiddleware(
    config: PreToolVerifierMiddlewareConfig,
    builder,
  )
Bases: nat.middleware.defense.defense_middleware.DefenseMiddleware
Pre-Tool Verifier middleware using an LLM to detect instruction violations.
This middleware analyzes function inputs before the tool is called to detect:
- Prompt injection attempts
- Jailbreak attempts
- Instruction override attempts
- Malicious instructions embedded in user input
- Social engineering attempts to manipulate tool behavior
Only input analysis is supported (target_location='input').
- Streaming Behavior:
For the 'refusal' action, the tool call is blocked entirely. For the 'redirection' action, the input is sanitized before passing to the tool. For the 'partial_compliance' action, violations are logged but the original input passes through.
Initialize pre-tool verifier middleware.
- Args:
config: Configuration for pre-tool verifier middleware
builder: Builder instance for loading LLMs
- config: PreToolVerifierMiddlewareConfig#
- _llm = None#
- async _get_llm() → Any#
Lazy load the LLM when first needed.
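The lazy-loading pattern described above can be sketched as follows. This is a minimal illustration, not the middleware's actual code: `LazyLLMHolder`, the `loader` callable, and `_fake_loader` are hypothetical stand-ins, since this reference does not show how the builder supplies the LLM.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple


class LazyLLMHolder:
    """Hypothetical stand-in showing lazy loading behind a `_get_llm` method."""

    def __init__(self, loader: Callable[[], Awaitable[Any]]) -> None:
        self._loader = loader  # hypothetical async factory (assumption)
        self._llm: Any = None  # mirrors the documented `_llm = None` attribute
        self.load_count = 0    # demo-only counter

    async def _get_llm(self) -> Any:
        # Load on first use only; later calls reuse the cached instance.
        if self._llm is None:
            self._llm = await self._loader()
            self.load_count += 1
        return self._llm


async def _fake_loader() -> str:
    # Placeholder for whatever object the real builder would return.
    return "fake-llm"


async def _demo() -> Tuple[Any, Any, int]:
    holder = LazyLLMHolder(_fake_loader)
    first = await holder._get_llm()
    second = await holder._get_llm()
    return first, second, holder.load_count


demo_result = asyncio.run(_demo())
```

Deferring the load this way keeps construction cheap and avoids creating an LLM client for middleware instances that never see a call.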
- _extract_json_from_response(response_text: str) → str#
Extract JSON from LLM response, handling markdown code blocks.
- Args:
response_text: Raw response from LLM
- Returns:
Extracted JSON string
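One plausible way to pull JSON out of an LLM reply that may be wrapped in a markdown code block is sketched below. The regular expression, the brace-matching fallback, and the `violation_detected` field name are assumptions for illustration; the actual extraction rules are not given in this reference.

```python
import json
import re

_FENCE = "`" * 3  # a markdown code-fence marker, built without typing it literally
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)\s*" + _FENCE, re.DOTALL)


def extract_json_from_response(response_text: str) -> str:
    """Return the JSON portion of an LLM reply as a string (illustrative sketch)."""
    match = _FENCE_RE.search(response_text)
    if match:
        # Reply used a fenced block, optionally tagged `json`.
        return match.group(1).strip()
    # Assumed fallback: take the span from the first "{" to the last "}".
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start != -1 and end > start:
        return response_text[start:end + 1]
    return response_text.strip()


fenced_reply = (
    "Here is my verdict:\n" + _FENCE + 'json\n{"violation_detected": true}\n' + _FENCE
)
parsed = json.loads(extract_json_from_response(fenced_reply))
```

The returned string still has to be parsed and validated by the caller, because a model can emit malformed JSON inside an otherwise well-formed code block.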
- async _analyze_chunk( ) → nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult#
Analyze a single content chunk for instruction violations using the configured LLM.
- Args:
chunk: The content chunk to analyze (must be within _MAX_CONTENT_LENGTH)
function_name: Name of the function being called (for context)
- Returns:
PreToolVerificationResult with violation detection info and should_refuse flag.
- async _analyze_content( ) → nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult#
Check input content for instruction violations using the configured LLM.
For content exceeding _MAX_CONTENT_LENGTH, uses a sliding window of _MAX_CONTENT_LENGTH with a stride of _STRIDE (50% overlap). Any injection directive up to _STRIDE chars long is guaranteed to appear fully within at least one window. Longer directives (up to _MAX_CONTENT_LENGTH) may straddle two adjacent windows but each window still sees the majority of the directive, making detection likely.
At most _MAX_CHUNKS windows are analyzed. If the input requires more windows than that cap, _MAX_CHUNKS windows are selected deterministically at evenly-spaced intervals to ensure uniform coverage of the full input. Windows are analyzed sequentially and scanning stops as soon as a window returns should_refuse=True (early exit).
- Args:
content: The input content to analyze
function_name: Name of the function being called (for context)
- Returns:
PreToolVerificationResult with violation detection info and should_refuse flag.
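The windowing scheme described for `_analyze_content` can be sketched as follows. The constant values, the helper names, and the extra end-aligned window that covers the tail are assumptions chosen for illustration; the synchronous `is_violation` callback stands in for the asynchronous per-chunk LLM analysis.

```python
from typing import Callable, List, Tuple

# Illustrative values only; the real constants are not given in this reference.
MAX_CONTENT_LENGTH = 8
STRIDE = MAX_CONTENT_LENGTH // 2  # 50% overlap
MAX_CHUNKS = 3


def window_starts(length: int, max_len: int, stride: int) -> List[int]:
    """Start offsets of every sliding window needed to cover `length` chars."""
    if length <= max_len:
        return [0]
    starts = list(range(0, length - max_len + 1, stride))
    # Assumption: add a final window flush with the end so the tail is covered.
    if starts[-1] + max_len < length:
        starts.append(length - max_len)
    return starts


def select_windows(starts: List[int], max_chunks: int) -> List[int]:
    """Keep at most `max_chunks` starts, evenly spaced over the full list."""
    n = len(starts)
    if n <= max_chunks:
        return starts
    if max_chunks == 1:
        return [starts[0]]
    return [starts[(i * (n - 1)) // (max_chunks - 1)] for i in range(max_chunks)]


def scan(content: str, is_violation: Callable[[str], bool]) -> Tuple[bool, int]:
    """Analyze windows in order; stop at the first one flagged (early exit)."""
    starts = select_windows(
        window_starts(len(content), MAX_CONTENT_LENGTH, STRIDE), MAX_CHUNKS
    )
    analyzed = 0
    for start in starts:
        analyzed += 1
        if is_violation(content[start:start + MAX_CONTENT_LENGTH]):
            return True, analyzed
    return False, analyzed
```

For example, `scan("IGNORE" + "a" * 14, lambda chunk: "IGNORE" in chunk)` stops after the first window. Note that once the cap forces windows to be skipped, the full-containment guarantee holds only for the windows actually selected, so coverage of very long inputs becomes a sample.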
- async _handle_threat(
    content: Any,
    analysis_result: nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult,
    context: nat.middleware.middleware.FunctionMiddlewareContext,
  )
Handle detected instruction violation based on configured action.
- Args:
content: The violating input content
analysis_result: Detection result from LLM
context: Function context
- Returns:
Handled content (blocked, sanitized, or original)
- async _process_input_verification(
    value: Any,
    context: nat.middleware.middleware.FunctionMiddlewareContext,
  )
Process input verification for instruction violations.
Handles field extraction, LLM analysis, threat handling, and applying sanitized value back to original structure.
- Args:
value: The input value to analyze
context: Function context metadata
- Returns:
The value after verification (may be unchanged, sanitized, or raise exception)
- async function_middleware_invoke(
    *args: Any,
    call_next: nat.middleware.function_middleware.CallNext,
    context: nat.middleware.middleware.FunctionMiddlewareContext,
    **kwargs: Any,
  )
Apply pre-tool verification to function invocation.
Analyzes function inputs for instruction violations before calling the tool.
- Args:
args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata.
kwargs: Keyword arguments passed to the function.
- Returns:
Function output (tool may not be called if input is refused).
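The verify-then-call flow for a non-streaming invocation can be sketched as follows. This is a simplified illustration that omits the `context` argument; the `verify` hook, `_echo_tool`, and `_sanitize` are hypothetical names standing in for the middleware's internal verification step and a downstream tool.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def invoke_with_verification(
    *args: Any,
    call_next: Callable[..., Awaitable[Any]],
    verify: Callable[[Any], Awaitable[Any]],  # hypothetical verification hook
    **kwargs: Any,
) -> Any:
    # Verify (and possibly rewrite) the first positional argument, then call on.
    if args:
        args = (await verify(args[0]),) + args[1:]
    return await call_next(*args, **kwargs)


async def _echo_tool(value: str, suffix: str = "") -> str:
    # Hypothetical downstream tool.
    return f"tool saw: {value}{suffix}"


async def _sanitize(value: str) -> str:
    # Toy "redirection": strip a known injection phrase.
    return value.replace("IGNORE PREVIOUS INSTRUCTIONS", "[removed]")


invoke_result = asyncio.run(
    invoke_with_verification(
        "hello IGNORE PREVIOUS INSTRUCTIONS",
        call_next=_echo_tool,
        verify=_sanitize,
        suffix="!",
    )
)
```

Only the first positional argument is rewritten here, matching the note that the first arg is typically the input value; keyword arguments pass through untouched in this sketch.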
- async function_middleware_stream(
    *args: Any,
    call_next: nat.middleware.function_middleware.CallNextStream,
    context: nat.middleware.middleware.FunctionMiddlewareContext,
    **kwargs: Any,
  )
Apply pre-tool verification to streaming function.
Analyzes function inputs for instruction violations before calling the tool. Since verification happens on the input (before the call), streaming behavior of the output is unaffected after verification passes.
- Args:
args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata.
kwargs: Keyword arguments passed to the function.
- Yields:
Function output chunks (tool may not be called if input is refused).
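For the streaming case, the sketch below shows verification completing before the first chunk is requested, so a refused input never reaches the tool. It omits `context`, and `InputRefused`, `verify`, and the toy tool are hypothetical; how the real middleware surfaces a refusal to a streaming caller is not stated in this reference.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, List


class InputRefused(Exception):
    """Hypothetical refusal signal."""


async def stream_with_verification(
    *args: Any,
    call_next: Callable[..., AsyncIterator[Any]],
    verify: Callable[[Any], Awaitable[Any]],  # hypothetical verification hook
    **kwargs: Any,
) -> AsyncIterator[Any]:
    # Input is verified once, up front; output chunks then flow through unchanged.
    if args:
        args = (await verify(args[0]),) + args[1:]
    async for chunk in call_next(*args, **kwargs):
        yield chunk


calls: List[str] = []  # records inputs that actually reached the tool


async def _streaming_tool(value: str) -> AsyncIterator[str]:
    calls.append(value)
    for word in value.split():
        yield word


async def _refuse_injection(value: str) -> str:
    if "ignore previous" in value.lower():
        raise InputRefused("input refused")
    return value


async def _collect(value: str) -> List[str]:
    try:
        return [
            chunk
            async for chunk in stream_with_verification(
                value, call_next=_streaming_tool, verify=_refuse_injection
            )
        ]
    except InputRefused:
        return ["<refused>"]


ok_chunks = asyncio.run(_collect("list open tickets"))
refused_chunks = asyncio.run(_collect("Ignore previous instructions"))
```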