nat.middleware.defense.defense_middleware_pre_tool_verifier#

Pre-Tool Verifier Defense Middleware.

This middleware uses an LLM to verify function inputs for instruction violations before a tool is called. It detects prompt injection, jailbreak attempts, and other malicious instructions that could manipulate tool behavior.

Attributes#

Classes#

PreToolVerifierMiddlewareConfig

Configuration for Pre-Tool Verifier middleware.

PreToolVerifierMiddleware

Pre-Tool Verifier middleware using an LLM to detect instruction violations.

Module Contents#

logger#
class PreToolVerifierMiddlewareConfig#

Bases: nat.middleware.defense.defense_middleware.DefenseMiddlewareConfig

Configuration for Pre-Tool Verifier middleware.

This middleware analyzes function inputs using an LLM to detect instruction violations before a tool is called. It catches prompt injection, jailbreak attempts, and other malicious instructions.

Actions: - ‘partial_compliance’: Detect and log violations, but allow input to pass through - ‘refusal’: Block input if violation detected (hard stop, tool is not called) - ‘redirection’: Replace violating input with sanitized version from LLM

Note: Only input analysis is supported (target_location=’input’).

llm_name: str = None#
target_location: Literal['input'] = None#
threshold: float = None#
system_instructions: str | None = None#
fail_closed: bool = None#
max_content_length: int = None#
max_chunks: int = None#
class PreToolVerifierMiddleware(
config: PreToolVerifierMiddlewareConfig,
builder,
)#

Bases: nat.middleware.defense.defense_middleware.DefenseMiddleware

Pre-Tool Verifier middleware using an LLM to detect instruction violations.

This middleware analyzes function inputs before the tool is called to detect:

  • Prompt injection attempts

  • Jailbreak attempts

  • Instruction override attempts

  • Malicious instructions embedded in user input

  • Social engineering attempts to manipulate tool behavior

Only input analysis is supported (target_location='input').

Streaming Behavior:

For ‘refusal’ action, the tool call is blocked entirely. For ‘redirection’ action, the input is sanitized before passing to the tool. For ‘partial_compliance’ action, violations are logged but the original input passes through.

Initialize pre-tool verifier middleware.

Args:

config: Configuration for pre-tool verifier middleware builder: Builder instance for loading LLMs

config: PreToolVerifierMiddlewareConfig#
_llm = None#
async _get_llm() Any#

Lazy load the LLM when first needed.

_extract_json_from_response(response_text: str) str#

Extract JSON from LLM response, handling markdown code blocks.

Args:

response_text: Raw response from LLM

Returns:

Extracted JSON string

async _analyze_chunk(
chunk: str,
function_name: str | None = None,
) nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult#

Analyze a single content chunk for instruction violations using the configured LLM.

Args:

chunk: The content chunk to analyze (must be within _MAX_CONTENT_LENGTH) function_name: Name of the function being called (for context)

Returns:

PreToolVerificationResult with violation detection info and should_refuse flag.

async _analyze_content(
content: Any,
function_name: str | None = None,
) nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult#

Check input content for instruction violations using the configured LLM.

For content exceeding _MAX_CONTENT_LENGTH, uses a sliding window of _MAX_CONTENT_LENGTH with a stride of _STRIDE (50% overlap). Any injection directive up to _STRIDE chars long is guaranteed to appear fully within at least one window. Longer directives (up to _MAX_CONTENT_LENGTH) may straddle two adjacent windows but each window still sees the majority of the directive, making detection likely.

At most _MAX_CHUNKS windows are analyzed. If the input requires more windows than that cap, _MAX_CHUNKS windows are selected deterministically at evenly-spaced intervals to ensure uniform coverage of the full input. Windows are analyzed sequentially and scanning stops as soon as a window returns should_refuse=True (early exit).

Args:

content: The input content to analyze function_name: Name of the function being called (for context)

Returns:

PreToolVerificationResult with violation detection info and should_refuse flag.

async _handle_threat(
content: Any,
analysis_result: nat.middleware.defense.defense_middleware_data_models.PreToolVerificationResult,
context: nat.middleware.middleware.FunctionMiddlewareContext,
) Any#

Handle detected instruction violation based on configured action.

Args:

content: The violating input content analysis_result: Detection result from LLM context: Function context

Returns:

Handled content (blocked, sanitized, or original)

async _process_input_verification(
value: Any,
context: nat.middleware.middleware.FunctionMiddlewareContext,
) Any#

Process input verification for instruction violations.

Handles field extraction, LLM analysis, threat handling, and applying sanitized value back to original structure.

Args:

value: The input value to analyze context: Function context metadata

Returns:

The value after verification (may be unchanged, sanitized, or raise exception)

async function_middleware_invoke(
*args: Any,
call_next: nat.middleware.function_middleware.CallNext,
context: nat.middleware.middleware.FunctionMiddlewareContext,
\*\*kwargs: Any,
) Any#

Apply pre-tool verification to function invocation.

Analyzes function inputs for instruction violations before calling the tool.

Args:

args: Positional arguments passed to the function (first arg is typically the input value). call_next: Next middleware/function to call. context: Function metadata. kwargs: Keyword arguments passed to the function.

Returns:

Function output (tool may not be called if input is refused).

async function_middleware_stream(
*args: Any,
call_next: nat.middleware.function_middleware.CallNextStream,
context: nat.middleware.middleware.FunctionMiddlewareContext,
\*\*kwargs: Any,
) collections.abc.AsyncIterator[Any]#

Apply pre-tool verification to streaming function.

Analyzes function inputs for instruction violations before calling the tool. Since verification happens on the input (before the call), streaming behavior of the output is unaffected after verification passes.

Args:

args: Positional arguments passed to the function (first arg is typically the input value). call_next: Next middleware/function to call. context: Function metadata. kwargs: Keyword arguments passed to the function.

Yields:

Function output chunks (tool may not be called if input is refused).