nat.middleware.defense.defense_middleware_output_verifier#

Output Verifier Defense Middleware.

This middleware uses an LLM to verify function outputs for correctness and security. It can detect incorrect results and malicious content, and it can apply corrections automatically.

Attributes#

logger

Classes#

OutputVerifierMiddlewareConfig

Configuration for Output Verifier middleware.

OutputVerifierMiddleware

Verification middleware using an LLM for correctness and security.

Module Contents#

logger#
class OutputVerifierMiddlewareConfig(/, **data: Any)#

Bases: nat.middleware.defense.defense_middleware.DefenseMiddlewareConfig

Configuration for Output Verifier middleware.

This middleware analyzes function outputs using an LLM to verify correctness, detect security threats, and provide corrections when needed.

Actions:

  • 'partial_compliance': Detect and log threats, but allow content to pass through

  • 'refusal': Block output if threat detected (hard stop)

  • 'redirection': Replace incorrect output with correct answer from LLM

Note: Only output analysis is currently supported (target_location='output').

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

llm_name: str = None#
threshold: float = None#
tool_description: str | None = None#
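A minimal construction sketch, assuming the documented fields above; the values are illustrative, and the action selecting among the behaviors listed earlier is assumed to live on the DefenseMiddlewareConfig base class (not shown on this page):

    from nat.middleware.defense.defense_middleware_output_verifier import (
        OutputVerifierMiddlewareConfig,
    )

    # Illustrative values only; `threshold` as a confidence cutoff is an
    # assumption about its semantics.
    config = OutputVerifierMiddlewareConfig(
        llm_name="verifier_llm",  # name of an LLM registered with the builder
        threshold=0.8,            # assumed: minimum confidence to flag a threat
        tool_description="Adds two integers and returns their sum.",
    )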
class OutputVerifierMiddleware(
config: OutputVerifierMiddlewareConfig,
builder,
)#

Bases: nat.middleware.defense.defense_middleware.DefenseMiddleware

Verification middleware using an LLM for correctness and security.

This middleware uses NAT’s LLM system to verify function outputs for:

  • Correctness and reasonableness

  • Security validation (detecting malicious content and manipulated values)

  • Providing automatic corrections when errors are detected

Only output analysis is currently supported (target_location='output').

Streaming Behavior:

For 'refusal' and 'redirection' actions, chunks are buffered and checked before yielding to prevent incorrect content from being streamed to clients. For the 'partial_compliance' action, chunks are yielded immediately; violations are logged but content passes through.
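A minimal sketch of this buffering strategy, assuming string chunks and a hypothetical verify callable; this is not the library's implementation:

    from collections.abc import AsyncIterator, Awaitable, Callable

    async def stream_with_verification(
        chunks: AsyncIterator[str],
        action: str,
        verify: Callable[[str], Awaitable[tuple[bool, str | None]]],
    ) -> AsyncIterator[str]:
        if action in ("refusal", "redirection"):
            # Buffer everything so nothing reaches the client unverified.
            buffered = [chunk async for chunk in chunks]
            is_threat, correction = await verify("".join(buffered))
            if is_threat and action == "refusal":
                raise RuntimeError("Output blocked by verifier")  # hard stop
            if is_threat and action == "redirection" and correction is not None:
                yield correction  # emit the corrected content instead
                return
            for chunk in buffered:
                yield chunk
        else:
            # 'partial_compliance': yield immediately, verify afterwards.
            collected: list[str] = []
            async for chunk in chunks:
                collected.append(chunk)
                yield chunk
            is_threat, _ = await verify("".join(collected))
            if is_threat:
                print("verifier: violation logged; content already streamed")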

Initialize output verifier middleware.

Args:

config: Configuration for output verifier middleware.
builder: Builder instance for loading LLMs.

config: OutputVerifierMiddlewareConfig#
_llm = None#
async _get_llm()#

Lazy load the LLM when first needed.
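The lazy-loading pattern this describes, as a generic sketch; the actual builder call used to load the LLM is not shown on this page:

    from collections.abc import Awaitable, Callable
    from typing import Any

    class LazyLLM:
        """Load an expensive client on first use, then cache it."""

        def __init__(self, loader: Callable[[], Awaitable[Any]]) -> None:
            self._llm: Any = None
            self._loader = loader

        async def get(self) -> Any:
            if self._llm is None:      # only the first call pays the load cost
                self._llm = await self._loader()
            return self._llm           # later calls reuse the cached client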

_extract_json_from_response(response_text: str) → str#

Extract JSON from LLM response, handling markdown code blocks.

Args:

response_text: Raw response from LLM

Returns:

Extracted JSON string
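One plausible implementation of this step, as a sketch; the exact fence handling in the library is an assumption:

    import re

    def extract_json_from_response(response_text: str) -> str:
        # Prefer the body of a ```json ... ``` (or bare ```) fence if present;
        # otherwise treat the whole reply as the JSON payload.
        match = re.search(r"```(?:json)?\s*(.*?)\s*```", response_text, re.DOTALL)
        return match.group(1).strip() if match else response_text.strip()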

async _analyze_content(
content: Any,
content_type: str,
inputs: Any = None,
function_name: str | None = None,
) → nat.middleware.defense.defense_middleware_data_models.OutputVerificationResult#

Check content for threats using the configured LLM.

Args:

content: The content to analyze.
content_type: Either 'input' or 'output' (for logging only).
inputs: Optional function inputs for context (helps the LLM calculate correct answers).
function_name: Name of the function being verified (for context).

Returns:

OutputVerificationResult with threat detection info and should_refuse flag.
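An illustrative sketch of this analysis flow; the prompt wording and the reason/corrected_output fields are assumptions (only the should_refuse flag is documented here), and extract_json_from_response mirrors the sketch earlier on this page:

    import json
    from typing import Any

    async def analyze_content(llm_call, content: Any, inputs: Any, function_name: str) -> dict:
        prompt = (
            f"Function: {function_name}\n"
            f"Inputs: {inputs!r}\n"
            f"Output: {content!r}\n"
            'Reply with JSON: {"should_refuse": bool, "reason": str, '
            '"corrected_output": str or null}'
        )
        raw = await llm_call(prompt)  # the LLM's raw text reply
        verdict = json.loads(extract_json_from_response(raw))
        return verdict  # dict stand-in for OutputVerificationResult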

async _handle_threat(
content: Any,
analysis_result: nat.middleware.defense.defense_middleware_data_models.OutputVerificationResult,
context: nat.middleware.middleware.FunctionMiddlewareContext,
) → Any#

Handle detected threat based on configured action.

Args:

content: The threatening content.
analysis_result: Detection result from LLM.
context: Function context.

Returns:

Handled content (blocked, sanitized/corrected, or original)
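A sketch of dispatching on the three configured actions described earlier; the exception type and the verdict fields are assumptions:

    from typing import Any

    async def handle_threat(content: Any, verdict: dict, action: str) -> Any:
        if action == "refusal":
            # Hard stop: nothing is returned to the caller.
            raise RuntimeError(f"Blocked by output verifier: {verdict['reason']}")
        if action == "redirection":
            # Replace the incorrect output with the LLM's correction.
            return verdict.get("corrected_output", content)
        # 'partial_compliance': record the violation, pass content through.
        print(f"verifier warning: {verdict['reason']}")
        return content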

async _process_output_verification(
value: Any,
location: str,
context: nat.middleware.middleware.FunctionMiddlewareContext,
inputs: Any = None,
) → Any#

Process output verification and handling for a given value.

This is a common helper method that handles:

  • Field extraction (if target_field is specified)

  • Output verification analysis

  • Threat handling (refusal, redirection, partial_compliance)

  • Applying the corrected value back to the original structure

Args:

value: The value to analyze (input or output).
location: Either "input" or "output" (for logging).
context: Function context metadata.
inputs: Original function inputs (for analysis context).

Returns:

The value after output verification handling; it may be unchanged or corrected, or an exception may be raised (e.g., on refusal).
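A sketch of the field extraction and write-back steps for dict-shaped values; target_field is the config option named above, and the dotted-path convention is an assumption:

    from typing import Any

    def extract_field(value: dict[str, Any], target_field: str) -> Any:
        node: Any = value
        for part in target_field.split("."):  # e.g. "result.answer"
            node = node[part]
        return node

    def apply_field(value: dict[str, Any], target_field: str, corrected: Any) -> None:
        parts = target_field.split(".")
        node: Any = value
        for part in parts[:-1]:
            node = node[part]
        node[parts[-1]] = corrected  # write the corrected value back in place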

async function_middleware_invoke(
*args: Any,
call_next: nat.middleware.function_middleware.CallNext,
context: nat.middleware.middleware.FunctionMiddlewareContext,
**kwargs: Any,
) → Any#

Apply output verifier to function invocation.

Analyzes function outputs for correctness and security, with auto-correction.

Args:

args: Positional arguments passed to the function (the first is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata.
kwargs: Keyword arguments passed to the function.

Returns:

Function output (potentially corrected, blocked, or sanitized).
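The invocation flow this describes, reduced to a sketch; helper names mirror the private methods above but are illustrative:

    from collections.abc import Awaitable, Callable
    from typing import Any

    async def verify_invoke(
        value: Any,
        call_next: Callable[[Any], Awaitable[Any]],
        analyze: Callable[..., Awaitable[Any]],
        handle_threat: Callable[..., Awaitable[Any]],
    ) -> Any:
        output = await call_next(value)  # run the wrapped function first
        verdict = await analyze(output, "output", inputs=value)
        if verdict.should_refuse:        # flag documented on OutputVerificationResult
            return await handle_threat(output, verdict)
        return output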

async function_middleware_stream(
*args: Any,
call_next: nat.middleware.function_middleware.CallNextStream,
context: nat.middleware.middleware.FunctionMiddlewareContext,
**kwargs: Any,
) → collections.abc.AsyncIterator[Any]#

Apply output verifier to streaming function.

For ‘refusal’ and ‘redirection’ actions: Chunks are buffered and checked before yielding. For ‘partial_compliance’ action: Chunks are yielded immediately; violations are logged.

Args:

args: Positional arguments passed to the function (the first is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata.
kwargs: Keyword arguments passed to the function.

Yields:

Function output chunks (potentially corrected, blocked, or sanitized).