nat.middleware.defense.defense_middleware_content_guard#
Content Safety Guard Middleware.
This middleware uses guard models to classify content as safe or harmful with simple Safe/Unsafe classifications.
Attributes#
- logger
Classes#
- ContentSafetyGuardMiddlewareConfig: Configuration for Content Safety Guard middleware.
- ContentSafetyGuardMiddleware: Safety guard middleware using guard models to classify content as safe or unsafe.
Module Contents#
- logger#
- class ContentSafetyGuardMiddlewareConfig(/, **data: Any)#
Bases:
nat.middleware.defense.defense_middleware.DefenseMiddlewareConfig
Configuration for Content Safety Guard middleware.
This middleware uses guard models to classify content as safe or harmful.
Actions: partial_compliance (log warning but allow), refusal (block content), or redirection (replace with polite refusal message).
Note: Only output analysis is currently supported (target_location='output').
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
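For illustration, a minimal configuration for this middleware might look like the sketch below. Only target_location and the three action values are documented above; the llm_name field and the exact field names are assumptions, and the authoritative schema is the pydantic model itself.

```python
# Minimal configuration sketch. Field names other than the documented
# behavior (target_location='output', the three actions) are assumptions.
content_safety_guard_config = {
    "llm_name": "guard_llm",      # assumed: which guard model LLM the builder should load
    "action": "refusal",          # documented actions: partial_compliance | refusal | redirection
    "target_location": "output",  # documented: only output analysis is currently supported
}
```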
- class ContentSafetyGuardMiddleware(
- config: ContentSafetyGuardMiddlewareConfig,
- builder,
Bases:
nat.middleware.defense.defense_middleware.DefenseMiddleware
Safety guard middleware using guard models to classify content as safe or unsafe.
This middleware analyzes content using guard models (e.g., NVIDIA Nemoguard, Qwen Guard) that return "Safe" or "Unsafe" classifications. The middleware extracts safety categories when unsafe content is detected.
Only output analysis is currently supported (target_location='output').
- Streaming Behavior:
For ‘refusal’ and ‘redirection’ actions, chunks are buffered and checked before yielding to prevent unsafe content from being streamed to clients. For ‘partial_compliance’ action, chunks are yielded immediately; violations are logged but content passes through.
Initialize content safety guard middleware.
- Args:
config: Configuration for content safety guard middleware.
builder: Builder instance for loading LLMs.
- _llm = None#
- async _get_llm()#
Lazy load the guard model LLM when first needed.
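The lazy-loading pattern can be sketched as follows; the builder.get_llm call is an assumption standing in for however the builder actually loads the configured guard model.

```python
# Sketch of lazy LLM loading. `builder.get_llm(name)` is a hypothetical
# coroutine used for illustration; the real builder API may differ.
class LazyGuardLLM:
    def __init__(self, builder, llm_name: str):
        self._builder = builder
        self._llm_name = llm_name
        self._llm = None  # nothing loaded until the first safety check

    async def get(self):
        if self._llm is None:
            self._llm = await self._builder.get_llm(self._llm_name)
        return self._llm
```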
- _extract_unsafe_categories(response_text, is_safe) → list[str]#
Extract safety categories only if content is unsafe.
Supports both JSON formats (Safety Categories field) and text formats (Categories: line).
- Args:
response_text: Raw response from guard model.
is_safe: Whether the content was detected as safe.
- Returns:
List of category strings if unsafe, empty list otherwise or on parsing error.
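A sketch of this extraction, handling both the JSON "Safety Categories" field and the text "Categories:" line, might look like the following; it is an illustration, not the actual implementation.

```python
import json
import re


def extract_unsafe_categories(response_text: str, is_safe: bool) -> list[str]:
    """Sketch: pull safety categories out of a guard model reply, only when unsafe."""
    if is_safe:
        return []
    try:
        # JSON format: look for a "Safety Categories" field.
        data = json.loads(response_text)
        categories = data.get("Safety Categories", "")
        if isinstance(categories, str):
            return [c.strip() for c in categories.split(",") if c.strip()]
        return [str(c).strip() for c in categories]
    except (json.JSONDecodeError, AttributeError):
        pass
    # Text format: look for a "Categories: ..." line.
    match = re.search(r"Categories:\s*(.+)", response_text, flags=re.IGNORECASE)
    if match:
        return [c.strip() for c in match.group(1).split(",") if c.strip()]
    # Parsing error or no categories found: return an empty list.
    return []
```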
- _parse_guard_response(
- response_text: str,
Parse guard model response.
Searches for Safe or Unsafe keywords anywhere in the response (case-insensitive). Works with any guard model format (JSON, structured text, or plain text). Also extracts safety categories from both JSON and text formats. If neither keyword is found, falls back to implicit refusal detection. Prioritizes Unsafe if both keywords are present.
- Args:
response_text: Raw response from guard model.
- Returns:
GuardResponseResult with is_safe boolean, categories list, and raw response.
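A simplified sketch of this keyword search is shown below. GuardResponseResult is modeled here as a plain dataclass for illustration (the real type lives in the defense middleware data models), the fallback for replies with neither keyword is reduced to treating them as unsafe rather than full implicit refusal detection, and it reuses extract_unsafe_categories from the sketch above.

```python
from dataclasses import dataclass, field


@dataclass
class GuardResponseResult:
    """Illustrative stand-in for the real result type."""
    is_safe: bool
    categories: list[str] = field(default_factory=list)
    raw_response: str = ""


def parse_guard_response(response_text: str) -> GuardResponseResult:
    """Sketch: look for Safe/Unsafe keywords anywhere in the reply, case-insensitively."""
    text = response_text.lower()
    has_unsafe = "unsafe" in text
    # "unsafe" contains "safe", so strip it before checking for a standalone "safe".
    has_safe = "safe" in text.replace("unsafe", "")
    if has_unsafe:
        is_safe = False   # Unsafe wins when both keywords are present
    elif has_safe:
        is_safe = True
    else:
        is_safe = False   # neither keyword: simplified fallback (treat as unsafe)
    categories = extract_unsafe_categories(response_text, is_safe)  # see the sketch above
    return GuardResponseResult(is_safe=is_safe, categories=categories, raw_response=response_text)
```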
- _should_refuse(parsed_result) → bool#
Determine if content should be refused.
- Args:
parsed_result: Result from _parse_guard_response.
- Returns:
True if content should be refused.
- async _analyze_content(
- content: Any,
- original_input: Any = None,
- context: nat.middleware.middleware.FunctionMiddlewareContext | None = None,
Check content safety using guard model.
- Args:
content: The content to analyze.
original_input: The original input to the function (for context).
context: Function metadata.
- Returns:
Safety classification result with should_refuse flag
- async _handle_threat(
- content: Any,
- analysis_result: nat.middleware.defense.defense_middleware_data_models.ContentAnalysisResult,
- context: nat.middleware.middleware.FunctionMiddlewareContext,
Handle unsafe content based on configured action.
- Args:
content: The unsafe content.
analysis_result: Safety classification result.
context: Function context.
- Returns:
Handled content (blocked, sanitized, or original)
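A rough sketch of dispatching the three configured actions is shown below; the warning text, refusal exception, and redirection message are assumptions made for illustration only.

```python
import logging

logger = logging.getLogger(__name__)

REDIRECTION_MESSAGE = "I'm sorry, but I can't help with that request."  # assumed wording


def handle_threat_sketch(content, categories, action: str):
    """Sketch: apply the configured action to content classified as unsafe."""
    if action == "partial_compliance":
        # Log a warning but allow the original content through.
        logger.warning("Unsafe content detected (categories=%s); allowing through.", categories)
        return content
    if action == "redirection":
        # Replace the unsafe content with a polite refusal message.
        return REDIRECTION_MESSAGE
    # 'refusal': block the content outright (raising is one way to do this).
    raise ValueError(f"Content blocked by safety guard (categories={categories})")
```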
- async _process_content_safety_detection(
- value: Any,
- location: str,
- context: nat.middleware.middleware.FunctionMiddlewareContext,
- original_input: Any = None,
Process content safety detection and handling for a given value.
Handles field extraction, content safety analysis, threat handling, and applying sanitized value back to original structure.
- Args:
value: The value to analyze (input or output).
location: Either input or output (for logging).
context: Function context metadata.
original_input: Original function input (for output analysis context).
- Returns:
The value after content safety handling (may be unchanged, sanitized, or raise).
- async function_middleware_invoke(
- *args: Any,
- call_next: nat.middleware.function_middleware.CallNext,
- context: nat.middleware.middleware.FunctionMiddlewareContext,
- **kwargs: Any,
Apply content safety guard check to function invocation.
This is the core logic for content safety guard defense - each defense implements its own invoke/stream based on its specific strategy.
- Args:
args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata (provides context state).
kwargs: Keyword arguments passed to the function.
- Returns:
Function output (potentially blocked or sanitized).
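For the non-streaming path, the flow described above can be sketched as below; analyze_content and handle_threat are injected placeholders standing in for the private methods documented on this page, not the actual API.

```python
async def invoke_with_content_guard(*args, call_next, analyze_content, handle_threat, context, **kwargs):
    """Sketch: run the wrapped function, then guard-check its output before returning it."""
    output = await call_next(*args, **kwargs)    # call the wrapped function / next middleware
    original_input = args[0] if args else None   # first positional arg is typically the input
    analysis = await analyze_content(output, original_input, context)
    if analysis.should_refuse:
        # Apply the configured action (partial_compliance, refusal, or redirection).
        return await handle_threat(output, analysis, context)
    return output
```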
- async function_middleware_stream(
- *args: Any,
- call_next: nat.middleware.function_middleware.CallNextStream,
- context: nat.middleware.middleware.FunctionMiddlewareContext,
- **kwargs: Any,
Apply content safety guard check to streaming function.
For ‘refusal’ and ‘redirection’ actions: Chunks are buffered and checked before yielding. For ‘partial_compliance’ action: Chunks are yielded immediately; violations are logged.
- Args:
args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Next middleware/function to call.
context: Function metadata.
kwargs: Keyword arguments passed to the function.
- Yields:
Function output chunks (potentially blocked or sanitized).
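The buffering behavior described above can be sketched roughly as follows; is_unsafe stands in for the guard-model check, and the refusal exception and redirection message are illustrative assumptions.

```python
import logging

logger = logging.getLogger(__name__)

REFUSAL_MESSAGE = "I'm sorry, but I can't help with that request."  # assumed wording


async def stream_with_content_guard(chunks, *, action: str, is_unsafe):
    """Sketch: buffer-and-check for 'refusal'/'redirection'; pass-through for 'partial_compliance'.

    `chunks` is an async iterator of text chunks and `is_unsafe` is an async
    callable that runs the guard model over the combined text (both assumed).
    """
    if action == "partial_compliance":
        collected = []
        async for chunk in chunks:
            collected.append(chunk)
            yield chunk                                # yield immediately; content passes through
        if await is_unsafe("".join(collected)):
            logger.warning("Unsafe streamed content detected; logged only, nothing is blocked.")
        return

    # 'refusal' and 'redirection': buffer everything and check before yielding anything.
    buffered = [chunk async for chunk in chunks]
    if await is_unsafe("".join(buffered)):
        if action == "refusal":
            raise ValueError("Content blocked by safety guard")  # block the stream
        yield REFUSAL_MESSAGE                                    # 'redirection': polite refusal instead
        return
    for chunk in buffered:
        yield chunk
```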