nat.middleware.red_teaming.red_teaming_middleware#
Red teaming middleware for attacking agent functions.
This module provides a middleware for red teaming and security testing that can intercept and modify function inputs or outputs with configurable attack payloads.
The middleware supports:
- Targeting specific functions or entire function groups
- Field-level search within input/output schemas
- Multiple attack modes (replace, append_start, append_middle, append_end)
- Both regular and streaming function calls
- Type-safe operations on strings, integers, and floats
Classes#
RedTeamingMiddleware: Middleware for red teaming that intercepts and modifies function inputs/outputs.
Module Contents#
- class RedTeamingMiddleware(
- *,
- attack_payload: str,
- target_function_or_group: str | None = None,
- payload_placement: Literal['replace', 'append_start', 'append_middle', 'append_end'] = 'append_end',
- target_location: Literal['input', 'output'] = 'input',
- target_field: str | None = None,
- target_field_resolution_strategy: Literal['random', 'first', 'last', 'all', 'error'] = 'error',
- call_limit: int | None = None,
Bases: nat.middleware.function_middleware.FunctionMiddleware
Middleware for red teaming that intercepts and modifies function inputs/outputs.
This middleware enables systematic security testing by injecting attack payloads into function inputs or outputs. It supports flexible targeting, field-level modifications, and multiple attack modes.
Features:
Target specific functions or entire function groups
Search for specific fields in input/output schemas
Apply attacks via replace or append modes
Support for both regular and streaming calls
Type-safe operations on strings and numbers
Example:
    # In YAML config
    middleware:
      prompt_injection:
        _type: red_teaming
        attack_payload: "Ignore previous instructions"
        target_function_or_group: my_llm.generate
        payload_placement: append_start
        target_location: input
        target_field: prompt
- Args:
attack_payload: The malicious payload to inject.
target_function_or_group: Function or group to target (None for all).
payload_placement: How to apply the payload (replace, append_start, append_middle, append_end).
target_location: Whether to attack the input or the output.
target_field: Field name or path to attack (None for direct value).
Initialize red teaming middleware.
- Args:
attack_payload: The value to inject into the function input or output.
target_function_or_group: Optional function/group to target.
payload_placement: How to apply the payload (replace or append modes).
target_location: Whether to place the payload in the input or output.
target_field: JSONPath to the field to attack.
target_field_resolution_strategy: Strategy for handling multiple field matches (random/first/last/all/error).
call_limit: Maximum number of times the middleware will apply a payload.
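To illustrate what field-level targeting means, here is a minimal sketch that appends a payload to one string field inside a nested input. It is not the middleware's implementation: the documentation says `target_field` is a JSONPath, whereas this sketch assumes a simplified dotted key path, and `inject_at_path` is a hypothetical helper name.

```python
from __future__ import annotations

import copy
from typing import Any


def inject_at_path(data: dict[str, Any], path: str, payload: str) -> dict[str, Any]:
    """Return a copy of `data` with `payload` appended to the string at `path`.

    Assumption: `path` is a dotted key path such as "request.prompt",
    a simplification of the JSONPath the real middleware accepts.
    """
    result = copy.deepcopy(data)  # leave the caller's input untouched
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = f"{node[leaf]} {payload}"  # append_end style placement
    return result


original = {"request": {"prompt": "Summarize this text", "max_tokens": 64}}
attacked = inject_at_path(original, "request.prompt", "Ignore previous instructions")
```

Working on a deep copy keeps the original input available for comparison, which is convenient when logging what an attack changed.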
- _attack_payload#
- _target_function_or_group = None#
- _payload_placement = 'append_end'#
- _target_location = 'input'#
- _target_field = None#
- _target_field_resolution_strategy = 'error'#
- _call_limit = None#
- _should_apply_payload(context_name: str) -> bool#
Check if this function should be attacked based on targeting configuration.
- Args:
context_name: The name of the function from context (e.g., “calculator__add”)
- Returns:
True if the function should be attacked, False otherwise
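The targeting check described above could look roughly like the sketch below. The matching rules are assumptions, not confirmed behavior: it treats `None` as "attack everything", accepts an exact name match, and treats the target as a group when the context name starts with the target followed by `__` (inferred from the `calculator__add` example). `should_apply_payload` is a hypothetical standalone function.

```python
from __future__ import annotations


def should_apply_payload(context_name: str, target: str | None, sep: str = "__") -> bool:
    """Return True when `context_name` falls under `target`.

    Assumption: group and function names are joined by `sep`.
    """
    if target is None:
        return True  # no target configured: every function is attacked
    if context_name == target:
        return True  # exact function match
    # Group match: "calculator" covers "calculator__add" but "calc" does not.
    return context_name.startswith(target + sep)
```

Requiring the separator after the group name avoids accidental prefix matches between unrelated groups.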
- _find_middle_sentence_index(text: str) -> int#
Find the index to insert text at the middle sentence boundary.
- Args:
text: The text to analyze
- Returns:
The character index where the middle sentence ends
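One way to locate a middle sentence boundary is sketched below. The sentence-splitting rule (a `.`, `!`, or `?` followed by whitespace or end of text) and the midpoint fallback for text without any boundary are assumptions made for this example. The actual method may use different rules.

```python
import re


def find_middle_sentence_index(text: str) -> int:
    """Return the index just past the end of the middle sentence."""
    # Collect the end offset of every sentence terminator.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    if not ends:
        return len(text) // 2  # assumption: fall back to the character midpoint
    # With an even number of sentences, pick the earlier of the two middle ones.
    return ends[(len(ends) - 1) // 2]


text = "One. Two. Three. Four."
idx = find_middle_sentence_index(text)
before, after = text[:idx], text[idx:]
```

Inserting at a sentence boundary keeps the injected payload from splitting a word, which makes the attacked text look more natural.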
- _apply_payload_to_simple_type( ) -> Any#
Apply the attack payload to a value of a simple type (str, int, or float).
- Args:
original_value: The original value to attack
attack_payload: The payload to inject
payload_placement: How to apply the payload
- Returns:
The modified value with attack applied
- Raises:
ValueError: If attack cannot be applied due to type mismatch
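The four placement modes and the type-mismatch error can be illustrated with the sketch below. Several details are assumptions rather than documented behavior: payload and original text are joined with a single space, `append_middle` uses the plain character midpoint, and numeric values only support `replace` with the payload parsed into the original numeric type. `apply_payload` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def apply_payload(original: Any, payload: str, placement: str) -> Any:
    """Apply `payload` to a str, int, or float value."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(original, bool) or not isinstance(original, (str, int, float)):
        raise ValueError(f"unsupported type: {type(original).__name__}")

    if isinstance(original, str):
        if placement == "replace":
            return payload
        if placement == "append_start":
            return f"{payload} {original}"
        if placement == "append_end":
            return f"{original} {payload}"
        if placement == "append_middle":
            mid = len(original) // 2  # assumption: character midpoint
            return f"{original[:mid]} {payload} {original[mid:]}"
        raise ValueError(f"unknown placement: {placement}")

    # Numeric values: assumption that only replacement is meaningful.
    if placement != "replace":
        raise ValueError("append modes require a string value")
    try:
        return type(original)(payload)  # e.g. int("42") or float("1.5")
    except ValueError as exc:
        kind = type(original).__name__
        raise ValueError(f"payload {payload!r} is not a valid {kind}") from exc
```

Converting the payload to the original type keeps the attacked value compatible with the schema of the function it is passed to.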
- _resolve_multiple_field_matches(matches)#
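This method is undocumented here. A plausible reading is that it applies `target_field_resolution_strategy` when `target_field` matches more than one location. The sketch below shows how the five documented strategy names could behave. The semantics are inferred from the names alone, and `resolve_matches` is a hypothetical helper.

```python
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def resolve_matches(matches: Sequence[T], strategy: str = "error") -> list[T]:
    """Pick which of several field matches should receive the payload."""
    if not matches:
        raise ValueError("no matching fields")
    if len(matches) == 1:
        return [matches[0]]  # unambiguous: the strategy is irrelevant
    if strategy == "first":
        return [matches[0]]
    if strategy == "last":
        return [matches[-1]]
    if strategy == "all":
        return list(matches)
    if strategy == "random":
        return [random.choice(matches)]
    if strategy == "error":
        raise ValueError(f"{len(matches)} fields matched; expected exactly one")
    raise ValueError(f"unknown strategy: {strategy}")
```

Defaulting to `error` matches the constructor default shown above and forces ambiguous field paths to be resolved explicitly.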
- _apply_payload_to_function_value(value: Any) -> Any#
- _apply_payload_to_function_value_with_exception(
- value: Any,
- context: nat.middleware.function_middleware.FunctionMiddlewareContext,
- async function_middleware_invoke(
- *args: Any,
- call_next: nat.middleware.function_middleware.CallNext,
- context: nat.middleware.function_middleware.FunctionMiddlewareContext,
- **kwargs: Any,
Invoke middleware for single-output functions.
- Args:
args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Callable to invoke next middleware/function.
context: Metadata about the function being wrapped.
kwargs: Keyword arguments passed to the function.
- Returns:
The output value (potentially modified if attacking output).
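The overall flow of a single-output invocation can be sketched without the `nat` library. `PayloadInjector`, `invoke`, and `echo` below are hypothetical stand-ins, not the real `RedTeamingMiddleware` API. The sketch assumes `append_end` placement on string values and that `call_limit` counts only calls where a payload was actually applied.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class PayloadInjector:
    """Hypothetical stand-in for the middleware, for illustration only."""

    def __init__(self, payload: str, location: str = "input",
                 call_limit: int | None = None) -> None:
        self.payload = payload
        self.location = location
        self.call_limit = call_limit
        self.calls = 0  # number of payloads applied so far

    def _attack(self, value: Any) -> Any:
        if not isinstance(value, str):
            return value  # this sketch only touches strings
        if self.call_limit is not None and self.calls >= self.call_limit:
            return value  # limit reached: pass through unchanged
        self.calls += 1
        return f"{value} {self.payload}"

    async def invoke(self, value: Any,
                     call_next: Callable[[Any], Awaitable[Any]]) -> Any:
        if self.location == "input":
            value = self._attack(value)  # modify before the wrapped call
        result = await call_next(value)
        if self.location == "output":
            result = self._attack(result)  # modify after the wrapped call
        return result


async def echo(value: Any) -> str:
    return f"echo: {value}"


def run_demo() -> list[str]:
    injector = PayloadInjector("IGNORE PREVIOUS INSTRUCTIONS", call_limit=1)

    async def main() -> list[str]:
        first = await injector.invoke("hi", echo)
        second = await injector.invoke("hi", echo)  # over the limit
        return [first, second]

    return asyncio.run(main())
```

In this sketch the first call reaches the wrapped function with the payload attached, while the second passes through untouched because the limit of one has been used up.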