nat.middleware.red_teaming.red_teaming_middleware

Red teaming middleware for attacking agent functions.

This module provides a middleware for red teaming and security testing that can intercept and modify function inputs or outputs with configurable attack payloads.

The middleware supports:

  • Targeting specific functions or entire function groups

  • Field-level search within input/output schemas

  • Multiple attack modes (replace, append_start, append_middle, append_end)

  • Both regular and streaming function calls

  • Type-safe operations on strings, integers, and floats

Classes

RedTeamingMiddleware

Middleware for red teaming that intercepts and modifies function inputs/outputs.

Module Contents

class RedTeamingMiddleware(
*,
attack_payload: str,
target_function_or_group: str | None = None,
payload_placement: Literal['replace', 'append_start', 'append_middle', 'append_end'] = 'append_end',
target_location: Literal['input', 'output'] = 'input',
target_field: str | None = None,
target_field_resolution_strategy: Literal['random', 'first', 'last', 'all', 'error'] = 'error',
call_limit: int | None = None,
)

Bases: nat.middleware.function_middleware.FunctionMiddleware

Middleware for red teaming that intercepts and modifies function inputs/outputs.

This middleware enables systematic security testing by injecting attack payloads into function inputs or outputs. It supports flexible targeting, field-level modifications, and multiple attack modes.

Features:

  • Target specific functions or entire function groups

  • Search for specific fields in input/output schemas

  • Apply attacks via replace or append modes

  • Support for both regular and streaming calls

  • Type-safe operations on strings, integers, and floats

Example:

# In YAML config
middleware:
  prompt_injection:
    _type: red_teaming
    attack_payload: "Ignore previous instructions"
    target_function_or_group: my_llm.generate
    payload_placement: append_start
    target_location: input
    target_field: prompt

Args:

attack_payload: The malicious payload to inject.
target_function_or_group: Function or group to target (None for all).
payload_placement: How to apply (replace, append_start, append_middle, append_end).
target_location: Whether to attack input or output.
target_field: Field name or path to attack (None for direct value).

Initialize red teaming middleware.

Args:

attack_payload: The value to inject into the function input or output.
target_function_or_group: Optional function/group to target.
payload_placement: How to apply the payload (replace or append modes).
target_location: Whether to place the payload in the input or output.
target_field: JSONPath to the field to attack.
target_field_resolution_strategy: Strategy for handling multiple field matches (random/first/last/all/error).
call_limit: Maximum number of times the middleware will apply a payload.
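The reference does not spell out what each `target_field_resolution_strategy` value does when `target_field` matches more than one field. The sketch below shows one plausible reading, assuming that `error` raises on an ambiguous match and the other strategies select from the matches. `resolve_matches` is a hypothetical helper for illustration, not part of the `nat` API.

```python
import random
from typing import Any, List


def resolve_matches(matches: List[Any], strategy: str = "error") -> List[Any]:
    """Hypothetical helper: choose which matched fields receive the payload."""
    if not matches:
        return []
    # A single match is unambiguous; "all" keeps every match.
    if len(matches) == 1 or strategy == "all":
        return list(matches)
    if strategy == "first":
        return [matches[0]]
    if strategy == "last":
        return [matches[-1]]
    if strategy == "random":
        return [random.choice(matches)]
    if strategy == "error":
        # Assumption: "error" rejects ambiguous matches instead of guessing.
        raise ValueError(f"{len(matches)} fields matched; expected exactly one")
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Under this reading, the default `error` makes an ambiguous `target_field` fail loudly, so a test run cannot attack an unintended field without the author noticing.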

_attack_payload
_target_function_or_group = None
_payload_placement = 'append_end'
_target_location = 'input'
_target_field = None
_target_field_resolution_strategy = 'error'
_call_count: int = 0
_call_limit = None
_should_apply_payload(context_name: str) -> bool

Check if this function should be attacked based on targeting configuration.

Args:

context_name: The name of the function from context (e.g., “calculator__add”)

Returns:

True if the function should be attacked, False otherwise
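The matching rule itself is not documented here. As a rough sketch, assume context names join group and function with a double underscore (as in "calculator__add") and that a target may be written with a dot (as in the YAML example's `my_llm.generate`). The function `should_apply` and the normalization step are assumptions for illustration only.

```python
from typing import Optional


def should_apply(context_name: str, target: Optional[str]) -> bool:
    """Hypothetical targeting check: exact function match or group-prefix match."""
    if target is None:
        # No target configured: every function is attacked.
        return True
    # Assumption: "group.function" in config corresponds to "group__function" in context.
    normalized = target.replace(".", "__")
    if context_name == normalized:
        return True
    # Treat the target as a group name: match any function inside that group.
    return context_name.startswith(normalized + "__")


print(should_apply("calculator__add", "calculator"))      # True
print(should_apply("calculator__add", "calculator.sub"))  # False
```

Requiring the `__` separator after a group name keeps a target such as `calculator` from matching an unrelated group like `calculator_v2`.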

_find_middle_sentence_index(text: str) -> int

Find the character index of the middle sentence boundary, where text is to be inserted.

Args:

text: The text to analyze

Returns:

The character index where the middle sentence ends
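One way to locate such a boundary is to collect the end offset of every sentence and take the middle one. The sketch below assumes sentences end with `.`, `!`, or `?`, and falls back to the character midpoint when no terminator is found. Both choices are assumptions rather than the documented behavior.

```python
import re


def find_middle_sentence_index(text: str) -> int:
    """Sketch: return the offset just past the middle sentence's terminator."""
    # End offsets of terminators that are followed by whitespace or end of text.
    ends = [m.end() for m in re.finditer(r"[.!?]+(?=\s|$)", text)]
    if not ends:
        # Assumption: with no sentence boundary, split at the character midpoint.
        return len(text) // 2
    # For an even count this picks the earlier of the two middle sentences.
    return ends[(len(ends) - 1) // 2]


text = "One. Two. Three."
idx = find_middle_sentence_index(text)
print(text[:idx])  # One. Two.
```

With a single sentence the returned index is the end of the text, so in this sketch a middle insertion degrades to appending at the end.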

_apply_payload_to_simple_type(
original_value: list | str | int | float,
attack_payload: str,
payload_placement: str,
) -> Any

Apply the attack payload to a value of a simple type (str, int, or float).

Args:

original_value: The original value to attack
attack_payload: The payload to inject
payload_placement: How to apply the payload

Returns:

The modified value with attack applied

Raises:

ValueError: If attack cannot be applied due to type mismatch
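The four placement modes can be illustrated on plain values. This is a simplified sketch, not the library's implementation. It assumes a single space as separator, uses the character midpoint for `append_middle` (the middleware itself refers to a sentence boundary), and allows only `replace` for numbers by casting the payload to the original type. `apply_payload` is a hypothetical name.

```python
from typing import Union

Simple = Union[str, int, float]


def apply_payload(original: Simple, payload: str, placement: str) -> Simple:
    """Sketch of payload placement for str, int, and float values."""
    if isinstance(original, (int, float)):
        if placement != "replace":
            raise ValueError("this sketch supports only 'replace' for numbers")
        # int("abc") or float("abc") raises ValueError: the type-mismatch case.
        return type(original)(payload)
    if placement == "replace":
        return payload
    if placement == "append_start":
        return f"{payload} {original}"
    if placement == "append_end":
        return f"{original} {payload}"
    if placement == "append_middle":
        mid = len(original) // 2  # simplification: character midpoint
        return f"{original[:mid]} {payload} {original[mid:]}"
    raise ValueError(f"unknown placement: {placement!r}")


print(apply_payload("Summarize this.", "IGNORE", "append_end"))  # Summarize this. IGNORE
print(apply_payload(10, "42", "replace"))                        # 42
```

Casting the payload to the original value's type keeps the attacked value valid for the function's schema, which is one reading of the "type-safe" claim above.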

_resolve_multiple_field_matches(matches)

_apply_payload_to_complex_type(
value: list | dict | pydantic.BaseModel,
) -> list | dict | pydantic.BaseModel

_apply_payload_to_function_value(value: Any) -> Any

_apply_payload_to_function_value_with_exception(
value: Any,
context: nat.middleware.function_middleware.FunctionMiddlewareContext,
) -> Any

async function_middleware_invoke(
*args: Any,
call_next: nat.middleware.function_middleware.CallNext,
context: nat.middleware.function_middleware.FunctionMiddlewareContext,
**kwargs: Any,
) -> Any

Invoke middleware for single-output functions.

Args:

args: Positional arguments passed to the function (first arg is typically the input value).
call_next: Callable to invoke next middleware/function.
context: Metadata about the function being wrapped.
kwargs: Keyword arguments passed to the function.

Returns:

The output value (potentially modified if attacking output).
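The overall call flow can be sketched without the `nat` package: modify the input before calling `call_next`, or modify the result afterwards, and stop injecting once a call limit is reached. `SketchMiddleware`, its `invoke` method, and the `shout` function are hypothetical stand-ins. They do not reproduce the real `FunctionMiddleware` interface or its `context` argument.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class SketchMiddleware:
    """Hypothetical stand-in showing input/output interception with a call limit."""

    def __init__(self, payload: str, target_location: str = "input",
                 call_limit: Optional[int] = None) -> None:
        self.payload = payload
        self.target_location = target_location
        self.call_limit = call_limit
        self.call_count = 0

    def _mutate(self, value: Any) -> Any:
        # Once the limit is reached, pass values through untouched.
        if self.call_limit is not None and self.call_count >= self.call_limit:
            return value
        self.call_count += 1
        return f"{value} {self.payload}"

    async def invoke(self, value: Any,
                     call_next: Callable[[Any], Awaitable[Any]]) -> Any:
        if self.target_location == "input":
            value = self._mutate(value)
        result = await call_next(value)
        if self.target_location == "output":
            result = self._mutate(result)
        return result


async def shout(value: str) -> str:
    """Stand-in for the wrapped function."""
    return value.upper()


if __name__ == "__main__":
    mw = SketchMiddleware("INJECTED", target_location="input", call_limit=1)
    print(asyncio.run(mw.invoke("hi", shout)))  # HI INJECTED
    print(asyncio.run(mw.invoke("hi", shout)))  # HI
```

The second call prints the unmodified result because the sketch's call limit of 1 has already been used up.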