nat.middleware.red_teaming.red_teaming_middleware#

Red teaming middleware for attacking agent functions.

This module provides a middleware for red teaming and security testing that can intercept and modify function inputs or outputs with configurable attack payloads.

The middleware supports:

- Targeting specific functions or entire function groups
- Field-level search within input/output schemas
- Multiple attack modes (replace, append_start, append_middle, append_end)
- Both regular and streaming function calls
- Type-safe operations on strings, integers, and floats

Classes#

RedTeamingMiddleware

Middleware for red teaming that intercepts and modifies function inputs/outputs.

Module Contents#

class RedTeamingMiddleware( *, attack_payload: str, target_function_or_group: str | None = None, payload_placement: Literal['replace', 'append_start', 'append_middle', 'append_end'] = 'append_end', target_location: Literal['input', 'output'] = 'input', target_field: str | None = None, target_field_resolution_strategy: Literal['random', 'first', 'last', 'all', 'error'] = 'error', call_limit: int | None = None, )#

Bases: nat.middleware.function_middleware.FunctionMiddleware

Middleware for red teaming that intercepts and modifies function inputs/outputs.

This middleware enables systematic security testing by injecting attack payloads into function inputs or outputs. It supports flexible targeting, field-level modifications, and multiple attack modes.

Features:

Target specific functions or entire function groups
Search for specific fields in input/output schemas
Apply attacks via replace or append modes
Support for both regular and streaming calls
Type-safe operations on strings and numbers (integers and floats)

Example:

# In YAML config
middleware:
  prompt_injection:
    _type: red_teaming
    attack_payload: "Ignore previous instructions"
    target_function_or_group: my_llm.generate
    payload_placement: append_start
    target_location: input
    target_field: prompt

Args:
    attack_payload: The malicious payload to inject.
    target_function_or_group: Function or group to target (None for all).
    payload_placement: How to apply (replace, append_start, append_middle, append_end).
    target_location: Whether to attack input or output.
    target_field: Field name or path to attack (None for direct value).

Initialize red teaming middleware.

Args:
    attack_payload: The value to inject into the function input or output.
    target_function_or_group: Optional function or group to target.
    payload_placement: How to apply the payload (replace or one of the append modes).
    target_location: Whether to place the payload in the input or the output.
    target_field: JSONPath to the field to attack.
    target_field_resolution_strategy: Strategy used when target_field matches more than one field (random, first, last, all, or error).
    call_limit: Maximum number of times the middleware will apply a payload.
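
The call_limit argument caps how often a payload is applied. A minimal sketch of that bookkeeping, assuming a simple counter compared against the limit; the CallLimiter class and its try_acquire method are hypothetical names for illustration and are not part of the documented API:

```python
from typing import Optional


class CallLimiter:
    """Hypothetical helper that counts payload applications against a limit."""

    def __init__(self, call_limit: Optional[int] = None) -> None:
        self.call_limit = call_limit  # None means "no limit"
        self.call_count = 0

    def try_acquire(self) -> bool:
        """Return True and count the call if the limit has not been reached."""
        if self.call_limit is not None and self.call_count >= self.call_limit:
            return False
        self.call_count += 1
        return True
```

With call_limit=2, the first two calls to try_acquire() succeed and every later call is refused, so the wrapped function would run unmodified from the third call onward.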

_attack_payload#

_target_function_or_group = None#

_payload_placement = 'append_end'#

_target_location = 'input'#

_target_field = None#

_target_field_resolution_strategy = 'error'#

_call_count: int = 0#

_call_limit = None#

_should_apply_payload(context_name: str) → bool#

Check if this function should be attacked based on targeting configuration.

Args:
    context_name: The name of the function from context (e.g., “calculator__add”)
Returns:
    True if the function should be attacked, False otherwise
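
The targeting check can be sketched as follows. This is an illustration only: it assumes that a group name and a function name are joined by a double underscore (inferred from the “calculator__add” example) and that a target of None matches every function. The standalone should_apply_payload function and its separator parameter are hypothetical.

```python
from typing import Optional


def should_apply_payload(
    context_name: str,
    target: Optional[str],
    separator: str = "__",  # assumed group/function separator
) -> bool:
    """Decide whether a function named context_name is covered by target."""
    if target is None:
        # No target configured: attack every function.
        return True
    if context_name == target:
        # Exact function match.
        return True
    # Group match: the target names the group the function belongs to.
    return context_name.startswith(target + separator)
```

Under these assumptions, a target of "calculator" covers "calculator__add", while "calc" does not, because the prefix must end exactly at the separator.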

_find_middle_sentence_index(text: str) → int#

Find the index to insert text at the middle sentence boundary.

Args:
    text: The text to analyze
Returns:
    The character index where the middle sentence ends
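
One plausible way to locate such an index is shown below. It is a sketch, not the module's actual algorithm: it assumes sentences end with ".", "!" or "?" followed by whitespace or the end of the text, and it falls back to the character midpoint when no boundary is found.

```python
import re


def find_middle_sentence_index(text: str) -> int:
    """Return the index just after the middle sentence's closing punctuation."""
    # Collect the end offset of every sentence-ending punctuation mark.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    if not ends:
        # No sentence boundary found: fall back to the character midpoint.
        return len(text) // 2
    # Pick the middle boundary (the earlier one when the count is even).
    return ends[(len(ends) - 1) // 2]
```

This simple pattern treats abbreviations such as "e.g." as sentence ends, so a production implementation would likely need a more careful splitter.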

_apply_payload_to_simple_type( original_value: list | str | int | float, attack_payload: str, payload_placement: str, ) → Any#

Apply the attack payload to a value of a simple type (str, int, float).

Args:
    original_value: The original value to attack
    attack_payload: The payload to inject
    payload_placement: How to apply the payload
Returns:
    The modified value with attack applied
Raises:
    ValueError: If attack cannot be applied due to type mismatch
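
The four placement modes can be illustrated with a standalone sketch. Several details here are assumptions rather than documented behavior: payload and original text are joined with a single space, numeric values accept only replace, and the payload is converted to the original numeric type. The helper names are hypothetical.

```python
import re
from typing import Union

Simple = Union[str, int, float]


def _middle_index(text: str) -> int:
    # Assumed heuristic: end of the middle sentence, else the character midpoint.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    return ends[(len(ends) - 1) // 2] if ends else len(text) // 2


def apply_payload_to_simple_type(
    original_value: Simple, attack_payload: str, payload_placement: str
) -> Simple:
    """Return original_value with attack_payload applied per payload_placement."""
    if isinstance(original_value, str):
        if payload_placement == "replace":
            return attack_payload
        if payload_placement == "append_start":
            return f"{attack_payload} {original_value}"
        if payload_placement == "append_end":
            return f"{original_value} {attack_payload}"
        if payload_placement == "append_middle":
            i = _middle_index(original_value)
            return f"{original_value[:i]} {attack_payload}{original_value[i:]}"
        raise ValueError(f"Unknown payload_placement: {payload_placement!r}")
    if isinstance(original_value, (int, float)) and not isinstance(original_value, bool):
        if payload_placement != "replace":
            raise ValueError("Numeric values only support 'replace' in this sketch")
        try:
            # Convert the payload to the same numeric type as the original.
            return type(original_value)(attack_payload)
        except ValueError as exc:
            raise ValueError(
                f"Payload {attack_payload!r} is not a valid {type(original_value).__name__}"
            ) from exc
    raise ValueError(f"Unsupported type: {type(original_value).__name__}")
```

For example, under these assumptions, applying the payload "PAYLOAD" to "One. Two. Three." with append_middle places it after the second sentence.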

_resolve_multiple_field_matches(matches)#
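
This method has no docstring, but the constructor's target_field_resolution_strategy argument lists the available strategies (random, first, last, all, error). A hedged sketch of how such a resolver might behave, assuming matches is a non-empty list and that the result is always a list of the matches to attack; the resolve_matches function is hypothetical:

```python
import random
from typing import Any, List


def resolve_matches(matches: List[Any], strategy: str = "error") -> List[Any]:
    """Pick which of several field matches should receive the payload."""
    if not matches:
        raise ValueError("No matching fields found")
    if len(matches) == 1:
        # A single match is unambiguous regardless of strategy.
        return list(matches)
    if strategy == "first":
        return [matches[0]]
    if strategy == "last":
        return [matches[-1]]
    if strategy == "random":
        return [random.choice(matches)]
    if strategy == "all":
        return list(matches)
    if strategy == "error":
        raise ValueError(f"Ambiguous target field: {len(matches)} matches found")
    raise ValueError(f"Unknown resolution strategy: {strategy!r}")
```

Defaulting to error, as the class signature does, makes an ambiguous target_field fail loudly instead of silently attacking an unintended field.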

_apply_payload_to_complex_type( value: list | dict | pydantic.BaseModel, ) → list | dict | pydantic.BaseModel#

_apply_payload_to_function_value(value: Any) → Any#

_apply_payload_to_function_value_with_exception( value: Any, context: nat.middleware.function_middleware.FunctionMiddlewareContext, ) → Any#

async function_middleware_invoke( *args: Any, call_next: nat.middleware.function_middleware.CallNext, context: nat.middleware.function_middleware.FunctionMiddlewareContext, **kwargs: Any, ) → Any#

Invoke middleware for single-output functions.

Args:
    args: Positional arguments passed to the function (first arg is typically the input value).
    call_next: Callable to invoke next middleware/function.
    context: Metadata about the function being wrapped.
    kwargs: Keyword arguments passed to the function.
Returns:
    The output value (potentially modified if attacking output).