nat.eval.runners.red_teaming_runner#
Submodules#
- nat.eval.runners.red_teaming_runner.config
Classes#
| Class | Description |
|---|---|
| RedTeamingRunnerConfig | Top-level configuration for red teaming evaluation. |
| RedTeamingScenario | A single red teaming scenario configuration. |
| RedTeamingRunner | Runner for executing red teaming evaluations across multiple scenarios. |
Package Contents#
- class RedTeamingRunnerConfig(/, **data: Any)#
Bases: pydantic.BaseModel
Top-level configuration for red teaming evaluation.
Supports evaluator_defaults for DRY configuration with _extends inheritance.
- Attributes:
  - base_workflow: Optional path to the base workflow configuration file.
  - llms: Dictionary of LLM configurations keyed by name.
  - evaluator_defaults: Named evaluator configs that scenarios can extend.
  - general: General evaluation settings (concurrency, output, dataset).
  - scenarios: Dictionary of scenario configurations.
Example YAML configuration:
```yaml
base_workflow: ./configs/base_workflow.yml
llms:
  judge_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
evaluator_defaults:
  standard_eval:
    llm_name: judge_llm
    _type: red_teaming_evaluator
    judge_llm_prompt: >
      Evaluate whether the attack was successful...
    filter_conditions:
      - name: workflow_output
        event_type: FUNCTION_END
        payload_name: <workflow>
        reduction_strategy: last
general:
  max_concurrency: 4
  output_dir: ./.tmp/nat/redteaming/
scenarios:
  intercept_payload_42:
    middleware:
      _type: red_teaming
      target_function_or_group: my_calculator
      attack_payload: "42.0"
    evaluator:
      _extends: standard_eval
      scenario_specific_instructions: "Check for 42.0..."
  custom_scenario:
    tags: [category_1, category_2]
    middleware: {}
    evaluator:
      llm_name: judge_llm
      _type: red_teaming_evaluator
      judge_llm_prompt: "Custom prompt..."
      filter_conditions: []
```
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- base_workflow: pathlib.Path | None = None#
- llms: dict[str, nat.data_models.llm.LLMBaseConfig] = None#
- evaluator_defaults: dict[str, nat.eval.red_teaming_evaluator.register.RedTeamingEvaluatorConfig] | None = None#
- general: nat.data_models.evaluate.EvalGeneralConfig | None = None#
- scenarios: dict[str, RedTeamingScenario | _RedTeamingScenarioRaw] = None#
- validate_and_resolve_scenarios() → RedTeamingRunnerConfig#
Validate scenarios and resolve _extends inheritance.
This runs after Pydantic parsing, so evaluator_defaults are already validated RedTeamingEvaluatorConfig objects. We convert any _RedTeamingScenarioRaw to RedTeamingScenario by resolving _extends.
- Returns:
The validated configuration with all scenarios as RedTeamingScenario instances.
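As an illustration of the _extends resolution this method performs, the sketch below overlays scenario-level keys onto a named default; the resolve_extends helper and its merge semantics are assumptions, not the actual implementation:
```python
from typing import Any

def resolve_extends(raw_evaluator: dict[str, Any],
                    evaluator_defaults: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical sketch: overlay scenario keys onto the referenced default."""
    base_name = raw_evaluator.pop("_extends", None)
    if base_name is None:
        return raw_evaluator
    merged = dict(evaluator_defaults[base_name])  # start from the named default
    merged.update(raw_evaluator)                  # scenario keys take precedence
    return merged

# Example mirroring the YAML above:
defaults = {"standard_eval": {"_type": "red_teaming_evaluator", "llm_name": "judge_llm"}}
scenario_eval = {"_extends": "standard_eval",
                 "scenario_specific_instructions": "Check for 42.0..."}
print(resolve_extends(scenario_eval, defaults))
```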
- classmethod rebuild_annotations() → bool#
Rebuild field annotations with discriminated unions.
This method updates the llms dict value annotation to use a discriminated union of all registered LLM providers. This allows Pydantic to correctly deserialize the _type field into the appropriate concrete LLM config class.
- Returns:
True if the model was rebuilt, False otherwise.
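The mechanics resemble a plain Pydantic discriminated union. A minimal standalone sketch, with hypothetical NimLLM/OpenAILLM classes and a `type` field standing in for `_type`:
```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class NimLLM(BaseModel):
    type: Literal["nim"]
    model_name: str

class OpenAILLM(BaseModel):
    type: Literal["openai"]
    model_name: str

# The discriminator tells Pydantic which concrete class to instantiate.
LLMUnion = Annotated[Union[NimLLM, OpenAILLM], Field(discriminator="type")]

adapter = TypeAdapter(dict[str, LLMUnion])
llms = adapter.validate_python(
    {"judge_llm": {"type": "nim", "model_name": "meta/llama-3.1-70b-instruct"}}
)
assert isinstance(llms["judge_llm"], NimLLM)
```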
- class RedTeamingScenario(/, **data: Any)#
Bases: pydantic.BaseModel
A single red teaming scenario configuration.
Each scenario defines a complete middleware and evaluator configuration. The evaluator can use _extends to inherit from evaluator_defaults.
- Attributes:
  - scenario_id: Optional unique identifier. If not provided, the dict key from RedTeamingRunnerConfig.scenarios is used.
  - middleware: Full middleware configuration to apply. Set to None for baseline scenarios (no middleware modification).
  - evaluator: Complete evaluator configuration. Can inherit from evaluator_defaults using _extends in YAML/JSON.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- middleware: nat.middleware.red_teaming.red_teaming_middleware_config.RedTeamingMiddlewareConfig | None = None#
- evaluator: nat.eval.red_teaming_evaluator.register.RedTeamingEvaluatorConfig = None#
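For example, a baseline scenario (middleware=None) could be built from a dict; the evaluator field names below are taken from the YAML example earlier and may not be exhaustive:
```python
baseline = RedTeamingScenario.model_validate({
    "middleware": None,  # baseline: run the workflow unmodified
    "evaluator": {
        "_type": "red_teaming_evaluator",
        "llm_name": "judge_llm",
        "judge_llm_prompt": "Was the attack successful?",
        "filter_conditions": [],
    },
})
```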
- class RedTeamingRunner(
- config: nat.eval.runners.red_teaming_runner.config.RedTeamingRunnerConfig | None,
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None = None,
- result_json_path: str = '$',
- endpoint: str | None = None,
- endpoint_timeout: int = 300,
- reps: int = 1,
- overrides: tuple[tuple[str, str], ...] = (),
- )#
Runner for executing red teaming evaluations across multiple scenarios.
This runner encapsulates all the logic for:
- Generating workflow configurations for each scenario
- Setting up output directories
- Saving configuration files
- Running evaluations via MultiEvaluationRunner
Example usage:
```python
runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
)
results = await runner.run()
```
Initialize the RedTeamingRunner.
- Args:
  - config: Red teaming config with scenarios (None uses base_workflow_config).
  - base_workflow_config: Base workflow config to transform for each scenario.
  - dataset_path: Optional dataset path (overrides config dataset).
  - result_json_path: JSON path to extract the result from the workflow.
  - endpoint: Optional endpoint URL for running the workflow.
  - endpoint_timeout: HTTP response timeout in seconds.
  - reps: Number of repetitions for the evaluation.
  - overrides: Config overrides using dot notation (path, value) tuples.
- config#
- base_workflow_config#
- dataset_path = None#
- result_json_path = '$'#
- endpoint = None#
- endpoint_timeout = 300#
- reps = 1#
- overrides = ()#
- _generated_workflow_configs: dict[str, nat.data_models.config.Config] | None = None#
- _base_output_dir: pathlib.Path | None = None#
- async run() → dict[str, nat.eval.config.EvaluationRunOutput]#
Run the red teaming evaluation across all scenarios.
- Returns:
Dictionary mapping scenario_id to EvaluationRunOutput.
- Raises:
ValueError: If configuration validation fails.
- generate_workflow_configs() → dict[str, nat.data_models.config.Config]#
Generate workflow configurations for each scenario.
If config is None, returns the base_workflow_config as a single scenario after validating it has the required red teaming components.
- Returns:
Dictionary mapping scenario_id to the transformed Config.
- Raises:
ValueError: If validation fails.
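Reusing the runner from the example above, the per-scenario configs can be inspected before launching the evaluation (a usage sketch, not required by the API):
```python
configs = runner.generate_workflow_configs()
for scenario_id, cfg in configs.items():
    # Each value is a full Config with the scenario's middleware and evaluator injected.
    print(scenario_id, type(cfg).__name__)
```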
- setup_output_directory(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Set up the base output directory.
If the directory already exists, creates a new directory with a timestamp and unique identifier suffix.
- Args:
generated_workflow_configs: The generated workflow configs per scenario.
- Returns:
The base output directory path.
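The collision-avoidance behavior described above can be pictured like this; unique_output_dir is an illustrative helper, and the exact suffix format is an assumption:
```python
import pathlib
import uuid
from datetime import datetime

def unique_output_dir(base: pathlib.Path) -> pathlib.Path:
    """Return base if free; otherwise append a timestamp plus a short unique id."""
    if not base.exists():
        return base
    suffix = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
    return base.with_name(f"{base.name}_{suffix}")

print(unique_output_dir(pathlib.Path("./.tmp/nat/redteaming")))
```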
- save_configs(
- base_output_dir: pathlib.Path,
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Save base workflow config, red team config, and scenario workflow configs to disk.
- Args:
  - base_output_dir: The base output directory.
  - generated_workflow_configs: The generated workflow configs per scenario.
- _apply_overrides_to_all(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Apply CLI overrides to all scenario configs.
- Args:
generated_workflow_configs: The scenario configurations to modify.
- Returns:
The modified scenario configurations.
- _build_evaluation_configs(
- base_output_dir: pathlib.Path,
- scenario_configs: dict[str, nat.data_models.config.Config],
- )#
Build EvaluationRunConfig for each scenario.
- Args:
  - base_output_dir: The base output directory.
  - scenario_configs: The generated scenario configurations.
- Returns:
Dictionary mapping scenario_id to EvaluationRunConfig.
- Raises:
ValueError: If config validation fails.
- _validate_base_config_for_direct_use(
- base_workflow_config: nat.data_models.config.Config,
- )#
Validate that a workflow config is compatible with red teaming.
A workflow config is compatible if it contains:
- At least one RedTeamingMiddleware (or subclass)
- At least one red_teaming_evaluator
This is used when the user provides a pre-configured workflow instead of a RedTeamingRunnerConfig.
- Args:
base_workflow_config: The workflow configuration to validate.
- Raises:
ValueError: If the config is not red-team compatible.
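A rough sketch of that check over a config dict; the dict layout and type-name strings here are hypothetical, chosen to mirror the YAML example above:
```python
from typing import Any

def is_red_team_compatible(cfg: dict[str, Any]) -> bool:
    """Hypothetical: require at least one red teaming middleware and evaluator."""
    evaluators = cfg.get("eval", {}).get("evaluators", {})
    has_evaluator = any(
        e.get("_type") == "red_teaming_evaluator" for e in evaluators.values()
    )
    middleware = cfg.get("middleware", {})
    has_middleware = any(
        m.get("_type") == "red_teaming" for m in middleware.values()
    )
    return has_evaluator and has_middleware
```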
- _warn_about_other_evaluators(
- base_workflow_config: nat.data_models.config.Config,
- )#
Warn if the base workflow config contains other evaluators.
Red teaming evaluation is potentially incompatible with other evaluators due to its adversarial nature.
- Args:
base_workflow_config: The base workflow configuration to validate.
- _validate_dataset_exists(
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None,
- )#
Validate that a dataset is defined somewhere.
Dataset can be defined in:
- the CLI --dataset argument (dataset_path)
- RedTeamingRunnerConfig.general.dataset
- base_workflow_config.eval.general.dataset
- Args:
  - base_workflow_config: The base workflow configuration.
  - dataset_path: Optional dataset path from CLI.
- Raises:
ValueError: If no dataset is defined anywhere.
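The check amounts to a first-non-empty lookup across those three sources; a sketch with hypothetical arguments (the CLI value takes precedence per the constructor docs, the remaining order is an assumption):
```python
def resolve_dataset(cli_dataset: str | None,
                    rt_dataset: str | None,
                    base_dataset: str | None) -> str:
    """Hypothetical: return the first dataset defined, else fail."""
    for candidate in (cli_dataset, rt_dataset, base_dataset):
        if candidate:
            return candidate
    raise ValueError(
        "No dataset defined via CLI, RedTeamingRunnerConfig, or base workflow config."
    )
```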
- _merge_general_config(
- base_workflow_config_dict: dict[str, Any],
- general: nat.data_models.evaluate.EvalGeneralConfig,
- )#
Merge general eval settings into the base workflow config dict.
This performs a union of the base workflow’s eval.general with the RedTeamingRunnerConfig.general, where RedTeamingRunnerConfig values take precedence. Only explicitly set values override base values.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - general: The EvalGeneralConfig from RedTeamingRunnerConfig.
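The "only explicitly set values override" behavior matches Pydantic's exclude_unset dump; a minimal sketch, where the helper name and dict layout are assumptions:
```python
from typing import Any
from pydantic import BaseModel

def merge_general(base_cfg: dict[str, Any], general: BaseModel) -> None:
    """Overlay only the fields the user explicitly set onto eval.general."""
    overrides = general.model_dump(exclude_unset=True)
    base_cfg.setdefault("eval", {}).setdefault("general", {}).update(overrides)
```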
- _attach_middleware_everywhere(
- base_workflow_config_dict: dict[str, Any],
- middleware_name: str,
- ) → None#
Attach middleware to all functions, function_groups, and workflow.
The middleware’s internal target_function_or_group handles runtime activation - this just ensures the middleware is registered everywhere.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - middleware_name: Name of the middleware to attach.
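Conceptually this walks every functions, function_groups, and workflow entry and registers the middleware by name; the dict layout and middleware-list convention below are assumptions for illustration:
```python
from typing import Any

def attach_everywhere(cfg: dict[str, Any], middleware_name: str) -> None:
    """Hypothetical sketch: reference the middleware from every runnable entry."""
    for section in ("functions", "function_groups"):
        for entry in cfg.get(section, {}).values():
            entry.setdefault("middleware", []).append(middleware_name)
    cfg.setdefault("workflow", {}).setdefault("middleware", []).append(middleware_name)
```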
- _inject_evaluator_config(
- base_workflow_config_dict: dict[str, Any],
- scenario: nat.eval.runners.red_teaming_runner.config.RedTeamingScenario,
- )#
Inject the evaluator configuration into the workflow config.
Creates a red_teaming_evaluator in the eval section using the complete evaluator configuration from the scenario.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - scenario: The scenario containing the complete evaluator config.
- _update_config_value(
- scenario_config_dict: dict[str, Any],
- path: str,
- value: str,
- ) → None#
Update a single value in the scenario config dictionary at the specified path.
- Args:
  - scenario_config_dict: The scenario configuration dictionary to update.
  - path: The path to the value to update.
  - value: The new value to set at the specified path.
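Dot-notation updates like the overrides accepted by the constructor reduce to a walk-and-set over nested dicts; a minimal sketch:
```python
from typing import Any

def update_config_value(cfg: dict[str, Any], path: str, value: Any) -> None:
    """Set cfg['a']['b']['c'] = value for path 'a.b.c', creating dicts as needed."""
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

cfg: dict[str, Any] = {}
update_config_value(cfg, "eval.general.max_concurrency", 4)
print(cfg)  # {'eval': {'general': {'max_concurrency': 4}}}
```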
- _find_red_teaming_evaluator_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- )#
Find the red teaming evaluator results in the results.
- Args:
results: The results of the red teaming evaluation.
- Returns:
The red teaming evaluator results.
- _compute_result_summary(df: pandas.DataFrame) → dict[str, Any]#
Compute the result summary for the red teaming evaluation using pandas.
Filters out rows with errors (error_message is not None) for reliable score computations. Also computes attack success rate (% of instances where score > 0.5 threshold).
- Args:
df: DataFrame with flattened evaluation results.
- Returns:
The result summary dictionary.
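The filtering and success-rate logic described above could look roughly like this; the score and error_message column names come from the description, while the summary keys are illustrative:
```python
import pandas as pd

def compute_summary(df: pd.DataFrame) -> dict:
    ok = df[df["error_message"].isna()]  # drop errored rows for reliable stats
    return {
        "average_score": float(ok["score"].mean()),
        "attack_success_rate": float((ok["score"] > 0.5).mean()),  # share above threshold
        "num_errors": int(df["error_message"].notna().sum()),
    }

df = pd.DataFrame({"score": [0.9, 0.1, 0.7],
                   "error_message": [None, None, "timeout"]})
print(compute_summary(df))
```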
- _log_results_summary(
- summary: dict[str, Any],
- output_dir: pathlib.Path,
- results_file: pathlib.Path | None = None,
- report_path: pathlib.Path | None = None,
- )#
Log a nicely formatted summary of the red teaming evaluation results.
- Args:
  - summary: The computed summary dictionary with overall_score and per_scenario_summary.
  - output_dir: The base output directory where results are saved.
  - results_file: Optional path to the flat results JSONL file.
  - report_path: Optional path to the HTML report.
- _build_flat_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- )#
Build a flat list of dictionaries from nested evaluation results.
Each record represents a single condition evaluation, with a unique identifier combining scenario_id, item_id, and condition_name.
- Args:
results: The nested results from the red teaming evaluation.
- Returns:
A list of flat dictionaries, one per condition evaluation.
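A record shaped as described might be constructed like this; the three identifier parts come from the docstring, the remaining fields are illustrative:
```python
def make_record(scenario_id: str, item_id: str, condition_name: str,
                score: float, error_message: str | None = None) -> dict:
    return {
        # Unique per condition evaluation, combining the three identifier parts.
        "id": f"{scenario_id}/{item_id}/{condition_name}",
        "scenario_id": scenario_id,
        "item_id": item_id,
        "condition_name": condition_name,
        "score": score,
        "error_message": error_message,
    }
```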
- _save_flat_results(
- flat_results: list[dict[str, Any]],
- output_dir: pathlib.Path,
- )#
Save flat results to a JSONL file.
- Args:
  - flat_results: The flat list of result dictionaries.
  - output_dir: The directory to save the file to.
- Returns:
The path to the saved JSONL file.
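Writing JSONL comes down to one json.dumps call per record; a minimal sketch, with the output file name being an assumption:
```python
import json
import pathlib
from typing import Any

def save_jsonl(records: list[dict[str, Any]], output_dir: pathlib.Path) -> pathlib.Path:
    """Write one JSON object per line and return the file path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "flat_results.jsonl"  # file name is an assumption
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path
```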