nat.eval.runners.red_teaming_runner.runner#

Red teaming runner for executing multi-scenario red teaming evaluations.

Classes#

RedTeamingRunner

Runner for executing red teaming evaluations across multiple scenarios.

Module Contents#

class RedTeamingRunner(
config: nat.eval.runners.red_teaming_runner.config.RedTeamingRunnerConfig | None,
base_workflow_config: nat.data_models.config.Config,
dataset_path: str | None = None,
result_json_path: str = '$',
endpoint: str | None = None,
endpoint_timeout: int = 300,
reps: int = 1,
overrides: tuple[tuple[str, str], ...] = (),
)#

Runner for executing red teaming evaluations across multiple scenarios.

This runner encapsulates all the logic for:

  • Generating workflow configurations for each scenario

  • Setting up output directories

  • Saving configuration files

  • Running evaluations via MultiEvaluationRunner

Example usage:

runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
)
results = await runner.run()

Initialize the RedTeamingRunner.

Args:

config: Red teaming config with scenarios (None uses base_workflow_config).
base_workflow_config: Base workflow config to transform for each scenario.
dataset_path: Optional dataset path (overrides config dataset).
result_json_path: JSON path to extract the result from the workflow.
endpoint: Optional endpoint URL for running the workflow.
endpoint_timeout: HTTP response timeout in seconds.
reps: Number of repetitions for the evaluation.
overrides: Config overrides using dot notation (path, value) tuples.
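
The overrides argument takes (path, value) string tuples in dot notation. A minimal construction sketch; the override path below is a hypothetical example, not a setting defined by this package:

runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
    reps=3,
    overrides=(
        ("eval.general.max_concurrency", "4"),  # hypothetical dot-notation path
    ),
)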

config#
base_workflow_config#
dataset_path = None#
result_json_path = '$'#
endpoint = None#
endpoint_timeout = 300#
reps = 1#
overrides = ()#
_generated_workflow_configs: dict[str, nat.data_models.config.Config] | None = None#
_base_output_dir: pathlib.Path | None = None#
async run() → dict[str, nat.eval.config.EvaluationRunOutput]#

Run the red teaming evaluation across all scenarios.

Returns:

Dictionary mapping scenario_id to EvaluationRunOutput.

Raises:

ValueError: If configuration validation fails.
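
A minimal usage sketch for run(); per the return description above, the result is keyed by scenario_id:

results = await runner.run()
for scenario_id, run_output in results.items():
    # run_output is the EvaluationRunOutput for this scenario
    print(scenario_id, type(run_output).__name__)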

generate_workflow_configs() → dict[str, nat.data_models.config.Config]#

Generate workflow configurations for each scenario.

If config is None, returns the base_workflow_config as a single scenario after validating it has the required red teaming components.

Returns:

Dictionary mapping scenario_id to the transformed Config.

Raises:

ValueError: If validation fails.

setup_output_directory(
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → pathlib.Path#

Set up the base output directory.

If the directory already exists, creates a new directory with a timestamp and unique identifier suffix.

Args:

generated_workflow_configs: The generated workflow configs per scenario.

Returns:

The base output directory path.
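
A sketch of the collision handling described above, assuming a timestamp plus a short unique suffix; the exact suffix format is not specified on this page:

import uuid
from datetime import datetime
from pathlib import Path

def unique_output_dir(base_dir: Path) -> Path:
    # If the directory already exists, derive a sibling name with a timestamp and unique id.
    if base_dir.exists():
        suffix = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
        base_dir = base_dir.with_name(f"{base_dir.name}_{suffix}")
    base_dir.mkdir(parents=True, exist_ok=True)
    return base_dir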

save_configs(
base_output_dir: pathlib.Path,
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → None#

Save base workflow config, red team config, and scenario workflow configs to disk.

Args:

base_output_dir: The base output directory.
generated_workflow_configs: The generated workflow configs per scenario.

_apply_overrides_to_all(
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → dict[str, nat.data_models.config.Config]#

Apply CLI overrides to all scenario configs.

Args:

generated_workflow_configs: The scenario configurations to modify.

Returns:

The modified scenario configurations.

_build_evaluation_configs(
base_output_dir: pathlib.Path,
scenario_configs: dict[str, nat.data_models.config.Config],
) → dict[str, nat.eval.config.EvaluationRunConfig]#

Build EvaluationRunConfig for each scenario.

Args:

base_output_dir: The base output directory.
scenario_configs: The generated scenario configurations.

Returns:

Dictionary mapping scenario_id to EvaluationRunConfig.

Raises:

ValueError: If config validation fails.

_validate_base_config_for_direct_use(
base_workflow_config: nat.data_models.config.Config,
) → None#

Validate that a workflow config is compatible with red teaming.

A workflow config is compatible if it contains:

  • At least one RedTeamingMiddleware (or subclass)

  • At least one red_teaming_evaluator

This is used when the user provides a pre-configured workflow instead of a RedTeamingRunnerConfig.

Args:

base_workflow_config: The workflow configuration to validate.

Raises:

ValueError: If the config is not red-team compatible.

_warn_about_other_evaluators(
base_workflow_config: nat.data_models.config.Config,
) → None#

Warn if the base workflow config contains other evaluators.

Red teaming evaluation is potentially incompatible with other evaluators due to its adversarial nature.

Args:

base_workflow_config: The base workflow configuration to validate.

_validate_dataset_exists(
base_workflow_config: nat.data_models.config.Config,
dataset_path: str | None,
) → None#

Validate that a dataset is defined somewhere.

The dataset can be defined in any of the following:

  • The CLI --dataset argument (dataset_path)

  • RedTeamingRunnerConfig.general.dataset

  • base_workflow_config.eval.general.dataset

Args:

base_workflow_config: The base workflow configuration.
dataset_path: Optional dataset path from CLI.

Raises:

ValueError: If no dataset is defined anywhere.
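
A sketch of this precedence check; the lookups are written against plain dicts for illustration rather than the actual Config model API:

from typing import Any

def dataset_is_defined(base_cfg: dict[str, Any],
                       rt_general: dict[str, Any],
                       dataset_path: str | None) -> bool:
    # Checked in order: CLI --dataset, RedTeamingRunnerConfig.general.dataset,
    # then base_workflow_config.eval.general.dataset.
    if dataset_path:
        return True
    if rt_general.get("dataset"):
        return True
    return bool(base_cfg.get("eval", {}).get("general", {}).get("dataset"))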

_merge_general_config(
base_workflow_config_dict: dict[str, Any],
general: nat.data_models.evaluate.EvalGeneralConfig,
) → None#

Merge general eval settings into the base workflow config dict.

This performs a union of the base workflow’s eval.general with the RedTeamingRunnerConfig.general, where RedTeamingRunnerConfig values take precedence. Only explicitly set values override base values.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
general: The EvalGeneralConfig from RedTeamingRunnerConfig.
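
A sketch of the union semantics, assuming EvalGeneralConfig is a Pydantic model so that model_dump(exclude_unset=True) yields only the explicitly set fields; that assumption is not confirmed by this page:

def merge_general(base_workflow_config_dict: dict, general) -> None:
    # Only fields explicitly set on the red teaming general config override the base.
    explicit = general.model_dump(exclude_unset=True)  # assumes a Pydantic model
    eval_general = base_workflow_config_dict.setdefault("eval", {}).setdefault("general", {})
    eval_general.update(explicit)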

_attach_middleware_everywhere(
base_workflow_config_dict: dict[str, Any],
middleware_name: str,
) → None#

Attach middleware to all functions, function_groups, and workflow.

The middleware’s internal target_function_or_group setting handles runtime activation; this method only ensures the middleware is registered everywhere.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
middleware_name: Name of the middleware to attach.
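
An illustrative sketch of the attachment step; the dictionary keys used here (functions, function_groups, workflow, middleware) are assumptions about the config layout rather than documented fields:

def attach_everywhere(cfg: dict, middleware_name: str) -> None:
    # Register the middleware on the workflow, every function, and every function group.
    targets = [cfg.setdefault("workflow", {})]
    targets += list(cfg.get("functions", {}).values())
    targets += list(cfg.get("function_groups", {}).values())
    for target in targets:
        middleware = target.setdefault("middleware", [])
        if middleware_name not in middleware:
            middleware.append(middleware_name)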

_inject_evaluator_config(
base_workflow_config_dict: dict[str, Any],
scenario: nat.eval.runners.red_teaming_runner.config.RedTeamingScenario,
) → None#

Inject the evaluator configuration into the workflow config.

Creates a red_teaming_evaluator in the eval section using the complete evaluator configuration from the scenario.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
scenario: The scenario containing the complete evaluator config.

_update_config_value(
scenario_config_dict: dict[str, Any],
path: str,
value: Any,
) → None#

Update a single value in the scenario config dictionary at the specified path.

Args:

scenario_config_dict: The scenario configuration dictionary to update.
path: The path to the value to update.
value: The new value to set at the specified path.
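
A minimal dot-path update sketch, consistent with the (path, value) override format described for the constructor; it is illustrative rather than the runner's exact implementation:

def set_by_path(cfg: dict, path: str, value) -> None:
    # Walk "a.b.c" style paths, creating intermediate dicts as needed.
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

scenario_cfg: dict = {}
set_by_path(scenario_cfg, "eval.general.max_concurrency", 4)  # hypothetical path
# scenario_cfg == {"eval": {"general": {"max_concurrency": 4}}}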

_find_red_teaming_evaluator_results(
results: dict[str, nat.eval.config.EvaluationRunOutput],
) → dict[str, nat.eval.evaluator.evaluator_model.EvalOutput]#

Extract the red teaming evaluator results from the overall evaluation results.

Args:

results: The results of the red teaming evaluation.

Returns:

The red teaming evaluator results.

_compute_result_summary(df: pandas.DataFrame) → dict[str, Any]#

Compute the result summary for the red teaming evaluation using pandas.

Rows with errors (error_message is not None) are filtered out so that score computations are reliable. Also computes the attack success rate, i.e. the percentage of instances whose score exceeds the 0.5 threshold.

Args:

df: DataFrame with flattened evaluation results.

Returns:

The result summary dictionary.
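
A sketch of the summary computation, assuming the flattened DataFrame has error_message and score columns as described above; the summary keys shown are illustrative:

import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    # Drop errored rows so score statistics are reliable.
    ok = df[df["error_message"].isna()]
    return {
        "num_evaluated": len(ok),
        "num_errors": len(df) - len(ok),
        "mean_score": float(ok["score"].mean()) if len(ok) else None,
        # Attack success rate: share of instances scoring above the 0.5 threshold.
        "attack_success_rate": float((ok["score"] > 0.5).mean()) if len(ok) else None,
    }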

_log_results_summary(
summary: dict[str, Any],
output_dir: pathlib.Path,
results_file: pathlib.Path | None = None,
report_path: pathlib.Path | None = None,
) → None#

Log a nicely formatted summary of the red teaming evaluation results.

Args:

summary: The computed summary dictionary with overall_score and per_scenario_summary.
output_dir: The base output directory where results are saved.
results_file: Optional path to the flat results JSONL file.
report_path: Optional path to the HTML report.

_build_flat_results(
results: dict[str, nat.eval.config.EvaluationRunOutput],
) → list[dict[str, Any]]#

Build a flat list of dictionaries from nested evaluation results.

Each record represents a single condition evaluation, with a unique identifier combining scenario_id, item_id, and condition_name.

Args:

results: The nested results from the red teaming evaluation.

Returns:

A list of flat dictionaries, one per condition evaluation.
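
A sketch of the flattening step showing the combined identifier; the nested dict shape used here is a simplified stand-in for the real EvaluationRunOutput structure:

def flatten(nested: dict) -> list[dict]:
    # One record per (scenario, item, condition), with a combined unique id.
    flat = []
    for scenario_id, items in nested.items():
        for item_id, conditions in items.items():
            for condition_name, result in conditions.items():
                flat.append({
                    "id": f"{scenario_id}::{item_id}::{condition_name}",  # separator is illustrative
                    "scenario_id": scenario_id,
                    "item_id": item_id,
                    "condition_name": condition_name,
                    "result": result,
                })
    return flat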

_save_flat_results(
flat_results: list[dict[str, Any]],
output_dir: pathlib.Path,
) → pathlib.Path#

Save flat results to a JSONL file.

Args:

flat_results: The flat list of result dictionaries.
output_dir: The directory to save the file to.

Returns:

The path to the saved JSONL file.
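
A minimal JSONL write matching the description; the file name is an assumption:

import json
from pathlib import Path

def save_jsonl(flat_results: list[dict], output_dir: Path) -> Path:
    # One JSON object per line.
    path = output_dir / "red_teaming_results.jsonl"  # hypothetical file name
    with path.open("w", encoding="utf-8") as f:
        for record in flat_results:
            f.write(json.dumps(record, default=str) + "\n")
    return path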