nat.eval.runners.red_teaming_runner.runner#
Red teaming runner for executing multi-scenario red teaming evaluations.
Classes#
- RedTeamingRunner: Runner for executing red teaming evaluations across multiple scenarios.
Module Contents#
- class RedTeamingRunner(
- config: nat.eval.runners.red_teaming_runner.config.RedTeamingRunnerConfig | None,
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None = None,
- result_json_path: str = '$',
- endpoint: str | None = None,
- endpoint_timeout: int = 300,
- reps: int = 1,
- overrides: tuple[tuple[str, str], ...] = (),
- )#
Runner for executing red teaming evaluations across multiple scenarios.
This runner encapsulates all the logic for:
- Generating workflow configurations for each scenario
- Setting up output directories
- Saving configuration files
- Running evaluations via MultiEvaluationRunner
Example usage:
```python
runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
)
results = await runner.run()
```
Initialize the RedTeamingRunner.
- Args:
config: Red teaming config with scenarios (None uses base_workflow_config).
base_workflow_config: Base workflow config to transform for each scenario.
dataset_path: Optional dataset path (overrides config dataset).
result_json_path: JSON path to extract the result from the workflow.
endpoint: Optional endpoint URL for running the workflow.
endpoint_timeout: HTTP response timeout in seconds.
reps: Number of repetitions for the evaluation.
overrides: Config overrides using dot notation (path, value) tuples; see the sketch below.
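For illustration, a constructor call using overrides might look like the following sketch; the dot-notation paths are hypothetical and depend on your workflow config layout:

```python
# Sketch: each override is a (path, value) tuple applied via dot notation.
# The paths below are illustrative, not keys from any shipped config.
runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    reps=3,
    overrides=(
        ("eval.general.max_concurrency", "4"),
        ("llms.judge_llm.temperature", "0.0"),
    ),
)
```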
- config#
- base_workflow_config#
- dataset_path = None#
- result_json_path = '$'#
- endpoint = None#
- endpoint_timeout = 300#
- reps = 1#
- overrides = ()#
- _generated_workflow_configs: dict[str, nat.data_models.config.Config] | None = None#
- _base_output_dir: pathlib.Path | None = None#
- async run() → dict[str, nat.eval.config.EvaluationRunOutput]#
Run the red teaming evaluation across all scenarios.
- Returns:
Dictionary mapping scenario_id to EvaluationRunOutput.
- Raises:
ValueError: If configuration validation fails.
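A minimal sketch of driving run() and iterating the per-scenario outputs, assuming rt_config and base_workflow_config are defined as in the example above:

```python
import asyncio

async def main() -> None:
    runner = RedTeamingRunner(
        config=rt_config,
        base_workflow_config=base_workflow_config,
        dataset_path="/path/to/dataset.json",
    )
    results = await runner.run()
    for scenario_id, run_output in results.items():
        # Each value is an EvaluationRunOutput for one scenario.
        print(f"{scenario_id}: {run_output}")

asyncio.run(main())
```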
- generate_workflow_configs() → dict[str, nat.data_models.config.Config]#
Generate workflow configurations for each scenario.
If config is None, returns the base_workflow_config as a single scenario after validating it has the required red teaming components.
- Returns:
Dictionary mapping scenario_id to the transformed Config.
- Raises:
ValueError: If validation fails.
- setup_output_directory(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- ) → pathlib.Path#
Set up the base output directory.
If the directory already exists, creates a new directory with a timestamp and unique identifier suffix.
- Args:
generated_workflow_configs: The generated workflow configs per scenario.
- Returns:
The base output directory path.
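The collision handling described above might look roughly like this; the exact suffix format is an assumption:

```python
import pathlib
import uuid
from datetime import datetime

def make_unique_output_dir(base: pathlib.Path) -> pathlib.Path:
    """Sketch of the directory-uniquifying behavior described above."""
    if not base.exists():
        base.mkdir(parents=True)
        return base
    # Directory exists: append a timestamp plus a short unique identifier.
    # The suffix format here is illustrative, not the runner's exact scheme.
    suffix = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
    unique = base.with_name(f"{base.name}_{suffix}")
    unique.mkdir(parents=True)
    return unique
```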
- save_configs(
- base_output_dir: pathlib.Path,
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- ) → None#
Save base workflow config, red team config, and scenario workflow configs to disk.
- Args:
base_output_dir: The base output directory.
generated_workflow_configs: The generated workflow configs per scenario.
- _apply_overrides_to_all(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- ) → dict[str, nat.data_models.config.Config]#
Apply CLI overrides to all scenario configs.
- Args:
generated_workflow_configs: The scenario configurations to modify.
- Returns:
The modified scenario configurations.
- _build_evaluation_configs(
- base_output_dir: pathlib.Path,
- scenario_configs: dict[str, nat.data_models.config.Config],
- ) → dict[str, nat.eval.config.EvaluationRunConfig]#
Build EvaluationRunConfig for each scenario.
- Args:
base_output_dir: The base output directory.
scenario_configs: The generated scenario configurations.
- Returns:
Dictionary mapping scenario_id to EvaluationRunConfig.
- Raises:
ValueError: If config validation fails.
- _validate_base_config_for_direct_use(
- base_workflow_config: nat.data_models.config.Config,
- ) → None#
Validate that a workflow config is compatible with red teaming.
A workflow config is compatible if it contains:
- At least one RedTeamingMiddleware (or subclass)
- At least one red_teaming_evaluator
This is used when the user provides a pre-configured workflow instead of a RedTeamingRunnerConfig.
- Args:
base_workflow_config: The workflow configuration to validate.
- Raises:
ValueError: If the config is not red-team compatible.
- _warn_about_other_evaluators(
- base_workflow_config: nat.data_models.config.Config,
- ) → None#
Warn if the base workflow config contains other evaluators.
Red teaming evaluation is potentially incompatible with other evaluators due to its adversarial nature.
- Args:
base_workflow_config: The base workflow configuration to validate.
- _validate_dataset_exists(
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None,
- ) → None#
Validate that a dataset is defined somewhere.
Dataset can be defined in:
- CLI --dataset argument (dataset_path)
- RedTeamingRunnerConfig.general.dataset
- base_workflow_config.eval.general.dataset
- Args:
base_workflow_config: The base workflow configuration.
dataset_path: Optional dataset path from CLI.
- Raises:
ValueError: If no dataset is defined anywhere.
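The precedence can be sketched as follows; the helper and attribute traversal are illustrative, mirroring the three locations listed above:

```python
def resolve_dataset(config, base_workflow_config, dataset_path):
    """Sketch of the dataset resolution order described above."""
    # 1. CLI --dataset argument wins if provided.
    if dataset_path is not None:
        return dataset_path
    # 2. Fall back to the red teaming runner config, if any.
    if config is not None and config.general.dataset is not None:
        return config.general.dataset
    # 3. Finally, the base workflow's own eval section.
    if base_workflow_config.eval.general.dataset is not None:
        return base_workflow_config.eval.general.dataset
    raise ValueError("No dataset defined via CLI, red teaming config, "
                     "or base workflow config.")
```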
- _merge_general_config(
- base_workflow_config_dict: dict[str, Any],
- general: nat.data_models.evaluate.EvalGeneralConfig,
- ) → None#
Merge general eval settings into the base workflow config dict.
This performs a union of the base workflow’s eval.general with the RedTeamingRunnerConfig.general, where RedTeamingRunnerConfig values take precedence. Only explicitly set values override base values.
- Args:
base_workflow_config_dict: The configuration dictionary to modify (in place).
general: The EvalGeneralConfig from RedTeamingRunnerConfig.
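With Pydantic v2 models, "only explicitly set values override" maps naturally onto model_dump(exclude_unset=True); a sketch, assuming EvalGeneralConfig is a Pydantic model:

```python
from typing import Any

def merge_general(base_workflow_config_dict: dict[str, Any], general) -> None:
    """Sketch: union eval.general with explicitly set red teaming values."""
    # exclude_unset=True drops fields the user never set, so the red
    # teaming side's defaults cannot clobber values already in the base.
    explicit = general.model_dump(exclude_unset=True)
    eval_section = base_workflow_config_dict.setdefault("eval", {})
    eval_section.setdefault("general", {}).update(explicit)
```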
- _attach_middleware_everywhere(
- base_workflow_config_dict: dict[str, Any],
- middleware_name: str,
- ) → None#
Attach middleware to all functions, function_groups, and workflow.
The middleware’s internal target_function_or_group handles runtime activation; this method just ensures the middleware is registered everywhere.
- Args:
base_workflow_config_dict: The configuration dictionary to modify (in place).
middleware_name: Name of the middleware to attach.
- _inject_evaluator_config(
- base_workflow_config_dict: dict[str, Any],
- scenario: nat.eval.runners.red_teaming_runner.config.RedTeamingScenario,
- ) → None#
Inject the evaluator configuration into the workflow config.
Creates a red_teaming_evaluator in the eval section using the complete evaluator configuration from the scenario.
- Args:
base_workflow_config_dict: The configuration dictionary to modify (in place).
scenario: The scenario containing the complete evaluator config.
- _update_config_value(
- scenario_config_dict: dict[str, Any],
- path: str,
- value: str,
- ) → None#
Update a single value in the scenario config dictionary at the specified path.
- Args:
scenario_config_dict: The scenario configuration dictionary to update.
path: The path to the value to update.
value: The new value to set at the specified path.
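A minimal dot-notation update might look like this sketch; the real method may differ in type handling and error reporting:

```python
from typing import Any

def update_config_value(scenario_config_dict: dict[str, Any],
                        path: str, value: str) -> None:
    """Sketch: set a value at a dot-separated path, creating dicts as needed."""
    keys = path.split(".")
    node = scenario_config_dict
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
```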
- _find_red_teaming_evaluator_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- )#
Find the red teaming evaluator results within the overall evaluation results.
- Args:
results: The results of the red teaming evaluation.
- Returns:
The red teaming evaluator results.
- _compute_result_summary(df: pandas.DataFrame) → dict[str, Any]#
Compute the result summary for the red teaming evaluation using pandas.
Filters out rows with errors (error_message is not None) to keep score computations reliable. Also computes the attack success rate: the percentage of instances where the score exceeds the 0.5 threshold.
- Args:
df: DataFrame with flattened evaluation results.
- Returns:
The result summary dictionary.
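A sketch of the described computation; the error_message and score column names come from the description above, while the summary keys are assumptions:

```python
from typing import Any
import pandas as pd

def compute_result_summary(df: pd.DataFrame) -> dict[str, Any]:
    """Sketch of the error filtering and attack-success-rate computation."""
    # Keep only rows without errors for reliable score statistics.
    ok = df[df["error_message"].isna()]
    return {
        "total_evaluations": len(df),
        "errors": int(df["error_message"].notna().sum()),
        "mean_score": float(ok["score"].mean()),
        # Attack success rate: share of scores above the 0.5 threshold.
        "attack_success_rate": float((ok["score"] > 0.5).mean()),
    }
```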
- _log_results_summary(
- summary: dict[str, Any],
- output_dir: pathlib.Path,
- results_file: pathlib.Path | None = None,
- report_path: pathlib.Path | None = None,
- ) → None#
Log a nicely formatted summary of the red teaming evaluation results.
- Args:
summary: The computed summary dictionary with overall_score and per_scenario_summary.
output_dir: The base output directory where results are saved.
results_file: Optional path to the flat results JSONL file.
report_path: Optional path to the HTML report.
- _build_flat_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- ) → list[dict[str, Any]]#
Build a flat list of dictionaries from nested evaluation results.
Each record represents a single condition evaluation, with a unique identifier combining scenario_id, item_id, and condition_name.
- Args:
results: The nested results from the red teaming evaluation.
- Returns:
A list of flat dictionaries, one per condition evaluation.
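One flat record might be shaped like this; the field names and identifier format are assumptions for illustration:

```python
def make_flat_record(scenario_id: str, item_id: str,
                     condition_name: str, score: float) -> dict:
    """Sketch of one flat record per condition evaluation."""
    return {
        # Unique identifier combining the three fields, as described
        # above; the separator is illustrative.
        "id": f"{scenario_id}.{item_id}.{condition_name}",
        "scenario_id": scenario_id,
        "item_id": item_id,
        "condition_name": condition_name,
        "score": score,
        "error_message": None,
    }
```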
- _save_flat_results(
- flat_results: list[dict[str, Any]],
- output_dir: pathlib.Path,
- ) → pathlib.Path#
Save flat results to a JSONL file.
- Args:
flat_results: The flat list of result dictionaries.
output_dir: The directory to save the file to.
- Returns:
The path to the saved JSONL file.
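Writing JSONL means one JSON object per line; a minimal sketch, with the filename as an assumption:

```python
import json
import pathlib
from typing import Any

def save_flat_results(flat_results: list[dict[str, Any]],
                      output_dir: pathlib.Path) -> pathlib.Path:
    """Sketch: persist flat results as JSONL and return the file path."""
    out_path = output_dir / "red_teaming_results.jsonl"  # filename assumed
    with out_path.open("w", encoding="utf-8") as f:
        for record in flat_results:
            f.write(json.dumps(record) + "\n")
    return out_path
```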