nat.eval.runners.red_teaming_runner.runner#

Red teaming runner for executing multi-scenario red teaming evaluations.

Classes#

RedTeamingRunner

Runner for executing red teaming evaluations across multiple scenarios.

Module Contents#

class RedTeamingRunner(
config: nat.eval.runners.red_teaming_runner.config.RedTeamingRunnerConfig | None,
base_workflow_config: nat.data_models.config.Config,
dataset_path: str | None = None,
result_json_path: str = '$',
endpoint: str | None = None,
endpoint_timeout: int = 300,
reps: int = 1,
overrides: tuple[tuple[str, str], ...] = (),
)#

Runner for executing red teaming evaluations across multiple scenarios.

This runner encapsulates all the logic for:

  • Generating workflow configurations for each scenario

  • Setting up output directories

  • Saving configuration files

  • Running evaluations via MultiEvaluationRunner

Example usage:

runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
)
results = await runner.run()

Initialize the RedTeamingRunner.

Args:

config: Red teaming config with scenarios (None uses base_workflow_config).
base_workflow_config: Base workflow config to transform for each scenario.
dataset_path: Optional dataset path (overrides config dataset).
result_json_path: JSON path to extract the result from the workflow.
endpoint: Optional endpoint URL for running the workflow.
endpoint_timeout: HTTP response timeout in seconds.
reps: Number of repetitions for the evaluation.
overrides: Config overrides using dot notation (path, value) tuples.
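
The overrides argument takes (path, value) string tuples in dot notation. A minimal construction sketch; the override path below is a hypothetical example, not a setting defined by this package:

runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
    reps=3,
    overrides=(
        ("eval.general.max_concurrency", "4"),  # hypothetical dot-notation path
    ),
)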

config#
base_workflow_config#
dataset_path = None#
result_json_path = '$'#
endpoint = None#
endpoint_timeout = 300#
reps = 1#
overrides = ()#
_generated_workflow_configs: dict[str, nat.data_models.config.Config] | None = None#
_base_output_dir: pathlib.Path | None = None#
async run() → dict[str, nat.eval.config.EvaluationRunOutput]#

Run the red teaming evaluation across all scenarios.

Returns:

Dictionary mapping scenario_id to EvaluationRunOutput.

Raises:

ValueError: If configuration validation fails.
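
A minimal usage sketch for run(); per the return description above, the result is keyed by scenario_id:

results = await runner.run()
for scenario_id, run_output in results.items():
    # run_output is the EvaluationRunOutput for this scenario
    print(scenario_id, type(run_output).__name__)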

generate_workflow_configs() → dict[str, nat.data_models.config.Config]#

Generate workflow configurations for each scenario.

If config is None, returns the base_workflow_config as a single scenario after validating it has the required red teaming components.

Returns:

Dictionary mapping scenario_id to the transformed Config.

Raises:

ValueError: If validation fails.

setup_output_directory(
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → pathlib.Path#

Set up the base output directory.

If the directory already exists, creates a new directory with a timestamp and unique identifier suffix.

Args:

generated_workflow_configs: The generated workflow configs per scenario.

Returns:

The base output directory path.
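
A sketch of the collision handling described above, assuming a timestamp plus a short unique suffix; the exact suffix format is not specified on this page:

import uuid
from datetime import datetime
from pathlib import Path

def unique_output_dir(base_dir: Path) -> Path:
    # If the directory already exists, derive a sibling name with a timestamp and unique id.
    if base_dir.exists():
        suffix = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
        base_dir = base_dir.with_name(f"{base_dir.name}_{suffix}")
    base_dir.mkdir(parents=True, exist_ok=True)
    return base_dir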

save_configs(
base_output_dir: pathlib.Path,
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → None#

Save base workflow config, red team config, and scenario workflow configs to disk.

Args:

base_output_dir: The base output directory.
generated_workflow_configs: The generated workflow configs per scenario.

_apply_overrides_to_all(
generated_workflow_configs: dict[str, nat.data_models.config.Config],
) → dict[str, nat.data_models.config.Config]#

Apply CLI overrides to all scenario configs.

Args:

generated_workflow_configs: The scenario configurations to modify.

Returns:

The modified scenario configurations.

_build_evaluation_configs(
base_output_dir: pathlib.Path,
scenario_configs: dict[str, nat.data_models.config.Config],
) → dict[str, nat.eval.config.EvaluationRunConfig]#

Build EvaluationRunConfig for each scenario.

Args:

base_output_dir: The base output directory.
scenario_configs: The generated scenario configurations.

Returns:

Dictionary mapping scenario_id to EvaluationRunConfig.

Raises:

ValueError: If config validation fails.

_validate_base_config_for_direct_use(
base_workflow_config: nat.data_models.config.Config,
) → None#

Validate that a workflow config is compatible with red teaming.

A workflow config is compatible if it contains:

  • At least one RedTeamingMiddleware (or subclass)

  • At least one red_teaming_evaluator

This is used when the user provides a pre-configured workflow instead of a RedTeamingRunnerConfig.

Args:

base_workflow_config: The workflow configuration to validate.

Raises:

ValueError: If the config is not red-team compatible.

_warn_about_other_evaluators(
base_workflow_config: nat.data_models.config.Config,
) → None#

Warn if the base workflow config contains other evaluators.

Red teaming evaluation is potentially incompatible with other evaluators due to its adversarial nature.

Args:

base_workflow_config: The base workflow configuration to validate.

_validate_dataset_exists(
base_workflow_config: nat.data_models.config.Config,
dataset_path: str | None,
) → None#

Validate that a dataset is defined somewhere.

The dataset can be defined in any of the following:

  • The CLI --dataset argument (dataset_path)

  • RedTeamingRunnerConfig.general.dataset

  • base_workflow_config.eval.general.dataset

Args:

base_workflow_config: The base workflow configuration.
dataset_path: Optional dataset path from CLI.

Raises:

ValueError: If no dataset is defined anywhere.
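
A sketch of this precedence check; the lookups are written against plain dicts for illustration rather than the actual Config model API:

from typing import Any

def dataset_is_defined(base_cfg: dict[str, Any],
                       rt_general: dict[str, Any],
                       dataset_path: str | None) -> bool:
    # Checked in order: CLI --dataset, RedTeamingRunnerConfig.general.dataset,
    # then base_workflow_config.eval.general.dataset.
    if dataset_path:
        return True
    if rt_general.get("dataset"):
        return True
    return bool(base_cfg.get("eval", {}).get("general", {}).get("dataset"))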

_merge_general_config(
base_workflow_config_dict: dict[str, Any],
general: nat.data_models.evaluate.EvalGeneralConfig,
) → None#

Merge general eval settings into the base workflow config dict.

This performs a union of the base workflow’s eval.general with the RedTeamingRunnerConfig.general, where RedTeamingRunnerConfig values take precedence. Only explicitly set values override base values.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
general: The EvalGeneralConfig from RedTeamingRunnerConfig.
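
A sketch of the union semantics, assuming EvalGeneralConfig is a Pydantic model so that model_dump(exclude_unset=True) yields only the explicitly set fields; that assumption is not confirmed by this page:

def merge_general(base_workflow_config_dict: dict, general) -> None:
    # Only fields explicitly set on the red teaming general config override the base.
    explicit = general.model_dump(exclude_unset=True)  # assumes a Pydantic model
    eval_general = base_workflow_config_dict.setdefault("eval", {}).setdefault("general", {})
    eval_general.update(explicit)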

_attach_middleware_everywhere(
base_workflow_config_dict: dict[str, Any],
middleware_name: str,
) → None#

Attach middleware to all functions, function_groups, and workflow.

The middleware’s internal target_function_or_group setting handles runtime activation; this method only ensures the middleware is registered everywhere.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
middleware_name: Name of the middleware to attach.
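
An illustrative sketch of the attachment step; the dictionary keys used here (functions, function_groups, workflow, middleware) are assumptions about the config layout rather than documented fields:

def attach_everywhere(cfg: dict, middleware_name: str) -> None:
    # Register the middleware on the workflow, every function, and every function group.
    targets = [cfg.setdefault("workflow", {})]
    targets += list(cfg.get("functions", {}).values())
    targets += list(cfg.get("function_groups", {}).values())
    for target in targets:
        middleware = target.setdefault("middleware", [])
        if middleware_name not in middleware:
            middleware.append(middleware_name)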

_inject_evaluator_config(
base_workflow_config_dict: dict[str, Any],
scenario: nat.eval.runners.red_teaming_runner.config.RedTeamingScenario,
) → None#

Inject the evaluator configuration into the workflow config.

Creates a red_teaming_evaluator in the eval section using the complete evaluator configuration from the scenario.

Args:

base_workflow_config_dict: The configuration dictionary to modify (in place).
scenario: The scenario containing the complete evaluator config.

_update_config_value(
scenario_config_dict: dict[str, Any],
path: str,
value: Any,
) → None#

Update a single value in the scenario config dictionary at the specified path.

Args:

scenario_config_dict: The scenario configuration dictionary to update.
path: The path to the value to update.
value: The new value to set at the specified path.
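
A minimal dot-path update sketch, consistent with the (path, value) override format described for the constructor; it is illustrative rather than the runner's exact implementation:

def set_by_path(cfg: dict, path: str, value) -> None:
    # Walk "a.b.c" style paths, creating intermediate dicts as needed.
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

scenario_cfg: dict = {}
set_by_path(scenario_cfg, "eval.general.max_concurrency", 4)  # hypothetical path
# scenario_cfg == {"eval": {"general": {"max_concurrency": 4}}}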

_find_red_teaming_evaluator_results(
results: dict[str, nat.eval.config.EvaluationRunOutput],
) → dict[str, nat.eval.evaluator.evaluator_model.EvalOutput]#

Extract the red teaming evaluator results from the overall evaluation results.

Args:

results: The results of the red teaming evaluation.

Returns:

The red teaming evaluator results.

_compute_result_summary(df: pandas.DataFrame) → dict[str, Any]#

Compute the result summary for the red teaming evaluation using pandas.

Rows with errors (error_message is not None) are filtered out so that score computations are reliable. Also computes the attack success rate, i.e. the percentage of instances whose score exceeds the 0.5 threshold.

Args:

df: DataFrame with flattened evaluation results.

Returns:

The result summary dictionary.
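
A sketch of the summary computation, assuming the flattened DataFrame has error_message and score columns as described above; the summary keys shown are illustrative:

import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    # Drop errored rows so score statistics are reliable.
    ok = df[df["error_message"].isna()]
    return {
        "num_evaluated": len(ok),
        "num_errors": len(df) - len(ok),
        "mean_score": float(ok["score"].mean()) if len(ok) else None,
        # Attack success rate: share of instances scoring above the 0.5 threshold.
        "attack_success_rate": float((ok["score"] > 0.5).mean()) if len(ok) else None,
    }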

_log_results_summary(
summary: dict[str, Any],
output_dir: pathlib.Path,
results_file: pathlib.Path | None = None,
report_path: pathlib.Path | None = None,
) → None#

Log a nicely formatted summary of the red teaming evaluation results.

Args:

summary: The computed summary dictionary with overall_score and per_scenario_summary.
output_dir: The base output directory where results are saved.
results_file: Optional path to the flat results JSONL file.
report_path: Optional path to the HTML report.

_build_flat_results(
results: dict[str, nat.eval.config.EvaluationRunOutput],
) → list[dict[str, Any]]#

Build a flat list of dictionaries from nested evaluation results.

Each record represents a single condition evaluation, with a unique identifier combining scenario_id, item_id, and condition_name.

Args:

results: The nested results from the red teaming evaluation.

Returns:

A list of flat dictionaries, one per condition evaluation.
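
A sketch of the flattening step showing the combined identifier; the nested dict shape used here is a simplified stand-in for the real EvaluationRunOutput structure:

def flatten(nested: dict) -> list[dict]:
    # One record per (scenario, item, condition), with a combined unique id.
    flat = []
    for scenario_id, items in nested.items():
        for item_id, conditions in items.items():
            for condition_name, result in conditions.items():
                flat.append({
                    "id": f"{scenario_id}::{item_id}::{condition_name}",  # separator is illustrative
                    "scenario_id": scenario_id,
                    "item_id": item_id,
                    "condition_name": condition_name,
                    "result": result,
                })
    return flat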

_save_flat_results(
flat_results: list[dict[str, Any]],
output_dir: pathlib.Path,
) → pathlib.Path#

Save flat results to a JSONL file.

Args:

flat_results: The flat list of result dictionaries.
output_dir: The directory to save the file to.

Returns:

The path to the saved JSONL file.
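
A minimal JSONL write matching the description; the file name is an assumption:

import json
from pathlib import Path

def save_jsonl(flat_results: list[dict], output_dir: Path) -> Path:
    # One JSON object per line.
    path = output_dir / "red_teaming_results.jsonl"  # hypothetical file name
    with path.open("w", encoding="utf-8") as f:
        for record in flat_results:
            f.write(json.dumps(record, default=str) + "\n")
    return path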