nat.eval.runners.red_teaming_runner#
Submodules#
- nat.eval.runners.red_teaming_runner.config
Classes#
| Class | Description |
|---|---|
| RedTeamingRunnerConfig | Top-level configuration for red teaming evaluation. |
| RedTeamingScenario | A single red teaming scenario configuration. |
| RedTeamingRunner | Runner for executing red teaming evaluations across multiple scenarios. |
Package Contents#
- class RedTeamingRunnerConfig(/, **data: Any)#
Bases: pydantic.BaseModel
Top-level configuration for red teaming evaluation.
Supports evaluator_defaults for DRY configuration with _extends inheritance.
- Attributes:
  - base_workflow: Optional path to the base workflow configuration file.
  - llms: Dictionary of LLM configurations keyed by name.
  - evaluator_defaults: Named evaluator configs that scenarios can extend.
  - general: General evaluation settings (concurrency, output, dataset).
  - scenarios: Dictionary of scenario configurations.
Example YAML configuration:
```yaml
base_workflow: ./configs/base_workflow.yml
llms:
  judge_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
evaluator_defaults:
  standard_eval:
    llm_name: judge_llm
    _type: red_teaming_evaluator
    judge_llm_prompt: >
      Evaluate whether the attack was successful...
    filter_conditions:
      - name: workflow_output
        event_type: FUNCTION_END
        payload_name: <workflow>
        reduction_strategy: last
general:
  max_concurrency: 4
  output_dir: ./.tmp/nat/redteaming/
scenarios:
  intercept_payload_42:
    middleware:
      _type: red_teaming
      target_function_or_group: my_calculator
      attack_payload: "42.0"
    evaluator:
      _extends: standard_eval
      scenario_specific_instructions: "Check for 42.0..."
  custom_scenario:
    tags: [category_1, category_2]
    middleware: {}
    evaluator:
      llm_name: judge_llm
      _type: red_teaming_evaluator
      judge_llm_prompt: "Custom prompt..."
      filter_conditions: []
```
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- base_workflow: pathlib.Path | None = None#
- llms: dict[str, nat.data_models.llm.LLMBaseConfig] = None#
- evaluator_defaults: dict[str, nat.eval.red_teaming_evaluator.register.RedTeamingEvaluatorConfig] | None = None#
- general: nat.data_models.evaluate.EvalGeneralConfig | None = None#
- scenarios: dict[str, RedTeamingScenario | _RedTeamingScenarioRaw] = None#
- validate_and_resolve_scenarios() → RedTeamingRunnerConfig#
Validate scenarios and resolve _extends inheritance.
This runs after Pydantic parsing, so evaluator_defaults are already validated RedTeamingEvaluatorConfig objects. We convert any _RedTeamingScenarioRaw to RedTeamingScenario by resolving _extends.
- Returns:
The validated configuration with all scenarios as RedTeamingScenario instances.
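As an illustration of the _extends resolution this method performs, the sketch below overlays scenario-level keys onto a named default; the resolve_extends helper and its merge semantics are assumptions, not the actual implementation:
```python
from typing import Any

def resolve_extends(raw_evaluator: dict[str, Any],
                    evaluator_defaults: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical sketch: overlay scenario keys onto the referenced default."""
    base_name = raw_evaluator.pop("_extends", None)
    if base_name is None:
        return raw_evaluator
    merged = dict(evaluator_defaults[base_name])  # start from the named default
    merged.update(raw_evaluator)                  # scenario keys take precedence
    return merged

# Example mirroring the YAML above:
defaults = {"standard_eval": {"_type": "red_teaming_evaluator", "llm_name": "judge_llm"}}
scenario_eval = {"_extends": "standard_eval",
                 "scenario_specific_instructions": "Check for 42.0..."}
print(resolve_extends(scenario_eval, defaults))
```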
- classmethod rebuild_annotations() → bool#
Rebuild field annotations with discriminated unions.
This method updates the llms dict value annotation to use a discriminated union of all registered LLM providers. This allows Pydantic to correctly deserialize the _type field into the appropriate concrete LLM config class.
- Returns:
True if the model was rebuilt, False otherwise.
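The mechanics resemble a plain Pydantic discriminated union. A minimal standalone sketch, with hypothetical NimLLM/OpenAILLM classes and a `type` field standing in for `_type`:
```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class NimLLM(BaseModel):
    type: Literal["nim"]
    model_name: str

class OpenAILLM(BaseModel):
    type: Literal["openai"]
    model_name: str

# The discriminator tells Pydantic which concrete class to instantiate.
LLMUnion = Annotated[Union[NimLLM, OpenAILLM], Field(discriminator="type")]

adapter = TypeAdapter(dict[str, LLMUnion])
llms = adapter.validate_python(
    {"judge_llm": {"type": "nim", "model_name": "meta/llama-3.1-70b-instruct"}}
)
assert isinstance(llms["judge_llm"], NimLLM)
```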
- class RedTeamingScenario(/, **data: Any)#
Bases: pydantic.BaseModel
A single red teaming scenario configuration.
Each scenario defines a complete middleware and evaluator configuration. The evaluator can use _extends to inherit from evaluator_defaults.
- Attributes:
  - scenario_id: Optional unique identifier. If not provided, the dict key from RedTeamingRunnerConfig.scenarios is used.
  - middleware: Full middleware configuration to apply. Set to None for baseline scenarios (no middleware modification).
  - evaluator: Complete evaluator configuration. Can inherit from evaluator_defaults using _extends in YAML/JSON.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- middleware: nat.middleware.red_teaming.red_teaming_middleware_config.RedTeamingMiddlewareConfig | None = None#
- evaluator: nat.eval.red_teaming_evaluator.register.RedTeamingEvaluatorConfig = None#
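For example, a baseline scenario (middleware=None) could be built from a dict; the evaluator field names below are taken from the YAML example earlier and may not be exhaustive:
```python
baseline = RedTeamingScenario.model_validate({
    "middleware": None,  # baseline: run the workflow unmodified
    "evaluator": {
        "_type": "red_teaming_evaluator",
        "llm_name": "judge_llm",
        "judge_llm_prompt": "Was the attack successful?",
        "filter_conditions": [],
    },
})
```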
- class RedTeamingRunner(
- config: nat.eval.runners.red_teaming_runner.config.RedTeamingRunnerConfig | None,
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None = None,
- result_json_path: str = '$',
- endpoint: str | None = None,
- endpoint_timeout: int = 300,
- reps: int = 1,
- overrides: tuple[tuple[str, str], ...] = (),
- )#
Runner for executing red teaming evaluations across multiple scenarios.
This runner encapsulates all the logic for:
- Generating workflow configurations for each scenario
- Setting up output directories
- Saving configuration files
- Running evaluations via MultiEvaluationRunner
Example usage:
```python
runner = RedTeamingRunner(
    config=rt_config,
    base_workflow_config=base_workflow_config,
    dataset_path="/path/to/dataset.json",
)
results = await runner.run()
```
Initialize the RedTeamingRunner.
- Args:
  - config: Red teaming config with scenarios (None uses base_workflow_config).
  - base_workflow_config: Base workflow config to transform for each scenario.
  - dataset_path: Optional dataset path (overrides config dataset).
  - result_json_path: JSON path to extract the result from the workflow.
  - endpoint: Optional endpoint URL for running the workflow.
  - endpoint_timeout: HTTP response timeout in seconds.
  - reps: Number of repetitions for the evaluation.
  - overrides: Config overrides using dot notation (path, value) tuples.
- config#
- base_workflow_config#
- dataset_path = None#
- result_json_path = '$'#
- endpoint = None#
- endpoint_timeout = 300#
- reps = 1#
- overrides = ()#
- _generated_workflow_configs: dict[str, nat.data_models.config.Config] | None = None#
- _base_output_dir: pathlib.Path | None = None#
- async run() → dict[str, nat.eval.config.EvaluationRunOutput]#
Run the red teaming evaluation across all scenarios.
- Returns:
Dictionary mapping scenario_id to EvaluationRunOutput.
- Raises:
ValueError: If configuration validation fails.
- generate_workflow_configs() → dict[str, nat.data_models.config.Config]#
Generate workflow configurations for each scenario.
If config is None, returns the base_workflow_config as a single scenario after validating it has the required red teaming components.
- Returns:
Dictionary mapping scenario_id to the transformed Config.
- Raises:
ValueError: If validation fails.
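Reusing the runner from the example above, the per-scenario configs can be inspected before launching the evaluation (a usage sketch, not required by the API):
```python
configs = runner.generate_workflow_configs()
for scenario_id, cfg in configs.items():
    # Each value is a full Config with the scenario's middleware and evaluator injected.
    print(scenario_id, type(cfg).__name__)
```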
- setup_output_directory(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Set up the base output directory.
If the directory already exists, creates a new directory with a timestamp and unique identifier suffix.
- Args:
generated_workflow_configs: The generated workflow configs per scenario.
- Returns:
The base output directory path.
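The collision-avoidance behavior described above can be pictured like this; unique_output_dir is an illustrative helper, and the exact suffix format is an assumption:
```python
import pathlib
import uuid
from datetime import datetime

def unique_output_dir(base: pathlib.Path) -> pathlib.Path:
    """Return base if free; otherwise append a timestamp plus a short unique id."""
    if not base.exists():
        return base
    suffix = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
    return base.with_name(f"{base.name}_{suffix}")

print(unique_output_dir(pathlib.Path("./.tmp/nat/redteaming")))
```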
- save_configs(
- base_output_dir: pathlib.Path,
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Save base workflow config, red team config, and scenario workflow configs to disk.
- Args:
  - base_output_dir: The base output directory.
  - generated_workflow_configs: The generated workflow configs per scenario.
- _apply_overrides_to_all(
- generated_workflow_configs: dict[str, nat.data_models.config.Config],
- )#
Apply CLI overrides to all scenario configs.
- Args:
generated_workflow_configs: The scenario configurations to modify.
- Returns:
The modified scenario configurations.
- _build_evaluation_configs(
- base_output_dir: pathlib.Path,
- scenario_configs: dict[str, nat.data_models.config.Config],
- )#
Build EvaluationRunConfig for each scenario.
- Args:
  - base_output_dir: The base output directory.
  - scenario_configs: The generated scenario configurations.
- Returns:
Dictionary mapping scenario_id to EvaluationRunConfig.
- Raises:
ValueError: If config validation fails.
- _validate_base_config_for_direct_use(
- base_workflow_config: nat.data_models.config.Config,
- )#
Validate that a workflow config is compatible with red teaming.
A workflow config is compatible if it contains:
- At least one RedTeamingMiddleware (or subclass)
- At least one red_teaming_evaluator
This is used when the user provides a pre-configured workflow instead of a RedTeamingRunnerConfig.
- Args:
base_workflow_config: The workflow configuration to validate.
- Raises:
ValueError: If the config is not red-team compatible.
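A rough sketch of that check over a config dict; the dict layout and type-name strings here are hypothetical, chosen to mirror the YAML example above:
```python
from typing import Any

def is_red_team_compatible(cfg: dict[str, Any]) -> bool:
    """Hypothetical: require at least one red teaming middleware and evaluator."""
    evaluators = cfg.get("eval", {}).get("evaluators", {})
    has_evaluator = any(
        e.get("_type") == "red_teaming_evaluator" for e in evaluators.values()
    )
    middleware = cfg.get("middleware", {})
    has_middleware = any(
        m.get("_type") == "red_teaming" for m in middleware.values()
    )
    return has_evaluator and has_middleware
```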
- _warn_about_other_evaluators(
- base_workflow_config: nat.data_models.config.Config,
- )#
Warn if the base workflow config contains other evaluators.
Red teaming evaluation is potentially incompatible with other evaluators due to its adversarial nature.
- Args:
base_workflow_config: The base workflow configuration to validate.
- _validate_dataset_exists(
- base_workflow_config: nat.data_models.config.Config,
- dataset_path: str | None,
- )#
Validate that a dataset is defined somewhere.
Dataset can be defined in:
- the CLI --dataset argument (dataset_path)
- RedTeamingRunnerConfig.general.dataset
- base_workflow_config.eval.general.dataset
- Args:
  - base_workflow_config: The base workflow configuration.
  - dataset_path: Optional dataset path from CLI.
- Raises:
ValueError: If no dataset is defined anywhere.
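The check amounts to a first-non-empty lookup across those three sources; a sketch with hypothetical arguments (the CLI value takes precedence per the constructor docs, the remaining order is an assumption):
```python
def resolve_dataset(cli_dataset: str | None,
                    rt_dataset: str | None,
                    base_dataset: str | None) -> str:
    """Hypothetical: return the first dataset defined, else fail."""
    for candidate in (cli_dataset, rt_dataset, base_dataset):
        if candidate:
            return candidate
    raise ValueError(
        "No dataset defined via CLI, RedTeamingRunnerConfig, or base workflow config."
    )
```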
- _merge_general_config(
- base_workflow_config_dict: dict[str, Any],
- general: nat.data_models.evaluate.EvalGeneralConfig,
- )#
Merge general eval settings into the base workflow config dict.
This performs a union of the base workflow’s eval.general with the RedTeamingRunnerConfig.general, where RedTeamingRunnerConfig values take precedence. Only explicitly set values override base values.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - general: The EvalGeneralConfig from RedTeamingRunnerConfig.
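The "only explicitly set values override" behavior matches Pydantic's exclude_unset dump; a minimal sketch, where the helper name and dict layout are assumptions:
```python
from typing import Any
from pydantic import BaseModel

def merge_general(base_cfg: dict[str, Any], general: BaseModel) -> None:
    """Overlay only the fields the user explicitly set onto eval.general."""
    overrides = general.model_dump(exclude_unset=True)
    base_cfg.setdefault("eval", {}).setdefault("general", {}).update(overrides)
```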
- _attach_middleware_everywhere(
- base_workflow_config_dict: dict[str, Any],
- middleware_name: str,
- ) → None#
Attach middleware to all functions, function_groups, and workflow.
The middleware’s internal target_function_or_group handles runtime activation - this just ensures the middleware is registered everywhere.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - middleware_name: Name of the middleware to attach.
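Conceptually this walks every functions, function_groups, and workflow entry and registers the middleware by name; the dict layout and middleware-list convention below are assumptions for illustration:
```python
from typing import Any

def attach_everywhere(cfg: dict[str, Any], middleware_name: str) -> None:
    """Hypothetical sketch: reference the middleware from every runnable entry."""
    for section in ("functions", "function_groups"):
        for entry in cfg.get(section, {}).values():
            entry.setdefault("middleware", []).append(middleware_name)
    cfg.setdefault("workflow", {}).setdefault("middleware", []).append(middleware_name)
```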
- _inject_evaluator_config(
- base_workflow_config_dict: dict[str, Any],
- scenario: nat.eval.runners.red_teaming_runner.config.RedTeamingScenario,
- )#
Inject the evaluator configuration into the workflow config.
Creates a red_teaming_evaluator in the eval section using the complete evaluator configuration from the scenario.
- Args:
  - base_workflow_config_dict: The configuration dictionary to modify (in place).
  - scenario: The scenario containing the complete evaluator config.
- _update_config_value(
- scenario_config_dict: dict[str, Any],
- path: str,
- value: str,
- ) → None#
Update a single value in the scenario config dictionary at the specified path.
- Args:
  - scenario_config_dict: The scenario configuration dictionary to update.
  - path: The path to the value to update.
  - value: The new value to set at the specified path.
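Dot-notation updates like the overrides accepted by the constructor reduce to a walk-and-set over nested dicts; a minimal sketch:
```python
from typing import Any

def update_config_value(cfg: dict[str, Any], path: str, value: Any) -> None:
    """Set cfg['a']['b']['c'] = value for path 'a.b.c', creating dicts as needed."""
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

cfg: dict[str, Any] = {}
update_config_value(cfg, "eval.general.max_concurrency", 4)
print(cfg)  # {'eval': {'general': {'max_concurrency': 4}}}
```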
- _find_red_teaming_evaluator_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- )#
Find the red teaming evaluator results in the results.
- Args:
results: The results of the red teaming evaluation.
- Returns:
The red teaming evaluator results.
- _compute_result_summary(df: pandas.DataFrame) → dict[str, Any]#
Compute the result summary for the red teaming evaluation using pandas.
Filters out rows with errors (error_message is not None) for reliable score computations. Also computes attack success rate (% of instances where score > 0.5 threshold).
- Args:
df: DataFrame with flattened evaluation results.
- Returns:
The result summary dictionary.
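The filtering and success-rate logic described above could look roughly like this; the score and error_message column names come from the description, while the summary keys are illustrative:
```python
import pandas as pd

def compute_summary(df: pd.DataFrame) -> dict:
    ok = df[df["error_message"].isna()]  # drop errored rows for reliable stats
    return {
        "average_score": float(ok["score"].mean()),
        "attack_success_rate": float((ok["score"] > 0.5).mean()),  # share above threshold
        "num_errors": int(df["error_message"].notna().sum()),
    }

df = pd.DataFrame({"score": [0.9, 0.1, 0.7],
                   "error_message": [None, None, "timeout"]})
print(compute_summary(df))
```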
- _log_results_summary(
- summary: dict[str, Any],
- output_dir: pathlib.Path,
- results_file: pathlib.Path | None = None,
- report_path: pathlib.Path | None = None,
- )#
Log a nicely formatted summary of the red teaming evaluation results.
- Args:
  - summary: The computed summary dictionary with overall_score and per_scenario_summary.
  - output_dir: The base output directory where results are saved.
  - results_file: Optional path to the flat results JSONL file.
  - report_path: Optional path to the HTML report.
- _build_flat_results(
- results: dict[str, nat.eval.config.EvaluationRunOutput],
- )#
Build a flat list of dictionaries from nested evaluation results.
Each record represents a single condition evaluation, with a unique identifier combining scenario_id, item_id, and condition_name.
- Args:
results: The nested results from the red teaming evaluation.
- Returns:
A list of flat dictionaries, one per condition evaluation.
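A record shaped as described might be constructed like this; the three identifier parts come from the docstring, the remaining fields are illustrative:
```python
def make_record(scenario_id: str, item_id: str, condition_name: str,
                score: float, error_message: str | None = None) -> dict:
    return {
        # Unique per condition evaluation, combining the three identifier parts.
        "id": f"{scenario_id}/{item_id}/{condition_name}",
        "scenario_id": scenario_id,
        "item_id": item_id,
        "condition_name": condition_name,
        "score": score,
        "error_message": error_message,
    }
```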
- _save_flat_results(
- flat_results: list[dict[str, Any]],
- output_dir: pathlib.Path,
- )#
Save flat results to a JSONL file.
- Args:
  - flat_results: The flat list of result dictionaries.
  - output_dir: The directory to save the file to.
- Returns:
The path to the saved JSONL file.
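Writing JSONL comes down to one json.dumps call per record; a minimal sketch, with the output file name being an assumption:
```python
import json
import pathlib
from typing import Any

def save_jsonl(records: list[dict[str, Any]], output_dir: pathlib.Path) -> pathlib.Path:
    """Write one JSON object per line and return the file path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "flat_results.jsonl"  # file name is an assumption
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path
```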