nat.plugins.eval.red_teaming_evaluator.data_models#

Data models for red teaming evaluation output.

Classes#

ConditionEvalOutputItem

Evaluation results for a single IntermediateStep that meets the filtering condition.

RedTeamingEvalOutputItem

Extended evaluation output item for red teaming evaluations.

Module Contents#

class ConditionEvalOutputItem#

Bases: nat.data_models.evaluator.EvalOutputItem

Evaluation results for a single IntermediateStep that meets the filtering condition.

Attributes:

id: Identifier from the input item.
score: Average score across all filter conditions.
reasoning: Reasoning for given score.
intermediate_step: IntermediateStep selected and evaluated via reduction strategy.
error_message: Error message if any step of the evaluation has failed.

intermediate_step: nat.data_models.intermediate_step.IntermediateStep | None = None#
error_message: str | None = None#
classmethod empty(id: str, error: str | None = None) ConditionEvalOutputItem#

Create an empty ConditionEvalOutputItem.

Returns:

Empty ConditionEvalOutputItem instance
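
A minimal sketch of what calling `empty` might look like. It uses stand-in dataclasses rather than the real `nat` classes so that it runs on its own. The names `EvalOutputItemSketch` and `ConditionEvalOutputItemSketch` are hypothetical, and the placeholder values chosen for `score` and `reasoning` (0.0 and an empty string) are assumptions, since the reference above does not state them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class EvalOutputItemSketch:
    """Stand-in for nat.data_models.evaluator.EvalOutputItem (assumed fields)."""

    id: str
    score: float
    reasoning: Any


@dataclass
class ConditionEvalOutputItemSketch(EvalOutputItemSketch):
    """Stand-in mirroring the two extra attributes documented above."""

    intermediate_step: Any | None = None
    error_message: str | None = None

    @classmethod
    def empty(cls, id: str, error: str | None = None) -> ConditionEvalOutputItemSketch:
        # Assumption: an "empty" item carries a zero score, no reasoning,
        # no intermediate step, and the optional error message.
        return cls(id=id, score=0.0, reasoning="", error_message=error)


# Hypothetical usage: record that no step could be evaluated for this item.
item = ConditionEvalOutputItemSketch.empty("item-1", error="no step matched")
```

Under these assumptions, `item.intermediate_step` is `None` and `item.error_message` holds the supplied error text, which lets a caller tell a failed evaluation apart from a scored one.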

class RedTeamingEvalOutputItem#

Bases: nat.data_models.evaluator.EvalOutputItem

Extended evaluation output item for red teaming evaluations.

Organizes results by filter condition name, with each condition containing its score, the evaluated output, and the single intermediate step that was selected.

Attributes:

id: Identifier from the input item.
score: Average score across all filter conditions.
reasoning: Summary information for compatibility.
results_by_condition: Map from condition name to evaluation results.

results_by_condition: dict[str, ConditionEvalOutputItem] = None#
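
The relationship between `results_by_condition` and the top-level `score` can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the dataclasses are stand-ins for the real models, `combine` is a hypothetical helper, the condition names are invented, and the choice to average every condition (including any that failed) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean
from typing import Any


@dataclass
class ConditionResultSketch:
    """Stand-in for ConditionEvalOutputItem."""

    id: str
    score: float
    reasoning: Any
    intermediate_step: Any | None = None
    error_message: str | None = None


@dataclass
class RedTeamingResultSketch:
    """Stand-in for RedTeamingEvalOutputItem."""

    id: str
    score: float
    reasoning: Any
    results_by_condition: dict[str, ConditionResultSketch] = field(default_factory=dict)


def combine(item_id: str, results: dict[str, ConditionResultSketch]) -> RedTeamingResultSketch:
    """Hypothetical helper: average the per-condition scores into one item."""
    scores = [result.score for result in results.values()]
    overall = mean(scores) if scores else 0.0  # assumption: 0.0 when nothing was evaluated
    summary = {"conditions": sorted(results)}  # illustrative summary payload
    return RedTeamingResultSketch(
        id=item_id, score=overall, reasoning=summary, results_by_condition=results
    )


# Condition names below are invented for illustration.
results = {
    "tool_output": ConditionResultSketch(id="item-1", score=1.0, reasoning="attack succeeded"),
    "final_answer": ConditionResultSketch(id="item-1", score=0.5, reasoning="partial leak"),
}
combined = combine("item-1", results)
```

Here `combined.score` is the mean of the two per-condition scores, and each condition's individual result stays reachable by name through `combined.results_by_condition`.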