nat.eval.red_teaming_evaluator.data_models#

Data models for red teaming evaluation output.

Classes#

ConditionEvalOutputItem

Evaluation results for a single IntermediateStep that meets the filtering condition.

RedTeamingEvalOutputItem

Extended evaluation output item for red teaming evaluations.

Module Contents#

class ConditionEvalOutputItem(/, **data: Any)#

Bases: nat.eval.evaluator.evaluator_model.EvalOutputItem

Evaluation results for a single IntermediateStep that meets the filtering condition.

Attributes:

id: Identifier from the input item.
score: Average score across all filter conditions.
reasoning: Reasoning for given score.
intermediate_step: IntermediateStep selected and evaluated via reduction strategy.
error_message: Error message if any step of the evaluation has failed.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

intermediate_step: nat.data_models.intermediate_step.IntermediateStep | None = None#
error_message: str | None = None#
classmethod empty(id: str, error: str | None = None) → ConditionEvalOutputItem#

Create an empty ConditionEvalOutputItem.

Returns:

Empty ConditionEvalOutputItem instance
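
As a rough illustration of where empty might be used, the sketch below records a failed evaluation with a placeholder item. ConditionEvalOutputItemSketch is a hypothetical dataclass stand-in, not the real nat model. The zero score, the empty reasoning, and the way error is copied into error_message are all assumptions, since the reference only states that an empty instance is returned.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConditionEvalOutputItemSketch:
    """Hypothetical stand-in that mirrors the documented fields."""

    id: str
    score: float
    reasoning: dict[str, Any] = field(default_factory=dict)
    intermediate_step: Any | None = None  # an IntermediateStep in the real model
    error_message: str | None = None

    @classmethod
    def empty(cls, id: str, error: str | None = None) -> ConditionEvalOutputItemSketch:
        # Assumption: an "empty" item has a zero score, no reasoning,
        # no selected step, and carries the error text if one is given.
        return cls(id=id, score=0.0, error_message=error)


# Record a placeholder result for an input whose evaluation failed.
failed = ConditionEvalOutputItemSketch.empty("item-1", error="no matching step")
```

With the real class, the equivalent call would presumably be ConditionEvalOutputItem.empty(id=..., error=...); check the field values it actually sets before relying on them.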

class RedTeamingEvalOutputItem(/, **data: Any)#

Bases: nat.eval.evaluator.evaluator_model.EvalOutputItem

Extended evaluation output item for red teaming evaluations.

Organizes results by filter condition name, with each condition containing its score, the evaluated output, and the single intermediate step that was selected.

Attributes:

id: Identifier from the input item.
score: Average score across all filter conditions.
reasoning: Summary information for compatibility.
results_by_condition: Map from condition name to evaluation results.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

results_by_condition: dict[str, ConditionEvalOutputItem] = None#
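
To make the relationship between score and results_by_condition concrete, the sketch below averages per-condition scores into one top-level value. It uses plain dictionaries in place of the real models. The condition names, the scores, the average_score helper, and the 0.0 fallback for an empty map are all illustrative assumptions, not part of the nat API.

```python
from statistics import mean

# Hypothetical per-condition results keyed by condition name.
# In the real model each value would be a ConditionEvalOutputItem.
results_by_condition = {
    "tool_output": {"score": 0.5, "error_message": None},
    "final_answer": {"score": 1.0, "error_message": None},
}


def average_score(results: dict) -> float:
    """Return the mean of the per-condition scores, or 0.0 if there are none."""
    scores = [result["score"] for result in results.values()]
    return mean(scores) if scores else 0.0


# Top-level score as the average across all filter conditions.
overall = average_score(results_by_condition)
```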