nat.parameter_optimization.oracle_feedback#
Oracle feedback utilities for prompt optimization.
This module provides functions to extract, format, and inject failure reasoning from evaluation results into the prompt optimization genetic algorithm. The oracle feedback system enables context-grounded prompt evolution by learning from specific evaluation failures.
Functions#
| build_oracle_feedback | Build a truncated feedback string from the reasoning of the worst-performing items. |
| should_inject_feedback | Determine if oracle feedback should be injected for this mutation. |
| check_adaptive_triggers | Check if adaptive feedback should be triggered. |
| _reasoning_to_string | Convert reasoning to a string, handling various types. |
| extract_worst_reasoning | Extract reasoning from worst-performing evaluation items. |
Module Contents#
- build_oracle_feedback( ) str | None#
Build a truncated feedback string from the reasoning of the worst-performing items.
- Args:
reasoning_list: List of reasoning strings from worst-performing items.
max_chars: Maximum characters for the output.
- Returns:
Formatted feedback string, or None if no reasoning available.
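To illustrate the contract described above (a list of reasoning strings in, a length-bounded string or None out), here is a minimal sketch. It is not the module's implementation: the name `sketch_build_feedback`, the numbered-line layout, and the "..." truncation marker are all assumptions made for the example.

```python
from __future__ import annotations


def sketch_build_feedback(reasoning_list: list[str], max_chars: int) -> str | None:
    """Illustrative stand-in, not the library's implementation."""
    if not reasoning_list:
        return None
    lines: list[str] = []
    used = 0  # characters consumed so far, including newline separators
    for index, reasoning in enumerate(reasoning_list, start=1):
        entry = f"{index}. {reasoning}"
        separator = 1 if lines else 0  # one newline between entries
        room = max_chars - used - separator
        if len(entry) > room:
            # Assumed policy: cut the entry that overflows, then stop.
            if room > 3:
                lines.append(entry[: room - 3] + "...")
            break
        lines.append(entry)
        used += separator + len(entry)
    return "\n".join(lines) if lines else None
```

Under these assumptions the result never exceeds `max_chars`, and an empty input list yields None, matching the documented return value.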
- should_inject_feedback( ) bool#
Determine if oracle feedback should be injected for this mutation.
- Args:
mode: Feedback mode (‘never’, ‘always’, ‘failing_only’, ‘adaptive’).
scalar_fitness: The individual’s normalized fitness score.
fitness_threshold: Threshold for ‘failing_only’ mode.
adaptive_enabled: Whether adaptive feedback has been triggered.
- Returns:
True if feedback should be injected, False otherwise.
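The four modes can be read as a small decision table. The sketch below shows one plausible reading and is not the module's code: the function name `sketch_should_inject` is hypothetical, and the strict `<` comparison for ‘failing_only’ and the ValueError for unknown modes are assumptions.

```python
def sketch_should_inject(
    mode: str,
    scalar_fitness: float,
    fitness_threshold: float,
    adaptive_enabled: bool,
) -> bool:
    """Illustrative decision table for the four documented modes."""
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "failing_only":
        # Assumption: "failing" means strictly below the threshold.
        return scalar_fitness < fitness_threshold
    if mode == "adaptive":
        # Assumption: inject only once an adaptive trigger has fired.
        return adaptive_enabled
    # Assumption: unknown modes are rejected rather than ignored.
    raise ValueError(f"unknown feedback mode: {mode!r}")
```

For example, `sketch_should_inject("failing_only", 0.3, 0.5, False)` evaluates to True, because 0.3 is below the threshold.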
- check_adaptive_triggers(
- *,
- best_fitness_history: list[float],
- population_fitness_values: list[float],
- population_prompt_keys: list[tuple[Any, ...]],
- stagnation_generations: int,
- fitness_variance_threshold: float,
- diversity_threshold: float,
Check if adaptive feedback should be triggered.
- Args:
best_fitness_history: History of best fitness values per generation.
population_fitness_values: Current population’s fitness values.
population_prompt_keys: Hashable keys representing each individual’s prompts.
stagnation_generations: Generations without improvement to trigger.
fitness_variance_threshold: Variance threshold for collapse detection.
diversity_threshold: Prompt duplication ratio threshold.
- Returns:
Dict with a ‘triggered’ bool and, if triggered, a ‘reason’ string.
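The arguments suggest three independent checks: stagnation, fitness variance collapse, and prompt duplication. The sketch below is one way such checks could be combined, not the module's actual logic. The name `sketch_check_triggers`, the reason strings, the use of population variance, the check order, and the assumption that higher fitness is better are all illustrative choices.

```python
from __future__ import annotations

from statistics import pvariance
from typing import Any


def sketch_check_triggers(
    *,
    best_fitness_history: list[float],
    population_fitness_values: list[float],
    population_prompt_keys: list[tuple[Any, ...]],
    stagnation_generations: int,
    fitness_variance_threshold: float,
    diversity_threshold: float,
) -> dict[str, Any]:
    """Illustrative only: return the first trigger that fires, if any."""
    # Stagnation: the last N generations did not beat the earlier best.
    if len(best_fitness_history) > stagnation_generations > 0:
        recent = best_fitness_history[-stagnation_generations:]
        earlier = best_fitness_history[:-stagnation_generations]
        if max(recent) <= max(earlier):
            return {"triggered": True, "reason": "stagnation"}
    # Variance collapse: population fitness values are nearly identical.
    if len(population_fitness_values) >= 2:
        if pvariance(population_fitness_values) < fitness_variance_threshold:
            return {"triggered": True, "reason": "fitness_variance_collapse"}
    # Diversity collapse: too large a share of prompts are duplicates.
    if population_prompt_keys:
        unique = len(set(population_prompt_keys))
        duplication_ratio = 1 - unique / len(population_prompt_keys)
        if duplication_ratio >= diversity_threshold:
            return {"triggered": True, "reason": "diversity_collapse"}
    return {"triggered": False}
```

Returning on the first match keeps the ‘reason’ field a single string; a real implementation might instead report every condition that fired.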
- _reasoning_to_string(reasoning: Any) str#
Convert reasoning to a string, handling various types.
- Args:
reasoning: The reasoning value (str, dict, list, BaseModel, etc.)
- Returns:
String representation of the reasoning.
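A conversion over the listed input types might look like the following sketch. It is an assumption-laden illustration rather than the module's code: the name `sketch_reasoning_to_string` is hypothetical, JSON is an assumed serialization for containers, None is assumed to map to an empty string, and model objects are detected by duck typing on a `model_dump` method (the method Pydantic v2 models provide) so that the example needs no third-party import.

```python
import json
from typing import Any


def sketch_reasoning_to_string(reasoning: Any) -> str:
    """Illustrative conversion of mixed reasoning values to text."""
    if reasoning is None:
        return ""  # assumption: a missing value becomes an empty string
    if isinstance(reasoning, str):
        return reasoning
    # Duck-typed model support: anything exposing model_dump() is dumped
    # to plain Python data first.
    model_dump = getattr(reasoning, "model_dump", None)
    if callable(model_dump):
        reasoning = model_dump()
    if isinstance(reasoning, (dict, list, tuple)):
        # default=str keeps non-JSON values (dates, custom objects) printable.
        return json.dumps(reasoning, default=str, ensure_ascii=False)
    return str(reasoning)
```

With this sketch, `sketch_reasoning_to_string({"a": 1})` gives `'{"a": 1}'`, and a plain string passes through unchanged.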
- extract_worst_reasoning(
- *,
- evaluation_results: list[tuple[str, Any]],
- weights_by_name: dict[str, float],
- directions_by_name: dict[str, str],
- worst_n: int,
Extract reasoning from worst-performing evaluation items.
- Args:
evaluation_results: List of (evaluator_name, EvalOutput) tuples.
weights_by_name: Metric weights by evaluator name.
directions_by_name: Optimization direction (‘maximize’ or ‘minimize’) by evaluator name.
worst_n: Number of worst items to extract.
- Returns:
List of formatted reasoning strings with evaluator labels.
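One way to rank items across evaluators with different weights and directions is to map every score onto a common "higher is better" scale and take the lowest values. The sketch below does that with stand-in types. `FakeItem` and `FakeOutput` are hypothetical and do not reproduce the real EvalOutput structure, and the ranking formula, the default weight of 1.0, and the `[evaluator] reasoning` label format are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeItem:  # hypothetical stand-in for one evaluated item
    score: float
    reasoning: Any


@dataclass
class FakeOutput:  # hypothetical stand-in for an evaluator's output
    items: list[FakeItem]


def sketch_extract_worst(
    *,
    evaluation_results: list[tuple[str, FakeOutput]],
    weights_by_name: dict[str, float],
    directions_by_name: dict[str, str],
    worst_n: int,
) -> list[str]:
    """Illustrative ranking: lowest weighted, direction-adjusted score first."""
    ranked: list[tuple[float, str]] = []
    for name, output in evaluation_results:
        weight = weights_by_name.get(name, 1.0)
        # Flip the sign for 'minimize' metrics so that lower always means worse.
        sign = -1.0 if directions_by_name.get(name, "maximize") == "minimize" else 1.0
        for item in output.items:
            if item.reasoning is None:
                continue  # nothing to learn from items without reasoning
            ranked.append((sign * weight * item.score, f"[{name}] {item.reasoning}"))
    ranked.sort(key=lambda pair: pair[0])
    return [text for _, text in ranked[:worst_n]]
```

The evaluator label in each returned string keeps the source of a failure visible once the reasoning from several evaluators is merged into one list.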