aiq.eval.dataset_handler.dataset_handler#

Classes#

DatasetHandler

Read the datasets and pre-process them (apply filters, deduplicate, etc.) before turning them into EvalInput objects.

Module Contents#

class DatasetHandler(
dataset_config: aiq.data_models.dataset_handler.EvalDatasetConfig,
reps: int,
)#

Read the datasets and pre-process them (apply filters, deduplicate, etc.) before turning them into EvalInput objects. One DatasetHandler object is needed for each dataset to be evaluated.

dataset_config#
dataset_filter#
reps#
intermediate_step_adapter#
is_structured_input() bool#

Check whether the input is structured or unstructured.

property id_key: str#
property question_key: str#
property answer_key: str#
property generated_answer_key: str#
property trajectory_key: str#
property expected_trajectory_key: str#
get_eval_input_from_df(
input_df: pandas.DataFrame,
) aiq.eval.evaluator.evaluator_model.EvalInput#
setup_reps(input_df: pandas.DataFrame) pandas.DataFrame#

Replicate the rows and update each id to id_key + "_rep" + rep_number.
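The replication pattern this docstring describes can be sketched in plain pandas. This is a minimal illustration of the id-suffixing behavior, not the library's actual implementation; the `id_key` default and column names here are assumptions.

```python
import pandas as pd

def setup_reps(input_df: pd.DataFrame, reps: int, id_key: str = "id") -> pd.DataFrame:
    # Replicate every row `reps` times; suffix each id with "_rep<n>" so
    # repeated runs of the same question remain uniquely addressable.
    frames = []
    for rep in range(reps):
        frame = input_df.copy()
        frame[id_key] = frame[id_key].astype(str) + f"_rep{rep}"
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

df = pd.DataFrame({"id": [1, 2], "question": ["What is 2+2?", "Capital of France?"]})
replicated = setup_reps(df, reps=2)
print(list(replicated["id"]))  # ['1_rep0', '2_rep0', '1_rep1', '2_rep1']
```

Replicating after filtering (rather than before) keeps every repetition of a row in or out of the run together, which matches the pre-processing order described at the top of this page.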

get_eval_input_from_dataset(
dataset: str,
) aiq.eval.evaluator.evaluator_model.EvalInput#
filter_intermediate_steps(
intermediate_steps: list[aiq.data_models.intermediate_step.IntermediateStep],
) list[dict]#

Filter out the intermediate steps that are not relevant for evaluation. The output is written with the intention of re-running the evaluation using the original config file.
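The filtering described above reduces a trace to only the events an evaluator cares about. A minimal sketch, using hypothetical dict-shaped step records and event-type names (the real `IntermediateStep` model carries more fields):

```python
def filter_intermediate_steps(steps: list[dict],
                              keep_types: frozenset = frozenset({"LLM_END", "TOOL_END"})) -> list[dict]:
    # Keep only the steps whose event type matters for evaluation
    # (e.g. completed LLM and tool calls); drop bookkeeping events.
    return [step for step in steps if step.get("event_type") in keep_types]

steps = [
    {"event_type": "LLM_END", "name": "chat", "output": "Paris"},
    {"event_type": "SPAN_START", "name": "trace"},
    {"event_type": "TOOL_END", "name": "search", "output": "result text"},
]
relevant = filter_intermediate_steps(steps)
print([s["name"] for s in relevant])  # ['chat', 'search']
```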

publish_eval_input(eval_input) str#

Convert the EvalInput object to a JSON output for storing in a file. Use the original keys to allow re-running evaluation using the original config file and the '--skip_workflow' option.
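The key-remapping idea behind this method can be sketched with the standard library. The item shape and key names below are assumptions for illustration; the real method operates on an `EvalInput` object and uses the configured `question_key`, `answer_key`, and `generated_answer_key` properties.

```python
import json

def publish_eval_input(eval_items: list[dict],
                       question_key: str = "question",
                       answer_key: str = "answer",
                       generated_answer_key: str = "generated_answer") -> str:
    # Re-emit each item under the dataset's original column names so the
    # resulting file can be re-evaluated against the original config
    # (e.g. with a --skip_workflow style option) without a workflow run.
    rows = [
        {
            "id": item["id"],
            question_key: item["question"],
            answer_key: item["expected_answer"],
            generated_answer_key: item["generated_answer"],
        }
        for item in eval_items
    ]
    return json.dumps(rows, indent=2)

output = publish_eval_input([
    {"id": "q1_rep0", "question": "Capital of France?",
     "expected_answer": "Paris", "generated_answer": "Paris"},
])
```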