nat.eval.dataset_handler.dataset_handler#

Classes#

DatasetHandler

Read the datasets and pre-process them (apply filters, deduplicate, etc.) before turning them into EvalInput objects.

Module Contents#

class DatasetHandler(
dataset_config: nat.data_models.dataset_handler.EvalDatasetConfig,
reps: int,
concurrency: int,
num_passes: int = 1,
adjust_dataset_size: bool = False,
)#

Read the datasets and pre-process them (apply filters, deduplicate, etc.) before turning them into EvalInput objects. One DatasetHandler object is needed for each dataset to be evaluated.
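
A hypothetical usage sketch; the config construction, parameter values, and dataset path are illustrative only:

from nat.eval.dataset_handler.dataset_handler import DatasetHandler

# dataset_config is an EvalDatasetConfig built elsewhere (illustrative).
handler = DatasetHandler(
    dataset_config=dataset_config,
    reps=2,                    # evaluate each entry twice
    concurrency=4,
    num_passes=1,
    adjust_dataset_size=True,
)
eval_input = handler.get_eval_input_from_dataset("path/to/eval_dataset.json")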

dataset_config#
dataset_filter#
reps#
concurrency#
num_passes = 1#
adjust_dataset_size = False#
intermediate_step_adapter#
is_structured_input() → bool#

Check whether the input is structured or unstructured.

property id_key: str#
property question_key: str#
property answer_key: str#
property generated_answer_key: str#
property trajectory_key: str#
property expected_trajectory_key: str#
get_eval_input_from_df(
input_df: pandas.DataFrame,
) → nat.eval.evaluator.evaluator_model.EvalInput#
setup_reps(input_df: pandas.DataFrame) → pandas.DataFrame#

Replicate the rows and update the id to id_key + "_rep" + rep_number.
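
A minimal sketch of the replication scheme; the suffix formatting is inferred from the docstring above, not taken from the implementation:

import pandas as pd

def setup_reps_sketch(input_df: pd.DataFrame, id_key: str, reps: int) -> pd.DataFrame:
    # Copy the dataset once per repetition and tag each copy's ids with
    # "_rep<rep_number>" so repeated rows stay distinguishable.
    frames = []
    for rep in range(reps):
        df = input_df.copy()
        df[id_key] = df[id_key].astype(str) + f"_rep{rep}"
        frames.append(df)
    return pd.concat(frames, ignore_index=True)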

adjust_dataset(input_df: pandas.DataFrame) → pandas.DataFrame#

Adjust the dataset so its length is a multiple of concurrency.

If num_passes > 0:
    dataset size is adjusted to concurrency * num_passes
else:
    dataset size is adjusted to the largest multiple of concurrency that is less than or equal to the current dataset size
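
A sketch of the adjustment rule; the target-size arithmetic follows the description above, while the trim/pad strategy is an assumption:

import pandas as pd

def adjust_dataset_sketch(input_df: pd.DataFrame, concurrency: int, num_passes: int) -> pd.DataFrame:
    if input_df.empty:
        return input_df
    if num_passes > 0:
        # Fixed target: exactly num_passes batches of size `concurrency`.
        target = concurrency * num_passes
    else:
        # Largest multiple of `concurrency` not exceeding the current size.
        target = (len(input_df) // concurrency) * concurrency
    if target <= len(input_df):
        return input_df.head(target)  # trim down
    # Growing the dataset (assumed strategy): repeat rows cyclically.
    extra_idx = [i % len(input_df) for i in range(target - len(input_df))]
    return pd.concat([input_df, input_df.iloc[extra_idx]], ignore_index=True)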

get_eval_input_from_dataset(
dataset: str,
) → nat.eval.evaluator.evaluator_model.EvalInput#
_preprocess_dataframe(input_df: pandas.DataFrame) → pandas.DataFrame#

Apply standard preprocessing to a DataFrame: filters, deduplication, repetitions, and size adjustment.

Args:
    input_df: DataFrame to preprocess

Returns:
    Preprocessed DataFrame
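
The docstring implies a fixed order of operations. A plausible composition, in which the filter and deduplication calls are assumptions while setup_reps and adjust_dataset are documented above:

def preprocess_dataframe_sketch(handler, input_df):
    # handler: DatasetHandler; input_df: pandas.DataFrame
    df = handler.dataset_filter.apply_filters(input_df)  # assumed filter API
    df = df.drop_duplicates(subset=[handler.id_key])     # deduplicate on the id column (assumed)
    df = handler.setup_reps(df)                          # repetitions
    return handler.adjust_dataset(df)                    # size adjustment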

_preprocess_eval_dataframe(
input_df: pandas.DataFrame,
) → nat.eval.evaluator.evaluator_model.EvalInput#

Apply standard preprocessing to a DataFrame and convert to EvalInput.

Args:
    input_df: DataFrame to preprocess

Returns:
    Preprocessed EvalInput object

_preprocess_eval_input(
eval_input: nat.eval.evaluator.evaluator_model.EvalInput,
) → nat.eval.evaluator.evaluator_model.EvalInput#

Apply standard preprocessing to an EvalInput object.

Thin wrapper that converts EvalInput to DataFrame, processes it, and converts back.

Args:
    eval_input: EvalInput object to preprocess

Returns:
    Preprocessed EvalInput object
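
Given the description, the wrapper plausibly composes three methods documented on this page:

def preprocess_eval_input_sketch(handler, eval_input):
    # handler: DatasetHandler; eval_input: EvalInput
    df = handler._eval_input_to_dataframe(eval_input)  # EvalInput -> DataFrame
    df = handler._preprocess_dataframe(df)             # filters, dedup, reps, size adjustment
    return handler.get_eval_input_from_df(df)          # DataFrame -> EvalInput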

_handle_custom_dataset(
dataset: str | None,
) → nat.eval.evaluator.evaluator_model.EvalInput#

Handle custom dataset type by calling the user-defined function and applying standard preprocessing to the result.

Args:
    dataset: Optional dataset file path from the command line

Returns:
    Preprocessed EvalInput object

_eval_input_to_dataframe(
eval_input: nat.eval.evaluator.evaluator_model.EvalInput,
) → pandas.DataFrame#

Convert an EvalInput object to a pandas DataFrame for processing.

Args:
    eval_input: EvalInput object to convert

Returns:
    DataFrame representation of the EvalInput
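
A hypothetical conversion sketch; the EvalInput item container and attribute names used here are assumptions inferred from the key properties above:

import pandas as pd

def eval_input_to_dataframe_sketch(handler, eval_input):
    # handler: DatasetHandler; eval_input: EvalInput
    rows = [
        {
            handler.id_key: item.id,                 # assumed attribute names
            handler.question_key: item.input_obj,
            handler.answer_key: item.expected_output_obj,
        }
        for item in eval_input.eval_input_items      # assumed container attribute
    ]
    return pd.DataFrame(rows)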

filter_intermediate_steps(
intermediate_steps: list[nat.data_models.intermediate_step.IntermediateStep],
event_filter: list[nat.data_models.intermediate_step.IntermediateStepType] | None = None,
) → list[dict]#

Filter out the intermediate steps that are not relevant for evaluation. The output is written with the intention of re-running the evaluation using the original config file.
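
Hypothetical usage; the enum members passed to event_filter are assumptions, not necessarily the library's defaults:

from nat.data_models.intermediate_step import IntermediateStepType

filtered = handler.filter_intermediate_steps(
    intermediate_steps,  # list[IntermediateStep] collected during the run
    event_filter=[IntermediateStepType.LLM_END, IntermediateStepType.TOOL_END],
)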

publish_eval_input(
eval_input,
workflow_output_step_filter: list[nat.data_models.intermediate_step.IntermediateStepType] | None = None,
) → str#

Convert the EvalInput object to a JSON output for storing in a file. Use the original keys to allow re-running evaluation using the original config file and the --skip_workflow option.
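
Hypothetical usage: serialize the evaluated inputs to a file that can later be replayed with --skip_workflow (the output path is illustrative):

output_json = handler.publish_eval_input(eval_input)
with open("eval_output.json", "w", encoding="utf-8") as f:
    f.write(output_json)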