Output Format and Structure

This section describes the output of custom evaluation jobs, including the structure of the output dataset and file details.

Output Dataset Structure

The output of an evaluation job is a dataset that contains the model’s responses and any computed metrics. Each output row can be matched to its input row, so the model’s performance can be analyzed alongside the original inputs.

The output dataset contains at least two mandatory columns:

id — A unique identifier used to match the output with the corresponding input from the dataset.
output_text — The output generated by the model.

ID Column

If the input dataset contains an id column, it is used to match the input with the output.
If no id column is present in the input dataset, the row number is used as the ID.
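
The matching rule above can be sketched in Python. This is only an illustration under stated assumptions, not the service's implementation: the helper name join_outputs_to_inputs is hypothetical, rows are assumed to be held in memory as lists of dictionaries, and row numbers are assumed to start at 0 (the text does not say whether numbering is 0-based or 1-based, so the start value is a parameter).

```python
from typing import Any


def join_outputs_to_inputs(
    inputs: list[dict[str, Any]],
    outputs: list[dict[str, Any]],
    first_row_number: int = 0,  # assumption: the docs do not state 0- or 1-based
) -> list[dict[str, Any]]:
    """Pair each output row with the input row that shares its id."""
    # The input dataset "has an id column" only if every row carries an id.
    has_id_column = bool(inputs) and all("id" in row for row in inputs)

    if has_id_column:
        inputs_by_id = {row["id"]: row for row in inputs}
    else:
        # No id column: fall back to the row number as the id.
        inputs_by_id = {
            first_row_number + index: row for index, row in enumerate(inputs)
        }

    joined = []
    for out in outputs:
        source = inputs_by_id.get(out["id"])
        if source is None:
            continue  # no matching input row; skipped in this sketch
        joined.append(
            {"id": out["id"], "input": source, "output_text": out["output_text"]}
        )
    return joined
```

Building a dictionary keyed by id keeps the join independent of row order, which matters if the output rows are not written in the same order as the inputs.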

For each metric computed at the sample level, a new column is added to the output dataset. The column name is the metric name, and each row's value is the metric computed for that sample.
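
As a rough sketch of how these per-sample metric columns might be consumed, the snippet below averages each one across rows. It rests on assumptions not confirmed by the text: rows are dictionaries already loaded into memory, every numeric column other than id and output_text is treated as a metric, and the metric name exact_match in the usage lines is purely hypothetical.

```python
from statistics import mean
from typing import Any

# The two mandatory columns described above; everything else is treated
# as a candidate metric column (an assumption of this sketch).
MANDATORY_COLUMNS = {"id", "output_text"}


def summarize_metrics(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Average every numeric sample-level metric column across rows."""
    values: dict[str, list[float]] = {}
    for row in rows:
        for name, value in row.items():
            if name in MANDATORY_COLUMNS:
                continue
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values.setdefault(name, []).append(float(value))
    return {name: mean(column) for name, column in values.items()}


# Usage with a hypothetical metric column named "exact_match".
example_rows = [
    {"id": 0, "output_text": "Paris", "exact_match": 1.0},
    {"id": 1, "output_text": "Lyon", "exact_match": 0.0},
]
print(summarize_metrics(example_rows))  # {'exact_match': 0.5}
```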

Output File

The output file is named results.json. When the job completes, the file is stored as an evaluation result at the location given by EvaluationJob.output_files_url.

For more information about the response format, refer to “To Create an Evaluation Job”.