Custom Evaluations

Use the custom evaluation type to run evaluation jobs with your own datasets, prompt or task templates, and metrics.


Task Guides

Perform common custom evaluation job tasks using the same operations as standard evaluation jobs.

Create Job

Create and submit a new evaluation job with your target and configuration

Create Evaluation Job

Get Job Details

View complete information about a specific evaluation job

Get Evaluation Job Details

Get Job Status

Check the current status and progress of an evaluation job

Get Evaluation Job Status

List Jobs

View and filter all evaluation jobs in your namespace

List Evaluation Jobs

Delete Job

Remove an evaluation job from the system

Delete Evaluation Job

References

Review the following articles to help you prepare datasets, templates, metrics, and tool calling for custom evaluation jobs.

Data Format

How to prepare and format datasets for custom evaluation

Data for Custom Evaluations

Templating Syntax

How to use Jinja2 templates for prompts and tasks

Templating for Tasks

Available Metrics

Available metrics for evaluation, including string-check, BLEU, tool calling, and LLM-as-Judge

Metrics for Custom Evaluation

Output

How to interpret the output format and results

Output Format and Structure
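
As a rough illustration of the string-check metric listed under Available Metrics, the sketch below scores model outputs against reference strings with a chosen comparison. It is not the service's implementation: the operation names (`equals`, `contains`, `startswith`), the `output` and `reference` field names, and the helper functions are all assumptions made for this example. See Metrics for Custom Evaluation for the supported configuration.

```python
from typing import Callable, Dict, List

# Hypothetical comparison operations; the names are illustrative only
# and are not taken from the evaluation service's configuration schema.
OPERATIONS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda output, reference: output == reference,
    "contains": lambda output, reference: reference in output,
    "startswith": lambda output, reference: output.startswith(reference),
}


def string_check(output: str, reference: str, operation: str = "equals") -> float:
    """Return 1.0 if the comparison passes on whitespace-trimmed text, else 0.0."""
    compare = OPERATIONS[operation]
    return 1.0 if compare(output.strip(), reference.strip()) else 0.0


def mean_score(rows: List[Dict[str, str]], operation: str = "equals") -> float:
    """Average the per-row string-check scores over a small dataset."""
    if not rows:
        return 0.0
    scores = [string_check(r["output"], r["reference"], operation) for r in rows]
    return sum(scores) / len(scores)


# Assumed row layout: one model output and one reference answer per row.
rows = [
    {"output": "Paris", "reference": "Paris"},
    {"output": "The capital is Rome.", "reference": "Rome"},
]

if __name__ == "__main__":
    print(mean_score(rows, "equals"))
    print(mean_score(rows, "contains"))
```

An exact-match check is strict about surrounding text, so the second row only passes under `contains`. Which comparison fits depends on how constrained the model's answer format is.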