Custom Evaluations

Use the custom evaluation type to create a custom evaluation job that uses your own datasets, prompt/task templates, and metrics.

Task Guides

Custom evaluation jobs support the same operations as standard evaluation jobs. Use the following guides to perform common tasks.

Create Job

Create and submit a new evaluation job with your target and configuration

Get Job Details

View complete information about a specific evaluation job

Get Job Status

Check the current status and progress of an evaluation job

List Jobs

View and filter all evaluation jobs in your namespace

Delete Job

Remove an evaluation job from the system

Review the following articles to help you prepare datasets, write templates, choose metrics, and configure tool calling for custom evaluation jobs.

Data Format

How to prepare and format datasets for custom evaluation

Templating Syntax

How to use Jinja2 templates for prompts and tasks

Available Metrics

Available metrics for evaluation, including string-check, BLEU, tool calling, and LLM-as-Judge

Output

Understanding the output format and results
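To make the idea behind a string-check style metric concrete, here is a minimal, self-contained sketch. It assumes the metric compares the model output against a reference string using a named operation and scores each row as 1.0 (pass) or 0.0 (fail). The operation names, the `string_check` and `mean_score` functions, and the row keys (`output`, `operation`, `reference`) are hypothetical illustrations, not the evaluator's actual API or configuration schema.

```python
from typing import Callable, Dict, List

# Hypothetical operation names; the real metric's supported operations may differ.
OPERATIONS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda output, ref: output == ref,
    "not equals": lambda output, ref: output != ref,
    "contains": lambda output, ref: ref in output,
    "not contains": lambda output, ref: ref not in output,
    "startswith": lambda output, ref: output.startswith(ref),
    "endswith": lambda output, ref: output.endswith(ref),
}


def string_check(output: str, operation: str, reference: str) -> float:
    """Return 1.0 if the check passes, else 0.0 (illustrative scoring only)."""
    try:
        check = OPERATIONS[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation!r}") from None
    # Strip surrounding whitespace so a trailing newline from the model
    # does not cause an otherwise correct answer to fail.
    return 1.0 if check(output.strip(), reference.strip()) else 0.0


def mean_score(rows: List[Dict[str, str]]) -> float:
    """Average the per-row scores; an empty dataset scores 0.0."""
    scores = [string_check(r["output"], r["operation"], r["reference"]) for r in rows]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    rows = [
        {"output": "The capital is Paris.\n", "operation": "contains", "reference": "Paris"},
        {"output": "4", "operation": "equals", "reference": "5"},
    ]
    print(mean_score(rows))  # prints 0.5
```

Scoring each row as a binary pass/fail and then averaging is one common way to turn a string comparison into an aggregate accuracy; consult the Available Metrics article for how the evaluator actually configures and reports this metric.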