Custom Evaluations
Use the custom evaluation type to create an evaluation job that uses your own datasets, prompt and task templates, and metrics.
Task Guides
Perform common custom evaluation job tasks using the same operations as standard evaluation jobs.
Create and submit a new evaluation job with your target and configuration
View complete information about a specific evaluation job
Check the current status and progress of an evaluation job
View and filter all evaluation jobs in your namespace
Remove an evaluation job from the system
References
Review the following articles for help preparing datasets, templates, and metrics (including tool calling) for custom evaluation jobs.
How to prepare and format datasets for custom evaluation
How to use Jinja2 templates for prompts and tasks
Available metrics for evaluation, including string-check, BLEU, tool calling, and LLM-as-Judge
Understanding the output format and results
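To make the pieces above concrete, here is a minimal, generic sketch of how a dataset, a prompt template, and a string-check style metric fit together. It is an illustration only and does not use the evaluation service's API: the row fields (`question`, `answer`, `output`), the `string_check` helper, and its `operation` names are all hypothetical, and the standard library's `string.Template` stands in for Jinja2 so that the example runs without extra dependencies.

```python
from string import Template

# Hypothetical dataset rows; the field names are assumptions for illustration.
rows = [
    {"question": "What is 2 + 2?", "answer": "4", "output": "The answer is 4."},
    {"question": "Capital of France?", "answer": "Paris", "output": "It is Lyon."},
]

# Stand-in for a prompt template. A real custom evaluation would use Jinja2
# syntax; string.Template is used here only to stay within the stdlib.
prompt_template = Template("Answer briefly: $question")


def string_check(output: str, reference: str, operation: str = "contains") -> float:
    """Return 1.0 if the check passes and 0.0 otherwise (hypothetical helper)."""
    out, ref = output.strip().lower(), reference.strip().lower()
    if operation == "equals":
        return float(out == ref)
    if operation == "contains":
        return float(ref in out)
    raise ValueError(f"unsupported operation: {operation}")


# Render one prompt per row, then score each model output against its reference.
prompts = [prompt_template.substitute(question=row["question"]) for row in rows]
scores = [string_check(row["output"], row["answer"]) for row in rows]
mean_score = sum(scores) / len(scores)

print(prompts[0])
print(scores, mean_score)
```

In this sketch the per-row scores are averaged into a single number, which is one common way to aggregate a binary metric; the linked articles describe the formats and metric options the service actually accepts.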