Run and Manage Evaluation Jobs
After you create an evaluation target and an evaluation configuration, you are ready to run an evaluation in NVIDIA NeMo Evaluator.
Evaluation Job Workflow
Use the following procedure to run an evaluation in NeMo Evaluator.
1. Optionally create a custom dataset for evaluation and upload it to NeMo Data Store.
2. Prepare a new or existing target for your evaluation and record the ID.
3. Prepare a new or existing configuration for your evaluation and record the ID.
4. Create a job to run your evaluation that includes the IDs from the previous steps.
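The last step of the procedure above can be sketched in Python as follows. This is a minimal illustration, not the documented API: the endpoint path `/v1/evaluation/jobs`, the field names `namespace`, `target`, and `config`, and the example identifiers are all assumptions, so check them against the job JSON schema reference before use. `build_job_payload` and `submit_job` are hypothetical helper names.

```python
import json
import urllib.request


def build_job_payload(namespace: str, target_id: str, config_id: str) -> dict:
    """Assemble the request body for a new evaluation job.

    The field names "namespace", "target", and "config" are assumptions;
    confirm them against the job JSON schema reference.
    """
    fields = (("namespace", namespace), ("target", target_id), ("config", config_id))
    for label, value in fields:
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return {"namespace": namespace, "target": target_id, "config": config_id}


def submit_job(base_url: str, payload: dict) -> dict:
    """POST the payload to an assumed jobs endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/evaluation/jobs",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical identifiers recorded in the target and configuration steps.
    payload = build_job_payload(
        "my-namespace", "my-namespace/my-target", "my-namespace/my-config"
    )
    print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call lets you inspect or validate the request body before anything is sent to the service.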
Tip
To see what targets and configurations are supported together, refer to Job Target and Configuration Matrix.
Task Guides
Perform common evaluation tasks.
Create and submit a new evaluation job with your target and configuration
View complete information about a specific evaluation job
Check the current status and progress of an evaluation job
View and filter all evaluation jobs in your namespace
View evaluation results as a JSON response
Download complete evaluation results as a ZIP file
Download job logs as a ZIP file
Remove an evaluation job from the system
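Checking job status is usually done in a loop until the job finishes. The sketch below shows one way to structure that loop without tying it to a specific endpoint: the caller passes in a `fetch_status` function that returns the current status string. The helper name `wait_for_job` and the terminal status names are assumptions for illustration; confirm the actual status values in the job status guide.

```python
import time
from typing import Callable

# Assumed terminal status names; confirm against the job status documentation.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        # Wait before the next check so the service is not flooded with requests.
        sleep(poll_interval_s)
```

In practice `fetch_status` would wrap whatever request returns the job's status, and the `sleep` parameter exists mainly so the loop can be exercised in tests without real delays.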
References
Standard Jobs
Review detailed technical specifications and compatibility guides to help you configure and optimize your evaluation jobs effectively.
Learn which evaluation targets and configurations can be combined for different evaluation types
View expected evaluation times for different model, hardware, and dataset combinations
Reference for the JSON structure and fields used when creating evaluation jobs
Custom Jobs
Review the following articles to help you prepare datasets, templates, metrics, and tool calling for custom evaluation jobs.
How to prepare and format datasets for custom evaluation
How to use Jinja2 templates for prompts and tasks
Available metrics for evaluation, including string-check, BLEU, tool calling, and LLM-as-Judge
Understanding the output format and results