Run and Manage Evaluation Jobs

After you create an evaluation target and an evaluation configuration, you are ready to run an evaluation in NVIDIA NeMo Evaluator.

Evaluation Job Workflow

Use the following procedure to run an evaluation in NeMo Evaluator.

  1. Optionally create a custom dataset for evaluation and upload it to NeMo Data Store.

  2. Prepare a new or existing target for your evaluation and record the ID.

  3. Prepare a new or existing configuration for your evaluation and record the ID.

  4. Create a job to run your evaluation that includes the IDs from the previous steps.

  5. Get your results.
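
The following Python sketch shows steps 4 and 5 of the workflow as plain control flow. The three callables `create_job`, `wait_until_done`, and `get_results` are hypothetical stand-ins for the API calls described in the task guides, so you can read and test the sequence without a running service. The request body field names and the `completed` status value are assumptions; check them against Job JSON Schema Reference.

```python
from typing import Callable, Dict


def run_evaluation(
    target_id: str,
    config_id: str,
    create_job: Callable[[Dict[str, str]], str],
    wait_until_done: Callable[[str], str],
    get_results: Callable[[str], Dict],
) -> Dict:
    """Submit a job that references a target and a configuration, then fetch its results."""
    # Step 4: the job body carries the IDs recorded in steps 2 and 3
    # (the field names "target" and "config" are assumed for this sketch).
    job_id = create_job({"target": target_id, "config": config_id})

    # Block until the job stops running; the callable returns the final status.
    final_status = wait_until_done(job_id)
    if final_status != "completed":  # assumed name of the success status
        raise RuntimeError(f"job {job_id} finished with status {final_status!r}")

    # Step 5: get your results.
    return get_results(job_id)


# Usage with fake callables in place of real API calls:
results = run_evaluation(
    "my-organization/my-target",
    "my-organization/my-config",
    create_job=lambda body: "job-123",
    wait_until_done=lambda job_id: "completed",
    get_results=lambda job_id: {"job_id": job_id},
)
print(results)
```

Injecting the API calls keeps the ordering of the workflow separate from transport details, so the same function works whether the callables wrap HTTP requests or test doubles.
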

Tip

To see which targets and configurations you can use together, refer to Job Target and Configuration Matrix.


Task Guides

Perform common evaluation tasks.

Create Job: Create and submit a new evaluation job with your target and configuration. Refer to Create Evaluation Job.

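
A minimal sketch of the create request, assuming the service accepts a JSON body with `namespace`, `target`, and `config` fields at a `/v1/evaluation/jobs` path; verify the path and fields in Job JSON Schema Reference. `BASE_URL` is a placeholder, and the example only builds the request so you can inspect it before sending it with `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder: replace with your Evaluator service URL


def build_create_job_request(namespace: str, target: str, config: str) -> urllib.request.Request:
    """Build, but do not send, a POST request that creates an evaluation job.

    `target` and `config` are the IDs you recorded earlier, written here in an
    assumed "namespace/name" form.
    """
    body = {"namespace": namespace, "target": target, "config": config}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/evaluation/jobs",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_create_job_request(
    "my-organization", "my-organization/my-target", "my-organization/my-config"
)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To submit it to a running service:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     job = json.load(resp)  # response body assumed to describe the new job
```
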
Get Job Details: View complete information about a specific evaluation job. Refer to Get Evaluation Job Details.

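
To fetch a job's full record, a GET request on the job's own URL is one plausible shape. The `/v1/evaluation/jobs/<job-id>` layout below is an assumption, as is `BASE_URL`; the helper percent-encodes the job ID so that unexpected characters cannot change the request path.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder: replace with your Evaluator service URL


def job_url(job_id: str, suffix: str = "") -> str:
    """Return the assumed URL of one evaluation job, optionally with a sub-path."""
    # safe="" also encodes "/", so the ID always stays a single path segment.
    return f"{BASE_URL}/v1/evaluation/jobs/{urllib.parse.quote(job_id, safe='')}{suffix}"


def get_job_details(job_id: str) -> dict:
    """GET the job record and decode the JSON body (requires a running service)."""
    req = urllib.request.Request(job_url(job_id), headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


print(job_url("job-123"))
# details = get_job_details("job-123")
# print(json.dumps(details, indent=2))
```
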
Get Job Status: Check the current status and progress of an evaluation job. Refer to Get Evaluation Job Status.

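
For long-running jobs, a client typically polls the status until it reaches a terminal value. The sketch below takes a hypothetical `fetch_status` callable (for example, one that reads the status field from the job's status endpoint) and doubles the wait between polls up to a cap, raising `TimeoutError` if the job does not finish in time. The terminal status names are assumptions; use the values your deployment reports.

```python
import time
from typing import Callable

# Assumed terminal status values; substitute the ones your deployment reports.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_status(
    fetch_status: Callable[[], str],
    timeout_s: float = 3600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until it returns a terminal status or the timeout elapses.

    The delay doubles after each non-terminal poll, capped at max_delay_s.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Usage with a scripted sequence instead of a live status endpoint:
statuses = iter(["created", "running", "running", "completed"])
waits = []
final = wait_for_status(lambda: next(statuses), sleep=waits.append)
print(final, waits)
```

Backing off between polls keeps load on the service low during jobs that run for hours, while still reacting quickly to jobs that finish early.
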
List Jobs: View and filter all evaluation jobs in your namespace. Refer to List Evaluation Jobs.

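
When you list jobs, pagination and filtering are usually expressed as query parameters. The parameter names `page`, `page_size`, and `filter[<field>]` in this sketch are hypothetical placeholders, not confirmed API parameters, and `BASE_URL` is a placeholder too; the point is to let `urllib.parse.urlencode` handle escaping rather than concatenating strings by hand.

```python
import urllib.parse

BASE_URL = "http://localhost:8000"  # placeholder: replace with your Evaluator service URL


def list_jobs_url(page: int = 1, page_size: int = 10, **filters: str) -> str:
    """Build a list-jobs URL with pagination and filter query parameters.

    All query parameter names here are hypothetical; confirm them in the API reference.
    """
    params = {"page": page, "page_size": page_size}
    for field, value in sorted(filters.items()):
        params[f"filter[{field}]"] = value  # brackets are percent-encoded by urlencode
    return f"{BASE_URL}/v1/evaluation/jobs?{urllib.parse.urlencode(params)}"


print(list_jobs_url(namespace="my-organization"))
```
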
Get Job Results: View evaluation results as a JSON response. Refer to Get Evaluation Results.

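
The exact structure of the results JSON depends on the evaluation type, so this sketch makes no assumptions about field names. It flattens any nested JSON document into dotted key paths, which makes it easy to scan a results response for the metric values you care about. The sample document is invented for illustration and does not reflect the real results schema.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, path: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into {"dotted.path": value} pairs.

    List elements use their index as the path segment. Empty dicts and
    lists produce no entries.
    """
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        return {path: obj}  # a leaf value
    flat: Dict[str, Any] = {}
    for key, value in children:
        child_path = f"{path}.{key}" if path else str(key)
        flat.update(flatten(value, child_path))
    return flat


# Invented sample document; real results have their own structure.
sample = json.loads('{"summary": {"scores": [0.5, {"value": 0.8}]}}')
for key, value in flatten(sample).items():
    print(f"{key} = {value}")
```
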
Download Detailed Results: Download complete evaluation results as a ZIP file. Refer to Download Evaluation Results.

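
Once you have the bytes of the downloaded archive (for example, the body of the download response; the request itself is not shown here), Python's standard `zipfile` module can verify and unpack it. The helper names `summarize_zip` and `save_and_extract` are hypothetical.

```python
import io
import zipfile
from typing import Dict


def summarize_zip(payload: bytes) -> Dict[str, int]:
    """Check the archive's CRCs and return {member name: uncompressed size in bytes}."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        return {info.filename: info.file_size for info in archive.infolist()}


def save_and_extract(payload: bytes, zip_path: str, out_dir: str) -> None:
    """Write the archive to zip_path, then extract all members into out_dir."""
    with open(zip_path, "wb") as fh:
        fh.write(payload)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)


# Usage with an in-memory archive standing in for a real download:
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("results.json", '{"ok": true}')
print(summarize_zip(buffer.getvalue()))
```
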
Get Job Logs: Download job logs as a ZIP file. Refer to Download Evaluation Job Logs.

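
Log archives can be inspected without extracting them to disk. This sketch, which assumes the archive contains UTF-8 text log files, returns the last few lines of each file, which is often enough to see why a job failed. The function name `tail_logs` and the sample file name are hypothetical.

```python
import io
import zipfile
from collections import deque
from typing import Dict, List


def tail_logs(payload: bytes, lines: int = 20) -> Dict[str, List[str]]:
    """Return the last `lines` lines of every file in a log ZIP archive."""
    tails: Dict[str, List[str]] = {}
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Decode as text; undecodable bytes are replaced rather than raising.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8", errors="replace") as text:
                # A bounded deque keeps only the final `lines` lines in memory.
                tails[name] = [line.rstrip("\n") for line in deque(text, maxlen=lines)]
    return tails


# Usage with an in-memory archive standing in for a real log download:
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("job/evaluator.log", "a\nb\nc\n")
for name, tail in tail_logs(buffer.getvalue(), lines=2).items():
    print(name, tail)
```
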
Delete Job: Remove an evaluation job from the system. Refer to Delete Evaluation Job.
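
Deleting a job amounts to a request with the `DELETE` method on the job's URL. In this sketch the URL layout and `BASE_URL` are assumptions, and treating HTTP 404 as "already deleted" is a design choice rather than documented service behavior.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder: replace with your Evaluator service URL


def build_delete_request(job_id: str) -> urllib.request.Request:
    """Build a DELETE request for one evaluation job (assumed URL layout)."""
    url = f"{BASE_URL}/v1/evaluation/jobs/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def delete_job(job_id: str) -> bool:
    """Send the DELETE; return False if the job was not found (HTTP 404)."""
    try:
        with urllib.request.urlopen(build_delete_request(job_id), timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is unexpected


req = build_delete_request("job-123")
print(req.get_method(), req.full_url)
```
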

References

Standard Jobs

Review the following technical specifications and compatibility guides when you configure and optimize your evaluation jobs.

Job Target and Config Matrix: Learn which evaluation targets and configurations can be combined for different evaluation types. Refer to Job Target and Configuration Matrix.

Job Durations: View expected evaluation times for different model, hardware, and dataset combinations. Refer to Expected Evaluation Duration.

Job Schema: Reference for the JSON structure and fields used when creating evaluation jobs. Refer to Job JSON Schema Reference.

Custom Jobs

Review the following articles to help you prepare datasets, templates, metrics, and tool calling for custom evaluation jobs.

Data Format: How to prepare and format datasets for custom evaluation. Refer to Data for Custom Evaluations.

Templating Syntax: How to use Jinja2 templates for prompts and tasks. Refer to Templating for Tasks.

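
As an illustration of Jinja2 prompt templating, the sketch below renders one prompt from a dataset row with the `jinja2` package. The variable name `item` for the current row and the `question` and `context` fields are assumptions made for this example; refer to Templating for Tasks for the variables the service actually exposes. `StrictUndefined` makes a missing field raise an error instead of rendering as an empty string.

```python
from jinja2 import StrictUndefined, Template

# Hypothetical prompt template; `item` stands for the current dataset row.
PROMPT = Template(
    "Answer the question using one word.\n"
    "Question: {{ item.question }}\n"
    "{% if item.context %}Context: {{ item.context }}\n{% endif %}"
    "Answer:",
    undefined=StrictUndefined,
)

# Jinja2 resolves `item.question` on a dict by falling back to item["question"].
row = {"question": "What is the capital of France?", "context": ""}
print(PROMPT.render(item=row))
```

Because `context` is an empty string here, the `{% if %}` block is skipped and the `Context:` line is omitted from the rendered prompt.
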
Available Metrics: Metrics available for evaluation, including string-check, BLEU, tool calling, and LLM-as-Judge. Refer to Metrics for Custom Evaluation.

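
To make the idea of a string-check metric concrete, here is a small local re-implementation that scores model outputs against references with a chosen comparison. It is an illustration only: the operation names and the optional normalization are assumptions, not the NeMo Evaluator configuration schema or its implementation.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative comparison operations: (model output, reference) -> pass or fail.
OPERATIONS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda output, ref: output == ref,
    "contains": lambda output, ref: ref in output,
    "startswith": lambda output, ref: output.startswith(ref),
    "endswith": lambda output, ref: output.endswith(ref),
}


def string_check_score(
    pairs: List[Tuple[str, str]], operation: str = "equals", normalize: bool = True
) -> float:
    """Return the fraction of (output, reference) pairs that pass the check."""
    check = OPERATIONS[operation]  # KeyError for an unknown operation name
    if not pairs:
        return 0.0
    passed = 0
    for output, reference in pairs:
        if normalize:
            # Ignore surrounding whitespace and letter case before comparing.
            output, reference = output.strip().lower(), reference.strip().lower()
        passed += check(output, reference)  # True counts as 1
    return passed / len(pairs)


pairs = [("Paris", "paris"), ("It is Paris.", "paris"), ("Lyon", "paris")]
print(string_check_score(pairs, "equals"), string_check_score(pairs, "contains"))
```

An exact `equals` check is strict, so a looser operation such as `contains` gives partial credit to verbose but correct answers; which one fits depends on how constrained the prompt's output format is.
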
Output: Understand the output format and structure of evaluation results. Refer to Output Format and Structure.