Run and Manage Evaluation Jobs

After you create an evaluation target and an evaluation configuration, you are ready to run an evaluation in NVIDIA NeMo Evaluator.

Warning

API Versions: The Evaluator API is available in both v1 (current) and v2 (preview). v2 introduces features such as consolidated status information, real-time log access, and an enhanced job structure, but it also includes breaking changes. For production workloads, continue using v1 until v2 is fully supported. Refer to the v2 Migration Guide for upgrade guidance.

Evaluation Job Workflow

Use the following procedure to run an evaluation in NeMo Evaluator.

  1. Optionally create a custom dataset for evaluation and upload it to NeMo Data Store.

  2. Prepare a new or existing target for your evaluation and record the ID.

  3. Prepare a new or existing configuration for your evaluation and record the ID.

  4. Create a job to run your evaluation that includes the IDs from the previous steps.

  5. Get your results.
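Step 4 of the procedure above can be sketched as follows. This is a minimal illustration, not the definitive API usage: the endpoint path `/v1/evaluation/jobs`, the payload field names (`namespace`, `target`, `config`), and the placeholder IDs are assumptions for this example, so confirm them against the Job JSON Schema Reference before relying on them. The request is built but not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request


def build_job_request(base_url, target_id, config_id, namespace="my-organization"):
    """Build (but do not send) a POST request that creates an evaluation job.

    The path and payload field names are assumptions for illustration only.
    """
    payload = {
        "namespace": namespace,  # hypothetical namespace
        "target": target_id,     # ID recorded in step 2
        "config": config_id,     # ID recorded in step 3
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/evaluation/jobs",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Placeholder IDs; http://nemo.test is the minikube demo ingress value.
request = build_job_request(
    "http://nemo.test",
    target_id="my-organization/my-target",
    config_id="my-organization/my-config",
)

# To actually submit the job, send the request, for example:
#   with urllib.request.urlopen(request) as response:
#       job = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before it reaches the service.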

Tip

To see what targets and configurations are supported together, refer to Job Target and Configuration Matrix.


Task Guides

Perform common evaluation tasks.

Tip

The tutorials reference an EVALUATOR_BASE_URL whose value depends on the ingress in your cluster. If you are using the minikube demo installation, the value is http://nemo.test. The demo installation's value for NIM_PROXY_BASE_URL is http://nim.test. Otherwise, consult your cluster administrator for the ingress values.
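One way to handle these values in a script is to read them from the environment and fall back to the demo installation's values when they are not set. The sketch below assumes the variables are exported under the same names the tutorials use; the helper name `resolve_base_urls` is hypothetical.

```python
import os

# Demo (minikube) ingress values from the tip above.
DEMO_DEFAULTS = {
    "EVALUATOR_BASE_URL": "http://nemo.test",
    "NIM_PROXY_BASE_URL": "http://nim.test",
}


def resolve_base_urls(env=None):
    """Return base URLs from the environment, falling back to the demo values."""
    env = os.environ if env is None else env
    return {
        # Strip any trailing slash so paths can be appended consistently.
        name: env.get(name, default).rstrip("/")
        for name, default in DEMO_DEFAULTS.items()
    }


urls = resolve_base_urls()
evaluator_base_url = urls["EVALUATOR_BASE_URL"]
nim_proxy_base_url = urls["NIM_PROXY_BASE_URL"]
```

On a non-demo cluster, export both variables with the ingress values your administrator provides before running the script.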

Create Job

Create and submit a new evaluation job with your target and configuration.

Create Evaluation Job

Get Job Details

View complete information about a specific evaluation job.

Get Evaluation Job Details

Get Job Status

Check the current status and progress of an evaluation job.

Get Evaluation Job Status

List Jobs

View and filter all evaluation jobs in your namespace.

List Evaluation Jobs

Get Job Results

View evaluation results as a JSON response.

Get Evaluation Results

Download Detailed Results

Download complete evaluation results as a ZIP file.

Download Evaluation Results

Get Job Logs

Download job logs as a ZIP file.

Download Evaluation Job Logs

Delete Job

Remove an evaluation job from the system.

Delete Evaluation Job

v2 Migration Guide

Migrate from the v1 API to the v2 API, with guidance on the breaking changes.

V2 API Migration Guide

References

Standard Jobs

Review detailed technical specifications and compatibility guides to help you configure and optimize your evaluation jobs effectively.

Job Target and Config Matrix

Learn which evaluation targets and configurations can be combined for different evaluation types.

Job Target and Configuration Matrix

Job Durations

View expected evaluation times for different model, hardware, and dataset combinations.

Expected Evaluation Duration

Job Schema

Reference for the JSON structure and fields used when creating evaluation jobs.

Job JSON Schema Reference