Run and Manage Evaluation Jobs#
After you create an evaluation target and an evaluation configuration, you are ready to run an evaluation in NVIDIA NeMo Evaluator.
Warning
API Versions: The Evaluator API is available in both v1 (current) and v2 (preview). v2 introduces new features like consolidated status information, real-time log access, and enhanced job structure, but includes breaking changes. For production workloads, continue using v1 until v2 is fully supported. Refer to the v2 Migration Guide for upgrade guidance.
Evaluation Job Workflow#
Use the following procedure to run an evaluation in NeMo Evaluator.
1. Optionally, create a custom dataset for evaluation and upload it to NeMo Data Store.
2. Prepare a new or existing target for your evaluation and record the ID.
3. Prepare a new or existing configuration for your evaluation and record the ID.
4. Create a job that runs your evaluation, referencing the IDs from the previous steps.
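The final step above can be sketched as assembling a request body that links the two recorded IDs. The field names (`namespace`, `target`, `config`) and the example IDs are illustrative assumptions, not the definitive schema; refer to the evaluation job JSON reference for the exact fields.

```python
# Sketch of assembling a job-creation request body. Field names and the
# example IDs are assumptions for illustration only.

def build_job_payload(namespace: str, target_id: str, config_id: str) -> dict:
    """Assemble the request body that links a target and a configuration."""
    return {
        "namespace": namespace,  # namespace the job runs in (assumed field)
        "target": target_id,     # target ID recorded in step 2
        "config": config_id,     # configuration ID recorded in step 3
    }

# Hypothetical IDs for illustration only.
payload = build_job_payload("default", "my-eval-target", "my-eval-config")
```

You would then POST this body to the Evaluator job-creation endpoint at your `EVALUATOR_BASE_URL`; the task guides below walk through the exact request.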
Tip
To see what targets and configurations are supported together, refer to Job Target and Configuration Matrix.
Task Guides#
Perform common evaluation tasks.
Tip
The tutorials reference an EVALUATOR_BASE_URL whose value depends on the ingress configuration of your cluster. If you are using the minikube demo installation, the value is http://nemo.test, and the demo installation's NIM_PROXY_BASE_URL is http://nim.test. Otherwise, consult your cluster administrator for the ingress values.
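For example, a script that follows the tutorials might resolve these base URLs from the environment, falling back to the demo-installation values. The fallbacks are only valid for the minikube demo; this is a sketch, not part of the tutorials themselves.

```python
import os

# Fall back to the minikube demo installation's ingress values. In any
# other cluster, set these environment variables to the values provided
# by your cluster administrator.
EVALUATOR_BASE_URL = os.environ.get("EVALUATOR_BASE_URL", "http://nemo.test")
NIM_PROXY_BASE_URL = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")
```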
- Create and submit a new evaluation job with your target and configuration
- View complete information about a specific evaluation job
- Check the current status and progress of an evaluation job
- View and filter all evaluation jobs in your namespace
- View evaluation results as a JSON response
- Download complete evaluation results as a ZIP file
- Download job logs as a ZIP file
- Remove an evaluation job from the system
- Migrate from the v1 to the v2 API with the breaking-changes guide
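Several of the tasks above, such as checking job status and then fetching results, are naturally combined into a polling loop. Below is a minimal, transport-agnostic sketch; the terminal state names (`completed`, `failed`, `cancelled`) are assumptions, so check the job-status task guide for the actual values your deployment reports.

```python
import time

def wait_for_job(fetch_status, poll_interval: float = 10.0,
                 timeout: float = 3600.0) -> str:
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any callable that returns the job's current status
    string, e.g. a wrapper around the job-status endpoint.
    """
    terminal = {"completed", "failed", "cancelled"}  # assumed state names
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("evaluation job did not reach a terminal state")
```

Once `wait_for_job` returns `"completed"`, you can view the results as JSON or download the results or logs as ZIP files, as described in the task guides above.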
References#
Standard Jobs#
Review detailed technical specifications and compatibility guides to configure and optimize your evaluation jobs.
- Learn which evaluation targets and configurations can be combined for different evaluation types
- View expected evaluation times for different model, hardware, and dataset combinations
- Reference for the JSON structure and fields used when creating evaluation jobs