About Evaluating

NVIDIA NeMo Evaluator evaluates LLMs through academic benchmarks, custom automated evaluations, and LLM-as-a-Judge. It also evaluates Retriever and RAG pipelines.

Typical NeMo Evaluator Workflow

Note

NeMo Evaluator depends on NVIDIA NIM for LLMs and NeMo Data Store.

A typical NeMo Evaluator workflow consists of the following steps:

  1. (Optional) If you are using a custom dataset for evaluation, upload it to NeMo Data Store before you run an evaluation.

  2. Create an evaluation target in NeMo Evaluator.

  3. Create an evaluation configuration in NeMo Evaluator.

  4. Run an evaluation job by submitting a request to NeMo Evaluator.

    1. NeMo Evaluator downloads custom data, if any, from NeMo Data Store.

    2. NeMo Evaluator runs inference with NIM for LLMs, Embeddings, and Reranking, depending on the model being evaluated.

    3. NeMo Evaluator writes the results, including generations, logs, and metrics, to NeMo Data Store.

    4. NeMo Evaluator returns the results.

  5. Get your results.

For more information, see Run and Manage Evaluation Jobs.
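
The workflow above can be sketched as a sequence of HTTP requests. This is a minimal illustration, not the documented client: the base URL, the endpoint paths, the payload fields, and the job status names are all assumptions made for the example, so check them against the Evaluator API reference before use. The sketch plans the requests for steps 2 through 5 and shows one way to wait for a job to finish.

```python
import json
import time
import urllib.request

# Assumption: placeholder base URL; replace it with your own deployment address.
EVALUATOR_URL = "http://localhost:7331"


def plan_workflow(base_url, target, config):
    """Return the ordered requests for steps 2-5 as (method, url, body) tuples.

    The endpoint paths are hypothetical and only illustrate the order of calls.
    """
    base = base_url.rstrip("/")
    return [
        ("POST", f"{base}/v1/evaluation/targets", target),  # step 2: create a target
        ("POST", f"{base}/v1/evaluation/configs", config),  # step 3: create a configuration
        # Step 4: submit a job that references the target and configuration.
        # The placeholder IDs stand in for values returned by the two calls above.
        ("POST", f"{base}/v1/evaluation/jobs",
         {"target": "<target-id>", "config": "<config-id>"}),
        # Step 5: fetch the results once the job has finished.
        ("GET", f"{base}/v1/evaluation/jobs/<job-id>/results", None),
    ]


def send(method, url, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal state, then return that state.

    `fetch_status` is any callable that returns the current status string.
    The terminal status names below are assumptions for this sketch.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("completed", "failed", "cancelled"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state in time")


if __name__ == "__main__":
    # Print the planned requests without contacting any service.
    for method, url, body in plan_workflow(
        EVALUATOR_URL,
        target={"type": "model"},   # hypothetical target payload
        config={"type": "custom"},  # hypothetical configuration payload
    ):
        print(method, url, json.dumps(body))
```

Keeping the request plan separate from the `send` helper makes it possible to review the calls before any of them reach a running service. Step 1 (uploading a custom dataset to NeMo Data Store) is left out because this page does not describe how the upload is performed.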


Task Guides

The following guides explain how to perform common NeMo Evaluator tasks.

Targets

Create targets for evaluations.

Create and Manage Evaluation Targets

Configurations

Create configurations for evaluations.

Create and Manage Evaluation Configurations

Jobs

Create and run evaluation jobs.

Run and Manage Evaluation Jobs

Results

Get the results of your evaluation jobs.

Use the Results of Your Job

Tutorials

The following tutorials provide step-by-step instructions to complete specific evaluation goals.

Run a Simple Evaluation

Learn how to run an evaluation.

Run a Simple Evaluation

Evaluate a Fine-tuned Model

Learn how to evaluate a fine-tuned model.

Customize the Evaluation Loop

Reference

The following documentation provides reference and troubleshooting information for NeMo Evaluator.

API Reference

View the NeMo Evaluator API reference.

Evaluator API

Troubleshooting

Troubleshoot issues that arise when you work with NeMo Evaluator.

Troubleshooting NeMo Evaluator