Evaluation Tutorials

Use these tutorials to become familiar with NVIDIA NeMo Evaluator.

Tip

Tutorials are organized by complexity and typically build on one another.

Before You Start

Set up Evaluator with Docker Compose and deploy meta/llama-3.2-3b-instruct for the following tutorials.
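Before starting a tutorial, it can help to confirm that the model is actually being served. The following is a minimal sketch, assuming the deployed model sits behind an OpenAI-compatible API that lists models at `/v1/models`. The helper names `model_is_listed` and `fetch_models` and the base URL in the usage comment are hypothetical and not part of Evaluator.

```python
import json
from urllib.request import urlopen

# Model ID named in the setup step above.
MODEL_ID = "meta/llama-3.2-3b-instruct"


def model_is_listed(models_payload: dict, model_id: str = MODEL_ID) -> bool:
    """Return True if model_id appears in an OpenAI-style model list payload.

    Assumes the payload has the shape {"data": [{"id": "..."}, ...]}.
    """
    return any(
        entry.get("id") == model_id
        for entry in models_payload.get("data", [])
    )


def fetch_models(base_url: str) -> dict:
    """Fetch the model list from an OpenAI-compatible endpoint.

    base_url is an assumption: use the address where your deployment is reachable.
    """
    with urlopen(f"{base_url.rstrip('/')}/v1/models", timeout=10) as response:
        return json.load(response)


# Example usage (hypothetical address, adjust to your environment):
# payload = fetch_models("http://localhost:8000")
# print("deployed" if model_is_listed(payload) else "not found")
```

The payload check is kept separate from the network call so it can be exercised without a running deployment.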

Run an Academic LM Harness Eval

Learn how to run an academic benchmark evaluation with the LM Harness.

Run an LLM Judge Eval

Learn how to evaluate a fine-tuned model using the LLM Judge metric with a custom dataset.

The following tutorial requires an Evaluator deployment set up according to either the Demo Cluster Setup on minikube guide or the Kubernetes deployment guides for the platform.

Evaluate a Fine-tuned Model

Learn how to evaluate a fine-tuned model.

Customize and Evaluate Large Language Models