Evaluation Techniques

Follow step-by-step guides for different evaluation scenarios and methodologies in NeMo Evaluator.

Before You Start

Ensure you have:

  1. Completed the Installation Guide and Quickstart getting-started guides.

  2. Obtained your endpoint and API key, or prepared the checkpoint you wish to deploy.

  3. Prepared your Hugging Face token for accessing gated datasets.
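
The checklist above can be verified before launching a run. The following is a minimal sketch, not part of NeMo Evaluator: `HF_TOKEN` is the environment variable conventionally read by Hugging Face tooling, while `API_KEY` and `ENDPOINT_URL` are hypothetical names chosen for illustration and should be replaced with whatever your deployment actually uses.

```python
import os

# HF_TOKEN is the conventional Hugging Face variable.
# API_KEY and ENDPOINT_URL are hypothetical names used only for this sketch.
REQUIRED_VARS = ("HF_TOKEN", "API_KEY", "ENDPOINT_URL")


def missing_prerequisites(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.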

Evaluations

Select an evaluation type tailored to your model capabilities.

Text Generation

Measure model performance through natural language generation for academic benchmarks, reasoning tasks, and general knowledge assessment.

Text Generation Evaluation

Log-Probability

Assess model confidence and uncertainty using log-probabilities for multiple-choice scenarios without text generation.

Evaluate LLMs Using Log-Probabilities
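
To illustrate the general idea behind log-probability scoring for multiple-choice questions, here is a minimal sketch. It assumes you already have per-token log-probabilities for each candidate answer (the values below are made up) and does not use or represent the NeMo Evaluator API. The helper name `pick_answer` and the optional length normalization are illustrative choices, not something the guide prescribes.

```python
import math


def pick_answer(choice_token_logprobs, length_normalize=False):
    """Return the choice whose continuation has the highest log-probability.

    choice_token_logprobs maps each choice label to the list of per-token
    log-probabilities the model assigned to that choice's text.
    """
    best_choice, best_score = None, -math.inf
    for choice, token_logprobs in choice_token_logprobs.items():
        # The log-probability of a sequence is the sum of its token log-probs.
        score = sum(token_logprobs)
        if length_normalize and token_logprobs:
            # Averaging per token reduces the bias toward shorter answers.
            score /= len(token_logprobs)
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice


# Hypothetical log-probabilities for four answer options.
example = {
    "A": [-0.2, -0.1],
    "B": [-1.5, -0.9],
    "C": [-2.3],
    "D": [-0.4, -3.0],
}
predicted = pick_answer(example)
```

No text is generated at any point: the model only scores the candidate answers it is given, and the prediction is the highest-scoring candidate.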