# Installation

## Requirements

- Python 3.10+
- An OpenAI-compatible model endpoint (NVIDIA API Catalog, vLLM, NIM, etc.)
## Install from Source

```shell
git clone https://github.com/NVIDIA-NeMo/Evaluator.git
cd Evaluator
pip install -e ".[scoring]"
```
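After an editable install, the package should resolve on `sys.path`; a quick stdlib check (the top-level module name `nemo_evaluator` is an assumption; adjust it to the real import name):

```python
from importlib import util

def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    return util.find_spec(module_name) is not None

# Top-level module name assumed from the distribution name.
print(is_importable("nemo_evaluator"))
```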
## Install Extras

| Extra | Command | What it adds |
|---|---|---|
| | | `sympy` for symbolic math comparison |
| | | `scipy` for confidence intervals, McNemar significance testing, and regression analysis |
| | | NeMo Skills benchmark integration |
| | | Harbor agent integration (OpenHands, Terminus-2, etc.) |
| | Removed | Adapter proxy is now built-in (no extra install needed) |
| | | Inspect AI log export |
| | | Ray distributed launcher |
| | | `lm-evaluation-harness` tasks |
| | | WandB and MLflow experiment tracker export |
| | | Sphinx, NVIDIA theme, mermaid for building docs |
| | | Everything above |
| | | `pytest`, `ruff`, all extras |
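As an illustration of the kind of statistics the `scipy`-backed extra enables, a confidence interval over a benchmark accuracy can be sketched with the stdlib alone (a Wilson score interval; this is not NeMo Evaluator's implementation, and the function name is hypothetical):

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an accuracy estimate."""
    if total == 0:
        return (0.0, 0.0)
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (center - margin, center + margin)

# e.g. 85 correct answers out of 100 evaluated samples
low, high = wilson_interval(85, 100)
```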
## Verify Installation

```shell
nel --version
nel list
```

Expected output:

```
nemo-evaluator 0.12.0
Available environments:
drop, gpqa, gsm8k, healthbench, humaneval, math500,
mgsm, mmlu, mmlu_pro, pinchbench, simpleqa,
swebench-multilingual, swebench-verified, triviaqa, xstest
```
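If you drive `nel` from a script (for example, to gate CI on a benchmark being available), the listing can be parsed with the stdlib alone; a sketch assuming the output format shown above:

```python
def parse_env_list(output: str) -> list[str]:
    """Extract environment names from `nel list` output.

    Assumes the format shown above: a header line ending in a colon,
    followed by comma-separated names wrapped across lines.
    """
    _, _, body = output.partition("Available environments:")
    return [name.strip() for name in body.replace("\n", " ").split(",") if name.strip()]

sample = """Available environments:
drop, gpqa, gsm8k, healthbench, humaneval, math500,
mgsm, mmlu, mmlu_pro, pinchbench, simpleqa,
swebench-multilingual, swebench-verified, triviaqa, xstest"""

envs = parse_env_list(sample)
```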
## Docker

```shell
docker build -t nemo-evaluator .
docker run nemo-evaluator nel list
```
## Next Steps

Proceed to the Quickstart to run your first evaluation.