Installation

Requirements

  • Python 3.10+

  • An OpenAI-compatible model endpoint (NVIDIA API Catalog, vLLM, NIM, etc.)
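The Python requirement above can be checked before installing. The following is a minimal sketch, not part of the project itself: the helper name `python_is_supported` and the constant `MIN_PYTHON` are hypothetical names chosen for illustration.

```python
import sys

# Minimum interpreter version, taken from the requirements list above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.10+ requirement")
    else:
        print(f"Python {current} is too old; 3.10 or newer is required")
```

Comparing tuples rather than version strings avoids the pitfall where "3.9" sorts after "3.10" lexicographically.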

Install from Source

```bash
git clone https://github.com/NVIDIA-NeMo/Evaluator.git
cd Evaluator
pip install -e ".[scoring]"
```

Install Extras

| Extra | Command | What it adds |
|---|---|---|
| scoring | `pip install -e ".[scoring]"` | sympy for symbolic math comparison |
| stats | `pip install -e ".[stats]"` | scipy for confidence intervals, McNemar significance testing, and regression analysis |
| skills | `pip install -e ".[skills]"` | NeMo Skills benchmark integration |
| harbor | `pip install -e ".[harbor]"` | Harbor agent integration (OpenHands, Terminus-2, etc.) |
| ~~proxy~~ | Removed | Adapter proxy is now built-in (no extra install needed) |
| inspect | `pip install -e ".[inspect]"` | Inspect AI log export (inspect_ai-compatible EvalLog files) |
| ray | `pip install -e ".[ray]"` | Ray distributed launcher |
| harnesses | `pip install -e ".[harnesses]"` | lm-evaluation-harness tasks |
| export | `pip install -e ".[export]"` | WandB and MLflow experiment tracker export |
| docs | `pip install -e ".[docs]"` | Sphinx, NVIDIA theme, and mermaid for building docs |
| all | `pip install -e ".[all]"` | Everything above |
| dev | `pip install -e ".[dev]"` | pytest, ruff, and all extras |

Verify Installation

```bash
nel --version
nel list
```

Expected output (the version number and environment list vary by release):

```text
nemo-evaluator 0.12.0

Available environments:
  drop, gpqa, gsm8k, healthbench, humaneval, math500,
  mgsm, mmlu, mmlu_pro, pinchbench, simpleqa,
  swebench-multilingual, swebench-verified, triviaqa, xstest
```
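The same check can be scripted, for example in a CI job. The sketch below assumes only that `nel --version` behaves as shown above; the helper name `cli_version` is hypothetical and not part of the package.

```python
import shutil
import subprocess


def cli_version(command="nel"):
    """Return the output of `<command> --version`, or None if it is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"],
        capture_output=True,
        text=True,
        check=False,  # report the output even if the exit code is non-zero
    )
    # Some CLIs print the version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = cli_version()
    if version is None:
        print("nel was not found on PATH; check that the install succeeded")
    else:
        print(f"Found: {version}")
```

If the script reports that `nel` is missing, the most likely cause is that the package was installed into a different virtual environment than the one currently active.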

Docker

```bash
docker build -t nemo-evaluator .
docker run nemo-evaluator nel list
```

Next Steps

Proceed to the Quickstart to run your first evaluation.