Bring Your Own Benchmark (BYOB)

Create custom evaluation benchmarks in ~12 lines of Python using decorators, built-in scorers, and one-command containerization.

New to BYOB? See the quickstart below to create your first benchmark.

Prerequisites

  • Python 3.10+

  • NeMo Evaluator installed (pip install nemo-evaluator)

  • An OpenAI-compatible model endpoint

Quickstart

Step 1 – Write your benchmark

Create a file called my_benchmark.py:

from nemo_evaluator.contrib.byob import benchmark, scorer, ScorerInput

@benchmark(
    name="my-qa",
    dataset="data.jsonl",
    prompt="Q: {question}\nA:",
    target_field="answer",
)
@scorer
def check(sample: ScorerInput) -> dict:
    return {"correct": sample.target.lower() in sample.response.lower()}
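The benchmark above reads data.jsonl, fills the prompt from each row, and compares the model response against the answer field. A minimal sketch of a matching dataset and of the scoring logic follows. It runs without NeMo Evaluator installed: FakeScorerInput is a hypothetical stand-in that exposes only the two attributes the scorer reads, the sample rows are invented for illustration, and filling {question} by field name with str.format is an assumption about how the prompt template is rendered.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeScorerInput:
    """Hypothetical stand-in with only the attributes the quickstart scorer reads."""
    response: str
    target: str


def check(sample) -> dict:
    # Same case-insensitive substring logic as the quickstart scorer, minus the decorators.
    return {"correct": sample.target.lower() in sample.response.lower()}


def write_dataset(path: Path) -> list[dict]:
    # One JSON object per line; keys match {question} in the prompt and target_field="answer".
    rows = [
        {"question": "What is the capital of France?", "answer": "Paris"},
        {"question": "How many legs does a spider have?", "answer": "8"},
    ]
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return rows


rows = write_dataset(Path("data.jsonl"))

# Assumption: placeholders are filled from row fields by name.
prompt = "Q: {question}\nA:".format(**rows[0])

hit = check(FakeScorerInput(response="The capital is Paris.", target=rows[0]["answer"]))
miss = check(FakeScorerInput(response="I am not sure.", target=rows[0]["answer"]))
print(prompt)
print(hit, miss)
```

Substring matching is lenient: a target of "8" also matches a response containing "18", so short targets may call for a stricter comparison.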

Step 2 – Compile

nemo-evaluator-byob my_benchmark.py

Step 3 – Run

nemo-evaluator run_eval \
  --eval_type byob_my_qa.my-qa \
  --model_url http://localhost:8000 \
  --model_id my-model \
  --model_type chat \
  --output_dir ./results \
  --api_key_name API_KEY

Tip

Use nemo-evaluator-byob my_benchmark.py --dry-run to validate your benchmark without installing it.

Reference Documentation

  • Benchmark Decorator – Define benchmarks with the @benchmark decorator.

  • Scorers – Built-in scorers and custom scoring functions.

  • LLM-as-Judge – Judge-based evaluation with LLMs.

  • Datasets – Dataset formats, HuggingFace URIs, and field mapping.

  • CLI Reference – Compile, validate, list, and containerize benchmarks.

  • Containerization – Package benchmarks as Docker images.

Examples

Complete annotated examples are available in the source repository under packages/nemo-evaluator/examples/byob/:

  • MedMCQA – HuggingFace dataset with field mapping and custom letter-extraction scorer

  • Global MMLU Lite – Multilingual MMLU with per-category scoring breakdowns

  • TruthfulQA – LLM-as-Judge with custom template and **template_kwargs

  • Math Reasoning – Numeric extraction with tolerance comparison