CLI Reference

The nemo-evaluator-byob CLI compiles, validates, lists, and containerizes BYOB benchmarks.

Commands

Compile a Benchmark

nemo-evaluator-byob my_benchmark.py

Compiles the benchmark definition and auto-installs the resulting package via pip. After compilation, the benchmark is immediately available to nemo-evaluator run_eval.

Validate Without Installing

nemo-evaluator-byob my_benchmark.py --dry-run

Validates the benchmark definition and shows dataset info without installing the compiled package. Useful for checking your benchmark before committing changes.
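
One way to use the dry run is as a gate in front of the real compile step. The sketch below assumes that nemo-evaluator-byob exits with a non-zero status when validation fails, which this page does not state. The function name validate_then_compile and the BYOB_CLI override variable are hypothetical conveniences, not part of the tool.

```shell
# Hypothetical helper: validate a benchmark, then compile it only if validation passed.
# BYOB_CLI is an assumed override so the command can be swapped out (for example in tests).

validate_then_compile() {
  local benchmark_file="$1"
  local cli="${BYOB_CLI:-nemo-evaluator-byob}"

  # Step 1: validate the definition and show dataset info without installing anything.
  if ! "$cli" "$benchmark_file" --dry-run; then
    echo "validation failed for $benchmark_file; skipping compile" >&2
    return 1
  fi

  # Step 2: compile and auto-install the resulting package.
  "$cli" "$benchmark_file"
}
```

A call such as `validate_then_compile my_benchmark.py` would then run both steps in order and stop after the first one if it fails.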

Compile Without Auto-Install

nemo-evaluator-byob my_benchmark.py --no-install

Compiles the benchmark but does not install it. You must manually add the output directory to your Python path:

export PYTHONPATH="$HOME/.nemo-evaluator/byob_packages/byob_<name>:$PYTHONPATH"
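
A small helper can check that the package directory exists before changing PYTHONPATH. This is a minimal sketch that assumes the default output location shown above; if you compiled with --install-dir, the path will differ. The function name add_byob_path is hypothetical.

```shell
# Hypothetical helper: prepend a compiled BYOB package directory to PYTHONPATH.
# Assumes the default output location from this page; adjust if --install-dir was used.

add_byob_path() {
  local name="$1"
  # $HOME is used instead of ~ because the shell does not expand a tilde inside double quotes.
  local pkg_dir="$HOME/.nemo-evaluator/byob_packages/byob_${name}"

  if [ ! -d "$pkg_dir" ]; then
    echo "package directory not found: $pkg_dir" >&2
    return 1
  fi

  # Avoid a dangling colon when PYTHONPATH is unset or empty.
  export PYTHONPATH="${pkg_dir}${PYTHONPATH:+:$PYTHONPATH}"
}
```

Because the function exports a variable, it has to be sourced into the current shell session rather than run as a separate script.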

List Installed Benchmarks

nemo-evaluator-byob --list

Prints all currently installed BYOB benchmarks and their eval_type identifiers.

Containerize

nemo-evaluator-byob my_benchmark.py --containerize

Builds a Docker image from the compiled benchmark. See Containerization for details on the image layout and deployment.

Containerize and Push

nemo-evaluator-byob my_benchmark.py --push registry.example.com/my-bench:latest

Builds the Docker image and pushes it to the specified registry in one step. The --push flag implies --containerize, so you do not need to pass both.

Additional Flags

| Flag | Description |
| --- | --- |
| --install-dir DIR | Custom installation directory |
| --base-image IMAGE | Base Docker image (default: python:3.12-slim) |
| --tag TAG | Docker image tag (default: byob_<name>:latest) |
| --check-requirements | Verify declared requirements are importable |
| --version | Show version |
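
As an illustration of how the image-related flags might be used together, the sketch below wraps a single invocation. It assumes that --containerize, --tag, and --base-image can be combined in one command, which this page implies but does not show. The function name containerize_benchmark and the BYOB_CLI override variable are hypothetical.

```shell
# Hypothetical wrapper: build a Docker image for a benchmark with an explicit tag and base image.
# Assumes --containerize, --tag, and --base-image can be combined in a single invocation.

containerize_benchmark() {
  local benchmark_file="$1"
  local tag="$2"
  # Fall back to the base image this page documents as the CLI default.
  local base_image="${3:-python:3.12-slim}"
  local cli="${BYOB_CLI:-nemo-evaluator-byob}"

  "$cli" "$benchmark_file" --containerize --tag "$tag" --base-image "$base_image"
}
```

For example, `containerize_benchmark my_benchmark.py my-bench:dev` would request an image tagged my-bench:dev on the default base image.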

Running Evaluations

After compiling a benchmark, run it with the standard nemo-evaluator CLI:

nemo-evaluator run_eval \
  --eval_type byob_<name>.<benchmark_name> \
  --model_url http://localhost:8000 \
  --model_id my-model \
  --model_type chat \
  --output_dir ./results \
  --api_key_name API_KEY

The --eval_type follows the pattern byob_<normalized_name>.<original_name>, where <normalized_name> is the lowercased, underscore-separated form of the name you passed to @benchmark(name=...).
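
To make the pattern concrete, here is a rough sketch of how such an identifier could be derived. The exact normalization rule is not specified on this page, so the code assumes that every run of non-alphanumeric characters collapses to a single underscore. Both function names are hypothetical, and the identifier the tool actually registers may differ.

```shell
# Illustrative only: one plausible reading of "lowercased, underscore-separated".
# Assumption: each run of non-alphanumeric characters becomes a single underscore.

normalize_name() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/_/g'
}

# Build an eval_type string following the byob_<normalized_name>.<original_name> pattern.
eval_type_for() {
  local original="$1"
  printf 'byob_%s.%s\n' "$(normalize_name "$original")" "$original"
}
```

Under these assumptions, `eval_type_for "My-QA"` prints `byob_my_qa.My-QA`.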

Tip: Use nemo-evaluator-byob --list to see the exact eval_type for each installed benchmark. This avoids guessing the normalized name.

See Also