CLI Reference#
The nemo-evaluator-byob CLI compiles, validates, lists, and containerizes BYOB benchmarks.
Commands#
Compile a Benchmark#
nemo-evaluator-byob my_benchmark.py
Compiles the benchmark definition and auto-installs the resulting package via pip.
After compilation, the benchmark is immediately available to nemo-evaluator run_eval.
Validate Without Installing#
nemo-evaluator-byob my_benchmark.py --dry-run
Validates the benchmark definition and shows dataset info without installing the compiled package. Useful for checking your benchmark before committing changes.
Compile Without Auto-Install#
nemo-evaluator-byob my_benchmark.py --no-install
Compiles the benchmark but does not install it. You must manually add the output directory to your Python path:
export PYTHONPATH="$HOME/.nemo-evaluator/byob_packages/byob_<name>:$PYTHONPATH"
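When the evaluation is launched from a Python script rather than an interactive shell, the same path can be set on a per-process basis. The following is a minimal sketch, not part of the BYOB tooling: `byob_env` is a hypothetical helper, and it assumes the output directory layout shown in the export line above.

```python
import os

# Location taken from the export line above; change it if your packages live elsewhere.
DEFAULT_PACKAGES_DIR = os.path.join("~", ".nemo-evaluator", "byob_packages")


def byob_env(name, packages_dir=DEFAULT_PACKAGES_DIR, environ=None):
    """Return a copy of the environment with byob_<name> prepended to PYTHONPATH.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if environ is None else environ)
    # expanduser resolves a leading "~" to the user's home directory.
    package_dir = os.path.join(os.path.expanduser(packages_dir), f"byob_{name}")
    existing = env.get("PYTHONPATH", "")
    # Prepend so the compiled benchmark is found before other entries.
    env["PYTHONPATH"] = package_dir + (os.pathsep + existing if existing else "")
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking nemo-evaluator, which leaves the parent shell's environment untouched.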
List Installed Benchmarks#
nemo-evaluator-byob --list
Prints all currently installed BYOB benchmarks and their eval_type identifiers.
Containerize#
nemo-evaluator-byob my_benchmark.py --containerize
Builds a Docker image from the compiled benchmark. See Containerization for details on the image layout and deployment.
Containerize and Push#
nemo-evaluator-byob my_benchmark.py --push registry.example.com/my-bench:latest
Builds the Docker image and pushes it to the specified registry in one step.
The --push flag implies --containerize, so you do not need to pass both.
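The commands above can be scripted, for example to validate a benchmark before compiling it in CI. The sketch below is one possible wrapper: `byob_argv` and `validate_then_compile` are hypothetical helpers, only the flags documented above are used, and it is an assumption that the CLI exits with a non-zero status when validation fails. Whether the flags can be combined with each other is not stated here, so the helper is best used with one mode at a time.

```python
import subprocess


def byob_argv(benchmark_file, dry_run=False, no_install=False,
              containerize=False, push=None):
    """Build the argument list for one nemo-evaluator-byob invocation.

    Hypothetical helper; it only assembles flags documented above.
    """
    argv = ["nemo-evaluator-byob", benchmark_file]
    if dry_run:
        argv.append("--dry-run")
    if no_install:
        argv.append("--no-install")
    if push is not None:
        # --push implies --containerize, so the latter is not added separately.
        argv += ["--push", push]
    elif containerize:
        argv.append("--containerize")
    return argv


def validate_then_compile(benchmark_file):
    """Run a dry run first, then compile and install only if it succeeds.

    Assumption: a failed validation produces a non-zero exit status,
    which check=True turns into a CalledProcessError.
    """
    subprocess.run(byob_argv(benchmark_file, dry_run=True), check=True)
    subprocess.run(byob_argv(benchmark_file), check=True)
```

Building the argument list separately from running it keeps the flag logic easy to test without invoking the CLI.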
Additional Flags#
| Flag | Description |
|---|---|
| | Custom installation directory |
| | Base Docker image (default: |
| | Docker image tag (default: |
| | Verify declared requirements are importable |
| | Show version |
Running Evaluations#
After compiling a benchmark, run it with the standard nemo-evaluator CLI:
nemo-evaluator run_eval \
  --eval_type byob_<name>.<benchmark_name> \
  --model_url http://localhost:8000 \
  --model_id my-model \
  --model_type chat \
  --output_dir ./results \
  --api_key_name API_KEY
The --eval_type value follows the pattern byob_<normalized_name>.<original_name>,
where <normalized_name> is the lowercased, underscore-separated form of the
name you passed to @benchmark(name=...) and <original_name> is the name as originally passed.
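To make the pattern concrete, here is a rough sketch of how such an identifier could be assembled. It is illustrative only: `normalize_name` and `eval_type_for` are hypothetical functions, the exact normalization rules used by the tool may differ (this sketch replaces every run of non-alphanumeric characters with a single underscore), and it assumes `<original_name>` is the name exactly as passed to @benchmark(name=...).

```python
import re


def normalize_name(name):
    """Lowercase the name and join its alphanumeric runs with underscores.

    Assumption: the real normalization may treat some characters differently.
    """
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def eval_type_for(name):
    """Assemble byob_<normalized_name>.<original_name> for a benchmark name."""
    return f"byob_{normalize_name(name)}.{name}"
```

Under these assumptions, `eval_type_for("my-qa")` yields `byob_my_qa.my-qa`. Treat the result as a guess to be checked against the installed benchmarks, not as the authoritative identifier.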
Tip
Use nemo-evaluator-byob --list to see the exact eval_type for each installed
benchmark. This avoids guessing the normalized name.
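Once the eval_type values are known, running several benchmarks against the same endpoint can be scripted. The sketch below wraps the run_eval invocation shown above; `run_eval_argv` and `run_all` are hypothetical helpers, the default values simply mirror the example command, and the eval_type strings are expected to be copied from the --list output.

```python
import subprocess


def run_eval_argv(eval_type, model_url, model_id, output_dir,
                  model_type="chat", api_key_name="API_KEY"):
    """Build the nemo-evaluator run_eval argument list for one benchmark."""
    return [
        "nemo-evaluator", "run_eval",
        "--eval_type", eval_type,
        "--model_url", model_url,
        "--model_id", model_id,
        "--model_type", model_type,
        "--output_dir", output_dir,
        "--api_key_name", api_key_name,
    ]


def run_all(eval_types, model_url, model_id, output_root):
    """Run each benchmark in turn, writing results to one subdirectory each."""
    for eval_type in eval_types:
        # Use the eval_type itself as the subdirectory name to keep runs apart.
        output_dir = f"{output_root}/{eval_type}"
        argv = run_eval_argv(eval_type, model_url, model_id, output_dir)
        # check=True stops the loop at the first failing evaluation.
        subprocess.run(argv, check=True)
```

Writing each run to its own output directory is a design choice of this sketch, made so that results from different benchmarks do not overwrite each other.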
See Also#
Bring Your Own Benchmark (BYOB) – BYOB overview and quickstart
Containerization – Packaging benchmarks as Docker images