Containerization#
Package BYOB benchmarks as Docker images for deployment with NeMo Evaluator Launcher. Containerization bundles the benchmark code, dataset, and all pip dependencies into a single reproducible image.
Building an Image#
The simplest way to build a container is with the --containerize flag:
nemo-evaluator-byob my_benchmark.py --containerize
Customize the base image and tag:
nemo-evaluator-byob my_benchmark.py \
--containerize \
--base-image python:3.12-slim \
--tag myrepo/byob_qa:latest
Image Layout#
The generated Docker image uses the following directory structure:
Path |
Contents |
|---|---|
|
Benchmark Python module(s) |
|
Dataset file(s) |
|
Compiled namespace package |
|
Framework metadata for harness discovery |
Note
The image entrypoint is pre-configured so that nemo-evaluator run_eval can
discover the benchmark without any additional path setup.
Push to Registry#
Build and push in a single command:
nemo-evaluator-byob my_benchmark.py --push registry.example.com/my-bench:latest
The --push flag implies --containerize automatically. Make sure you are
authenticated with the target registry before pushing (e.g., docker login).
Requirements#
Pip dependencies declared in the @benchmark(requirements=[...]) parameter are
installed inside the container during the build step. This ensures the runtime
environment matches the declared dependencies exactly.
@benchmark(
name="medmcqa",
dataset="hf://openlifescienceai/medmcqa?split=validation",
prompt="Q: {question}\nA:",
target_field="cop",
requirements=["datasets"],
)
Warning
If a requirement is missing from the requirements list, the container will
fail at runtime with an ImportError. Use --check-requirements during
compilation to verify that all declared packages are importable.
Running with Launcher#
Once the image is pushed to a registry, reference it in a NeMo Evaluator Launcher configuration file:
defaults:
- execution: local
- deployment: none
- _self_
evaluation:
tasks:
- name: byob_my_qa.my-qa
container: registry.example.com/my-bench:latest # or just local container name
The name field must match the eval_type produced by compilation. Use
nemo-evaluator-byob --list to confirm the exact value.
Tip
For multi-task evaluation, add multiple entries under tasks, each pointing to
a different containerized benchmark image.
See Also#
CLI Reference – CLI flags for building and pushing images
Bring Your Own Benchmark (BYOB) – BYOB overview and quickstart