# External Environments

All external integrations are `EvalEnvironment` subclasses resolved through the unified registry.
## Environment Types

| Environment | URI | Source |
|---|---|---|
| Gym (remote) | `gym://host:port` | Remote HTTP server (Gym protocol) |
| Gym (auto-started) | | Auto-started Gym server (name auto-detected) |
| NeMo Skills | `skills://<benchmark>` | NeMo Skills benchmarks |
| lm-evaluation-harness | `lm-eval://<task>` | lm-evaluation-harness tasks |
| VLMEvalKit | `vlmevalkit://<benchmark>` | VLMEvalKit VLM benchmarks |
| eval-factory | | Legacy eval-factory containers |
```bash
nel eval run --bench gym://host:port
nel eval run --bench skills://gpqa
nel eval run --bench lm-eval://aime2025
nel eval run --bench vlmevalkit://MMBench_DEV_EN
```
See the dedicated tutorials for each:

- Gym Integration – Gym environments
- NeMo Skills Integration – NeMo Skills benchmarks
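Each URI scheme above resolves to an environment class through the registry. As a rough mental model (an illustrative sketch only, not `nemo_evaluator`'s actual internals), that resolution step can be pictured like this:

```python
# Illustrative sketch only -- not nemo_evaluator's actual internals.
# Shows how a URI scheme could map to a registered environment class.
from urllib.parse import urlparse

_REGISTRY: dict[str, type] = {}

def register(scheme: str):
    """Decorator that maps a URI scheme to an environment class."""
    def wrap(cls: type) -> type:
        _REGISTRY[scheme] = cls
        return cls
    return wrap

def resolve(uri: str):
    """Instantiate the environment registered for the URI's scheme."""
    parsed = urlparse(uri)  # "skills://gpqa" -> scheme="skills", netloc="gpqa"
    return _REGISTRY[parsed.scheme](parsed.netloc)
```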
## Writing a Custom Environment

Implement `EvalEnvironment` directly:
```python
from nemo_evaluator import EvalEnvironment, SeedResult, VerifyResult, register


@register("my_platform")
class MyPlatformEnvironment(EvalEnvironment):
    def __init__(self, config: str = "default"):
        super().__init__()
        self._dataset = self._load(config)

    async def seed(self, idx: int) -> SeedResult:
        # Produce the prompt and reference answer for sample `idx`.
        row = self._dataset[idx]
        return SeedResult(prompt=row["prompt"], expected_answer=row["answer"])

    async def verify(self, response: str, expected: str, **meta) -> VerifyResult:
        # Exact-match scoring, ignoring case and surrounding whitespace.
        correct = response.strip().lower() == expected.strip().lower()
        return VerifyResult(reward=1.0 if correct else 0.0)

    def _load(self, config):
        ...
```
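Before wiring the environment into a full run, you can drive it directly as a quick smoke test. The driver below is scratch code of our own, not part of the library; it assumes `SeedResult` exposes the fields it was constructed with:

```python
import asyncio

async def smoke_test():
    env = MyPlatformEnvironment(config="default")
    sample = await env.seed(0)  # prompt + reference answer for sample 0
    result = await env.verify("some response", sample.expected_answer)
    print(f"reward={result.reward}")

asyncio.run(smoke_test())
```

If the registered name doubles as a URI scheme, as the built-in environments suggest, the environment would then be selectable with `nel eval run --bench my_platform://default` (an inference from the pattern above, not confirmed here).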
Or use the `@benchmark` + `@scorer` API for simpler cases (see Write Your Own Benchmark (BYOB)).