nemo_gym.cli.eval
Module Contents
Classes
Functions
API
Bases: BaseNeMoGymCLIConfig
Prepare benchmark data by running the benchmark’s prepare.py script.
The benchmark is identified from a config_paths entry pointing to a benchmarks/*/config.yaml file.
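As a rough illustration of the path convention described above, the following minimal sketch picks a benchmark name out of a list of config paths. The function name `benchmark_name_from_config_paths` and the choice to return the directory name are assumptions made for this example; they are not taken from the nemo_gym source.

```python
from pathlib import PurePosixPath
from typing import Optional


def benchmark_name_from_config_paths(config_paths: list[str]) -> Optional[str]:
    """Return the benchmark directory name from the first matching entry.

    Hypothetical helper: an entry matches when it ends in
    benchmarks/<name>/config.yaml. Returns None if no entry matches.
    """
    for raw in config_paths:
        # Normalize Windows separators so the check is platform-independent.
        parts = PurePosixPath(raw.replace("\\", "/")).parts
        if len(parts) >= 3 and parts[-1] == "config.yaml" and parts[-3] == "benchmarks":
            return parts[-2]
    return None
```

Under these assumptions, a list containing `benchmarks/aime24/config.yaml` would yield `aime24`, while a list with no such entry would yield `None`.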
Examples:
Resolve a benchmark’s config to its domain (for the domain column and gym search).
BenchmarkConfig flattens away the domain, so we re-resolve the config with the same parser that
BenchmarkConfig uses (so that chained config_paths / _inherit_from are applied) and read the field
back out. domain may be declared on any server config, whether a resources server (e.g. aime24) or an
agent (e.g. tau2), so we scan every server group.
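The scan over server groups can be sketched as a small recursive search. This is only an illustration under stated assumptions: the resolved config is taken to be a nested dict keyed by server group, and the name `find_domain` and the example layout are hypothetical rather than copied from the module.

```python
from typing import Any, Optional


def _search(node: Any) -> Optional[str]:
    """Depth-first search for the first string-valued 'domain' key."""
    if isinstance(node, dict):
        value = node.get("domain")
        if isinstance(value, str):
            return value
        for child in node.values():
            found = _search(child)
            if found is not None:
                return found
    return None


def find_domain(resolved_config: dict[str, Any]) -> Optional[str]:
    """Scan every top-level server group and return the first domain found.

    Hypothetical helper: it does not care whether the domain sits on a
    resources server or on an agent, only that some group declares it.
    """
    for group in resolved_config.values():
        domain = _search(group)
        if domain is not None:
            return domain
    return None
```

Because the search does not depend on the server type, a domain declared on an agent is found the same way as one declared on a resources server.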
Return whether query fuzzily matches any of fields, either as a substring or as a close difflib match (token-aware).
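A minimal sketch of this kind of matching, using the standard library's `difflib.get_close_matches`, is shown below. The function name `fuzzy_match`, the tokenization rule, and the `cutoff` value of 0.8 are assumptions chosen for illustration; the actual implementation may differ.

```python
import difflib
import re


def fuzzy_match(query: str, fields: list[str], cutoff: float = 0.8) -> bool:
    """Return True if query matches any field by substring or near-match."""
    q = query.strip().lower()
    if not q:
        return False
    for field in fields:
        text = field.lower()
        # 1) Plain case-insensitive substring match.
        if q in text:
            return True
        # 2) Token-aware close match: split the field into alphanumeric
        #    tokens and compare the query against each token separately,
        #    so a typo in one word is not diluted by the rest of the field.
        tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]
        if difflib.get_close_matches(q, tokens, n=1, cutoff=cutoff):
            return True
    return False
```

With these assumptions, `fuzzy_match("aime", ["aime24"])` succeeds on the substring check, and `fuzzy_match("codng", ["coding agent"])` succeeds on the token-level difflib check.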
CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point).
CLI command: prepare benchmark data.