nemo_gym.cli.eval

Module Contents

Classes

| Name | Description |
| --- | --- |
| PrepareBenchmarkConfig | Prepare benchmark data by running the benchmark’s prepare.py script. |

Functions

| Name | Description |
| --- | --- |
| _benchmark_domain | Resolve a benchmark’s config to its domain (for the domain column and gym search). |
| _fuzzy_matches | Whether query fuzzily matches any of fields: a substring or a close difflib match (token-aware). |
| _multiprocess_benchmark_prepare_fn | - |
| aggregate_rollouts | - |
| collect_rollouts | - |
| e2e_rollout_collection | - |
| list_benchmarks | CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point). |
| prepare_benchmark | CLI command: prepare benchmark data. |
| reward_profile | - |

API

class nemo_gym.cli.eval.PrepareBenchmarkConfig()

Bases: BaseNeMoGymCLIConfig

Prepare benchmark data by running the benchmark’s prepare.py script.

The benchmark is identified from a config_paths entry pointing to a benchmarks/*/config.yaml file.

Examples:

gym eval prepare --benchmark aime24

num_prepare_benchmark_processes: int
prepare_script_args: Dict[str, Any]
use_cached_prepared_benchmarks: bool

nemo_gym.cli.eval._benchmark_domain(
bench: nemo_gym.benchmarks.BenchmarkConfig
) -> str

Resolve a benchmark’s config to its domain (for the domain column and gym search).

BenchmarkConfig flattens away the domain, so this function re-resolves the config with the same parser that BenchmarkConfig uses (so that chained config_paths / _inherit_from entries are applied) and reads the field back out. The domain field may be declared on any server config, whether a resources server (e.g. aime24) or an agent (e.g. tau2), so every server group is scanned.
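
The scan over server groups might look roughly like the sketch below. It is not the library's implementation: the nested layout (instance name, then server group, then server name, then server config), the group keys used in the usage lines, and the helper name find_domain_sketch are all assumptions made for illustration. Unlike _benchmark_domain, which is documented as returning str, this sketch returns None when no domain is found.

```python
from typing import Any, Mapping, Optional


def find_domain_sketch(resolved: Mapping[str, Any]) -> Optional[str]:
    """Return the first "domain" value found in any server config.

    Hypothetical helper. Assumed layout of `resolved`:
    {instance_name: {server_group: {server_name: {..., "domain": ...}}}}
    """
    for instance in resolved.values():
        if not isinstance(instance, Mapping):
            continue  # skip scalar top-level entries
        for group in instance.values():
            if not isinstance(group, Mapping):
                continue
            for server_cfg in group.values():
                # The domain may sit on any server config, so check each one.
                if isinstance(server_cfg, Mapping) and server_cfg.get("domain"):
                    return str(server_cfg["domain"])
    return None


# Usage with made-up configs (keys and values are illustrative only).
resources_cfg = {"aime24_server": {"resources_servers": {"aime24": {"domain": "math"}}}}
agent_cfg = {"tau2_agent": {"agents": {"tau2": {"domain": "agent"}}}}
print(find_domain_sketch(resources_cfg))
print(find_domain_sketch(agent_cfg))
```

Scanning every group, rather than only the resources server group, is what lets a domain declared on an agent config be found.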

nemo_gym.cli.eval._fuzzy_matches(
query: str,
fields: str = ()
) -> bool

Whether query fuzzily matches any of fields: a substring or a close difflib match (token-aware).
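
A minimal sketch of this kind of matching, using only the standard library, is shown below. It is an illustration, not the function's actual code: the name fuzzy_matches_sketch, the variadic fields parameter, the tokenization regex, and the 0.8 cutoff are all assumptions.

```python
import difflib
import re


def fuzzy_matches_sketch(query: str, *fields: str, cutoff: float = 0.8) -> bool:
    """Hypothetical stand-in for _fuzzy_matches (cutoff value is assumed)."""
    q = query.strip().lower()
    for field in fields:
        text = field.lower()
        # Case 1: the query appears verbatim inside the field.
        if q in text:
            return True
        # Case 2: token-aware close match. Compare the query against each
        # alphanumeric token of the field instead of the whole string.
        tokens = re.findall(r"[a-z0-9]+", text)
        if difflib.get_close_matches(q, tokens, n=1, cutoff=cutoff):
            return True
    return False


# Usage: a substring hit, a near-miss typo, and a non-match.
print(fuzzy_matches_sketch("aime", "aime24"))
print(fuzzy_matches_sketch("mathh", "math reasoning"))
print(fuzzy_matches_sketch("zzz", "aime24", "math"))
```

Comparing against individual tokens keeps a short query from being penalized by the length of a long description, which is one plausible reading of "token-aware".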

nemo_gym.cli.eval._multiprocess_benchmark_prepare_fn(
args
)

nemo_gym.cli.eval.aggregate_rollouts()

nemo_gym.cli.eval.collect_rollouts()

nemo_gym.cli.eval.e2e_rollout_collection()

nemo_gym.cli.eval.list_benchmarks() -> None

CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point).

nemo_gym.cli.eval.prepare_benchmark() -> None

CLI command: prepare benchmark data.

nemo_gym.cli.eval.reward_profile()