nemo_gym.cli.eval


Module Contents

Classes

| Name | Description |
| --- | --- |
| PrepareBenchmarkConfig | Prepare benchmark data by running the benchmark’s prepare.py script. |

Functions

| Name | Description |
| --- | --- |
| _benchmark_extras | Resolve a benchmark’s config to its (domain, extra search terms). |
| _fuzzy_matches | Whether query fuzzily matches any of fields: a substring or a close difflib match (token-aware). |
| _multiprocess_benchmark_prepare_fn | - |
| aggregate_rollouts | - |
| collect_rollouts | - |
| e2e_rollout_collection | - |
| list_benchmarks | CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point). |
| prepare_benchmark | CLI command: prepare benchmark data. |
| reward_profile | - |

API

class nemo_gym.cli.eval.PrepareBenchmarkConfig()

Bases: BaseNeMoGymCLIConfig

Prepare benchmark data by running the benchmark’s prepare.py script.

The benchmark is identified from a config_paths entry pointing to a benchmarks/*/config.yaml file.

Examples:

gym eval prepare --benchmark aime24

num_prepare_benchmark_processes: int

use_cached_prepared_benchmarks: bool

nemo_gym.cli.eval._benchmark_extras(
bench: nemo_gym.benchmarks.BenchmarkConfig
) -> tuple[str, list[str]]

Resolve a benchmark’s config to its (domain, extra search terms).

BenchmarkConfig flattens away the resources server name, the resources server domain, and the dataset names. We re-resolve the config with the same parser BenchmarkConfig uses (so chained config_paths / _inherit_from are applied) and read those fields back out for the domain column and richer gym search matching.
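To make the idea of re-resolving an inherited config concrete, here is a minimal sketch that uses plain dictionaries only. It does not call NeMo Gym's parser. The helper names (`deep_merge`, `resolve_inheritance`), the config names, and the `domain` / `datasets` layout are all hypothetical, and the real semantics of `_inherit_from` in NeMo Gym may differ from the simple parent-then-child merge assumed here.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the child replace the parent's value.
            merged[key] = value
    return merged


def resolve_inheritance(name: str, configs: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Follow a hypothetical `_inherit_from` chain (no cycle detection in this sketch)."""
    cfg = dict(configs[name])
    parent = cfg.pop("_inherit_from", None)
    if parent is None:
        return cfg
    return deep_merge(resolve_inheritance(parent, configs), cfg)


# Hypothetical configs: the child inherits its domain and overrides its datasets.
configs = {
    "base_math": {"domain": "math", "datasets": [{"name": "example_train"}]},
    "my_bench": {"_inherit_from": "base_math", "datasets": [{"name": "example_eval"}]},
}

resolved = resolve_inheritance("my_bench", configs)
domain = resolved["domain"]
extra_terms = [dataset["name"] for dataset in resolved["datasets"]]
```

In this sketch the `(domain, extra_terms)` pair is only available after the chain is resolved, which mirrors why the function re-resolves the config rather than reading the flattened BenchmarkConfig.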

nemo_gym.cli.eval._fuzzy_matches(
query: str,
fields: str = ()
) -> bool

Whether query fuzzily matches any of fields: a substring or a close difflib match (token-aware).
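The description above can be illustrated with a short sketch built on the standard library's `difflib.get_close_matches`. This is not the module's actual implementation: the function name `fuzzy_matches_sketch`, the varargs signature, the tokenization rule, and the `cutoff` value of 0.8 are all assumptions chosen for illustration.

```python
import difflib
import re


def fuzzy_matches_sketch(query: str, *fields: str, cutoff: float = 0.8) -> bool:
    """Illustrative only: substring match first, then a close per-token difflib match."""
    q = query.strip().lower()
    if not q:
        return False
    for field in fields:
        text = field.lower()
        # 1) Plain substring match against the whole field.
        if q in text:
            return True
        # 2) Token-aware close match: split on non-alphanumerics (assumed rule)
        #    and compare the query against each token separately.
        tokens = [token for token in re.split(r"[^a-z0-9]+", text) if token]
        if difflib.get_close_matches(q, tokens, n=1, cutoff=cutoff):
            return True
    return False
```

Matching per token rather than against the whole field lets a slightly misspelled query such as `mathh` still match a field like `math_reasoning`, where a whole-string similarity ratio would be too low.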

nemo_gym.cli.eval._multiprocess_benchmark_prepare_fn(
args
)

nemo_gym.cli.eval.aggregate_rollouts()

nemo_gym.cli.eval.collect_rollouts()

nemo_gym.cli.eval.e2e_rollout_collection()

nemo_gym.cli.eval.list_benchmarks() -> None

CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point).

nemo_gym.cli.eval.prepare_benchmark() -> None

CLI command: prepare benchmark data.

nemo_gym.cli.eval.reward_profile()