nemo_gym.cli.eval

Module Contents

Classes

Name	Description
`PrepareBenchmarkConfig`	Prepare benchmark data by running the benchmark’s prepare.py script.

Functions

Name	Description
`_benchmark_extras`	Resolve a benchmark’s config to its `(domain, extra search terms)`.
`_fuzzy_matches`	Whether `query` fuzzily matches any of `fields`: a substring or a close difflib match (token-aware).
`_multiprocess_benchmark_prepare_fn`	-
`aggregate_rollouts`	-
`collect_rollouts`	-
`e2e_rollout_collection`	-
`list_benchmarks`	CLI command: list available benchmarks, optionally filtered by a `query` (the `gym search` entry point).
`prepare_benchmark`	CLI command: prepare benchmark data.
`reward_profile`	-

API

class nemo_gym.cli.eval.PrepareBenchmarkConfig()

Bases: BaseNeMoGymCLIConfig

Prepare benchmark data by running the benchmark’s prepare.py script.

The benchmark is identified from a config_paths entry pointing to a benchmarks/*/config.yaml file.

Examples:

gym eval prepare --benchmark aime24

num_prepare_benchmark_processes: int

use_cached_prepared_benchmarks: bool

nemo_gym.cli.eval._benchmark_extras(
    bench: nemo_gym.benchmarks.BenchmarkConfig
) -> tuple[str, list[str]]

Resolve a benchmark’s config to its (domain, extra search terms).

BenchmarkConfig flattens away the resources server name, the resources server domain, and the dataset names. This function re-resolves the config with the same parser that BenchmarkConfig uses, so chained config_paths / _inherit_from entries are applied, and reads those fields back out. They populate the domain column and give gym search more terms to match against.

nemo_gym.cli.eval._fuzzy_matches(
    query: str,
    fields: str = ()
) -> bool

Whether query fuzzily matches any of fields: a substring or a close difflib match (token-aware).
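The matching rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `fuzzy_matches_sketch`, the `cutoff` value of 0.8, the tokenization on non-alphanumeric characters, the case folding, and passing `fields` as an iterable are all assumptions made for illustration. Only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib
import re
from typing import Iterable


def fuzzy_matches_sketch(
    query: str,
    fields: Iterable[str] = (),
    cutoff: float = 0.8,  # assumed similarity threshold, not taken from nemo_gym
) -> bool:
    """Hypothetical stand-in for _fuzzy_matches: substring OR close token match."""
    q = query.strip().lower()
    if not q:
        return False  # assumption: an empty query matches nothing
    for field in fields:
        text = field.lower()
        # 1) Plain substring match against the whole field.
        if q in text:
            return True
        # 2) Token-aware close match: split the field into alphanumeric
        #    tokens and compare the query against each token with difflib.
        tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]
        if difflib.get_close_matches(q, tokens, n=1, cutoff=cutoff):
            return True
    return False


# Usage: a substring hit, a typo caught by the difflib step, and a miss.
print(fuzzy_matches_sketch("aime", ["aime24"]))
print(fuzzy_matches_sketch("mathh", ["math_reasoning"]))
print(fuzzy_matches_sketch("zzz", ["aime24", "math"]))
```

Comparing against individual tokens rather than the whole field lets a short, slightly misspelled query match one word of a longer field, which a whole-string similarity ratio would usually reject.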

nemo_gym.cli.eval._multiprocess_benchmark_prepare_fn(
    args
)

nemo_gym.cli.eval.aggregate_rollouts()

nemo_gym.cli.eval.collect_rollouts()

nemo_gym.cli.eval.e2e_rollout_collection()

nemo_gym.cli.eval.list_benchmarks() -> None

CLI command: list available benchmarks, optionally filtered by a query (the gym search entry point).

nemo_gym.cli.eval.prepare_benchmark() -> None

CLI command: prepare benchmark data.

nemo_gym.cli.eval.reward_profile()