nemo_rl.evals.eval#

Module Contents#

Classes#

Functions#

setup

Set up components for model evaluation.

run_env_eval

Main entry point for running evaluation using an environment.

API#

class nemo_rl.evals.eval.EvalConfig[source]#

Bases: typing.TypedDict

metric: str#

num_tests_per_prompt: int#

seed: int#

class nemo_rl.evals.eval.MasterConfig[source]#

Bases: typing.TypedDict

eval: nemo_rl.evals.eval.EvalConfig#

generate: nemo_rl.models.generation.interfaces.GenerationConfig#

data: nemo_rl.data.MathDataConfig#

env: nemo_rl.environments.math_environment.MathEnvConfig#

cluster: nemo_rl.distributed.virtual_cluster.ClusterConfig#
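As a rough illustration, a `MasterConfig`-shaped dictionary might look like the following. Only the top-level keys and the `EvalConfig` fields come from this page; the contents of the nested sub-configs are placeholders, not the real `GenerationConfig`, `MathDataConfig`, `MathEnvConfig`, or `ClusterConfig` schemas.

```python
# Sketch of a MasterConfig-style dict. Only the top-level keys and the
# EvalConfig fields are taken from this page; all nested values are
# illustrative placeholders, not the real sub-config schemas.
master_config = {
    "eval": {
        "metric": "accuracy",       # EvalConfig.metric
        "num_tests_per_prompt": 4,  # EvalConfig.num_tests_per_prompt
        "seed": 42,                 # EvalConfig.seed
    },
    "generate": {},  # GenerationConfig fields (nemo_rl.models.generation.interfaces)
    "data": {},      # MathDataConfig fields
    "env": {},       # MathEnvConfig fields
    "cluster": {},   # ClusterConfig fields
}
```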

nemo_rl.evals.eval.setup(
master_config: nemo_rl.evals.eval.MasterConfig,
tokenizer: transformers.AutoTokenizer,
dataset: nemo_rl.data.datasets.AllTaskProcessedDataset,
) → Tuple[nemo_rl.models.generation.vllm.VllmGeneration, torch.utils.data.DataLoader, nemo_rl.evals.eval.MasterConfig][source]#

Set up components for model evaluation.

Initializes the vLLM model and data loader.

Parameters:
  • master_config – Configuration settings.

  • tokenizer – Tokenizer for the model being evaluated.

  • dataset – Dataset to evaluate on.

Returns:

vLLM generation backend, data loader, and config.

nemo_rl.evals.eval.run_env_eval(vllm_generation, dataloader, env, master_config)[source]#

Main entry point for running evaluation using an environment.

Generates model responses and scores them with the environment.

Parameters:
  • vllm_generation – Model for generating responses.

  • dataloader – Data loader with evaluation samples.

  • env – Environment that scores responses.

  • master_config – Configuration settings.
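To show the shape of this entry point, here is a self-contained sketch of the generate-then-score loop that `run_env_eval` follows. The stub classes and the `generate`/`step` method names are stand-ins for illustration, not the actual nemo_rl interfaces.

```python
# Self-contained sketch of the generate-then-score loop behind run_env_eval.
# FakeGeneration and FakeEnv are stand-ins; the real vLLM backend and math
# environment expose richer interfaces.

class FakeGeneration:
    """Stands in for the vLLM generation backend."""
    def generate(self, prompts):
        return [f"The answer to '{p}' is 4." for p in prompts]

class FakeEnv:
    """Stands in for an environment that scores responses."""
    def step(self, responses):
        return [1.0 if "4" in r else 0.0 for r in responses]

def run_env_eval_sketch(generation, dataloader, env, num_tests_per_prompt=1):
    """Generate responses for every batch and average the environment scores."""
    scores = []
    for batch in dataloader:
        # Repeat each prompt so per-prompt metrics can be averaged over tries.
        prompts = [p for p in batch for _ in range(num_tests_per_prompt)]
        scores.extend(env.step(generation.generate(prompts)))
    return sum(scores) / len(scores)

dataloader = [["2 + 2 = ?"], ["3 + 3 = ?"]]  # two single-prompt batches
score = run_env_eval_sketch(FakeGeneration(), dataloader, FakeEnv(),
                            num_tests_per_prompt=2)
```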