nemo_rl.evals.eval#
Module Contents#
Classes#
Functions#
Set up components for model evaluation.
Evaluate pass@k score using an unbiased estimator.
Main entry point for running evaluation using environment.
Unified implementation for both sync and async evaluation.
Generate texts using either sync or async method.
Print evaluation results.
API#
- class nemo_rl.evals.eval.EvalConfig[source]#
Bases: typing.TypedDict
- metric: str#
- num_tests_per_prompt: int#
- seed: int#
- pass_k_value: int#
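As a rough illustration of how the four fields above fit together, the following sketch mirrors them in a local TypedDict and builds one config. `EvalConfigSketch` and `check_eval_config` are hypothetical names that are not part of nemo_rl, the `"pass@k"` metric string is an assumed value, and the check that `pass_k_value` does not exceed `num_tests_per_prompt` is an assumption based on how the pass@k estimator is defined, not a documented library rule.

```python
from typing import TypedDict


class EvalConfigSketch(TypedDict):
    """Local mirror of the four documented EvalConfig fields (hypothetical)."""

    metric: str
    num_tests_per_prompt: int
    seed: int
    pass_k_value: int


def check_eval_config(cfg: EvalConfigSketch) -> None:
    """Hypothetical sanity check; nemo_rl may validate differently or not at all."""
    if cfg["num_tests_per_prompt"] < 1:
        raise ValueError("num_tests_per_prompt must be at least 1")
    # pass@k draws k samples out of n, so k cannot exceed n.
    if not 1 <= cfg["pass_k_value"] <= cfg["num_tests_per_prompt"]:
        raise ValueError("pass_k_value must be in [1, num_tests_per_prompt]")


eval_cfg: EvalConfigSketch = {
    "metric": "pass@k",  # assumed metric name, not confirmed by this page
    "num_tests_per_prompt": 4,
    "seed": 42,
    "pass_k_value": 2,
}
check_eval_config(eval_cfg)
```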
- class nemo_rl.evals.eval.MasterConfig[source]#
Bases: typing.TypedDict
- eval: nemo_rl.evals.eval.EvalConfig#
- generate: nemo_rl.models.generation.interfaces.GenerationConfig#
- data: nemo_rl.data.MathDataConfig#
- cluster: nemo_rl.distributed.virtual_cluster.ClusterConfig#
- nemo_rl.evals.eval.setup(master_config: nemo_rl.evals.eval.MasterConfig, tokenizer: transformers.AutoTokenizer, dataset: nemo_rl.data.datasets.AllTaskProcessedDataset)
Set up components for model evaluation.
Initializes the vLLM model and the data loader.
- Parameters:
master_config – Configuration settings.
tokenizer – Tokenizer instance (transformers.AutoTokenizer).
dataset – Dataset to evaluate on.
- Returns:
vLLM model, data loader, and config.
- nemo_rl.evals.eval.eval_pass_k(rewards: torch.Tensor, num_tests_per_prompt: int, k: int)
Evaluate pass@k score using an unbiased estimator.
Reference: https://github.com/huggingface/evaluate/blob/32546aafec25cdc2a5d7dd9f941fc5be56ba122f/metrics/code_eval/code_eval.py#L198-L213
- Parameters:
rewards – Tensor of shape (batch_size * num_tests_per_prompt)
num_tests_per_prompt – Number of samples per prompt (int)
k – The k in pass@k (int)
- Returns:
pass_k_score
- Return type:
float
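The unbiased estimator used in the referenced code computes, for a prompt with n samples of which c are correct, pass@k = 1 - C(n - c, k) / C(n, k). The sketch below applies that formula to a flat reward list. It is a minimal illustration and not the nemo_rl implementation: `eval_pass_k_sketch` and `pass_at_k_single` are hypothetical names, plain Python lists stand in for `torch.Tensor`, and it assumes that samples for the same prompt are stored contiguously, that a reward of at least 1.0 marks a correct sample, and that per-prompt scores are averaged.

```python
from math import comb
from typing import Sequence


def pass_at_k_single(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one prompt: n samples, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def eval_pass_k_sketch(
    rewards: Sequence[float], num_tests_per_prompt: int, k: int
) -> float:
    """Average pass@k over prompts (averaging is an assumption of this sketch)."""
    n = num_tests_per_prompt
    if not rewards or len(rewards) % n != 0:
        raise ValueError("rewards must be a non-empty multiple of num_tests_per_prompt")
    if not 1 <= k <= n:
        raise ValueError("k must be in [1, num_tests_per_prompt]")
    scores = []
    for start in range(0, len(rewards), n):
        group = rewards[start : start + n]  # assumed: one prompt's samples are contiguous
        num_correct = sum(1 for r in group if r >= 1.0)  # assumed correctness threshold
        scores.append(pass_at_k_single(n, num_correct, k))
    return sum(scores) / len(scores)


# Two prompts, four samples each: one correct sample for the first prompt, none for the second.
example_rewards = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
score_k1 = eval_pass_k_sketch(example_rewards, num_tests_per_prompt=4, k=1)
score_k2 = eval_pass_k_sketch(example_rewards, num_tests_per_prompt=4, k=2)
```

With k = 1 the estimator reduces to the fraction of correct samples per prompt, which is a quick way to sanity-check an implementation.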
- nemo_rl.evals.eval.run_env_eval(vllm_generation, dataloader, env, master_config)[source]#
Main entry point for running evaluation using environment.
Generates model responses and scores them with the environment.
- Parameters:
vllm_generation – Model for generating responses.
dataloader – Data loader with evaluation samples.
env – Environment that scores responses.
master_config – Configuration settings.
- async nemo_rl.evals.eval._run_env_eval_impl(vllm_generation, dataloader, env, master_config, use_async=False)
Unified implementation for both sync and async evaluation.
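One common way to serve both modes from a single coroutine is to branch on a flag inside the async implementation and drive it from a synchronous entry point. The sketch below shows that general pattern only. How `run_env_eval` actually invokes `_run_env_eval_impl` is not stated on this page, and every name in the sketch (`generate_sync`, `generate_async`, `_eval_impl`, `run_eval`) is hypothetical, with string upper-casing standing in for model generation.

```python
import asyncio
from typing import List


def generate_sync(prompt: str) -> str:
    """Stand-in for a blocking generation call (hypothetical)."""
    return prompt.upper()


async def generate_async(prompt: str) -> str:
    """Stand-in for a non-blocking generation call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real async backend would
    return prompt.upper()


async def _eval_impl(prompts: List[str], use_async: bool = False) -> List[str]:
    """Single implementation that handles both generation modes."""
    if use_async:
        # gather runs the coroutines concurrently and preserves input order.
        return list(await asyncio.gather(*(generate_async(p) for p in prompts)))
    return [generate_sync(p) for p in prompts]


def run_eval(prompts: List[str], use_async: bool = False) -> List[str]:
    """Synchronous entry point that drives the shared coroutine."""
    return asyncio.run(_eval_impl(prompts, use_async=use_async))


sync_outputs = run_eval(["a", "b"])
async_outputs = run_eval(["a", "b"], use_async=True)
```

Keeping scoring and reporting in the shared coroutine means only the generation step differs between the two modes.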