# nemo_gym.reward_profile

## Module Contents

### Classes

| Name                                                                      | Description                                                                                   |
| ------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| [`AggregateMetricsMixin`](#nemo_gym-reward_profile-AggregateMetricsMixin) | Mixin providing compute\_metrics/get\_key\_metrics hooks and the aggregate\_metrics endpoint. |
| [`RewardProfileConfig`](#nemo_gym-reward_profile-RewardProfileConfig)     | CLI configuration for reward profiling (input/rollout JSONL paths, partial-rollout handling). |
| [`RewardProfiler`](#nemo_gym-reward_profile-RewardProfiler)               | Computes baseline group-level and agent-level statistics from rollout rows and results.      |

### Functions

| Name                                                                                      | Description                                                                                  |
| ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| [`_group_by_task`](#nemo_gym-reward_profile-_group_by_task)                               | Group verify responses by task index, returning a list of per-task rollout lists.            |
| [`_rollout_key`](#nemo_gym-reward_profile-_rollout_key)                                   | Build the integer-pair key that identifies a rollout row.                                    |
| [`add_avg_sample_std_dev`](#nemo_gym-reward_profile-add_avg_sample_std_dev)               | Add avg\_sample\_std\_dev statistics to an existing metrics dict.                            |
| [`compute_aggregate_metrics`](#nemo_gym-reward_profile-compute_aggregate_metrics)         | Shared aggregation logic for /aggregate\_metrics.                                            |
| [`compute_pass_majority_metrics`](#nemo_gym-reward_profile-compute_pass_majority_metrics) | Compute pass\@k, majority\@k, no\_answer, and variance statistics from grouped task results. |
| [`compute_subset_metrics`](#nemo_gym-reward_profile-compute_subset_metrics)               | Group tasks by a field and compute pass\@k metrics per subset.                               |
| [`highest_k_metrics`](#nemo_gym-reward_profile-highest_k_metrics)                         | Select the highest-k entries matching a metric pattern.                                      |
| [`reward_profile`](#nemo_gym-reward_profile-reward_profile)                               | CLI entry point for reward profiling.                                                        |

### API

<Anchor id="nemo_gym-reward_profile-AggregateMetricsMixin">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_gym.reward_profile.AggregateMetricsMixin()
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Mixin providing compute\_metrics/get\_key\_metrics hooks and the aggregate\_metrics endpoint.

  Inherited by both SimpleResourcesServer and SimpleResponsesAPIAgent so that
  benchmark-specific metric logic can live on either server type.

  <Anchor id="nemo_gym-reward_profile-AggregateMetricsMixin-compute_metrics">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.AggregateMetricsMixin.compute_metrics(
          tasks: typing.List[typing.List[typing.Dict[str, typing.Any]]]
      ) -> typing.Dict[str, typing.Any]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Override to compute custom metrics from all verify responses.

    Receives verify responses grouped by task: tasks\[i] is the list of rollout
    dicts for task i. Each dict has at minimum `reward`, plus any custom fields
    from the verify response (e.g. `symbolic_correct`, `judgement-gen-base`).

    Use for metrics that need the full dataset at once:

    * Confidence intervals (ArenaMetrics)
    * Cross-task statistics (std\_dev\_across\_runs)
    * pass\@k with proper combinatorial computation

    The returned dict is merged into agent\_metrics.
    Default: empty dict (no additional metrics).
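
    A minimal sketch of an override, reusing the shared helpers documented below
    (the `MathServer` subclass, its base-class import, and the `answer` field
    name are illustrative; instance-method form assumed):

    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      from nemo_gym.reward_profile import (
          add_avg_sample_std_dev,
          compute_pass_majority_metrics,
      )

      class MathServer(SimpleResourcesServer):  # hypothetical subclass
          def compute_metrics(self, tasks):
              # Reuse the shared pass@k / majority@k helper on the grouped rollouts.
              metrics, all_scores, score_names, max_k = compute_pass_majority_metrics(
                  tasks, answer_key="answer"  # assumed answer field name
              )
              # Add within-task variance statistics in place.
              add_avg_sample_std_dev(metrics, all_scores, score_names, max_k)
              return metrics  # merged into agent_metrics
      ```
    </CodeBlock>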
  </Indent>

  <Anchor id="nemo_gym-reward_profile-AggregateMetricsMixin-get_key_metrics">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.AggregateMetricsMixin.get_key_metrics(
          agent_metrics: typing.Dict[str, typing.Any]
      ) -> typing.Dict[str, typing.Any]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Override to select headline metrics for this benchmark.

    Default: all mean/\* entries from agent\_metrics.
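
    For example, a benchmark might headline only its highest-k pass\@k accuracy.
    A sketch using `highest_k_metrics` from this module (instance-method form
    assumed):

    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      from nemo_gym.reward_profile import highest_k_metrics

      def get_key_metrics(self, agent_metrics):
          # Headline the pass@k accuracy entries at the largest available k.
          return highest_k_metrics(agent_metrics, "pass@{k}", score_names=["accuracy"])
      ```
    </CodeBlock>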
  </Indent>
</Indent>

<Anchor id="nemo_gym-reward_profile-RewardProfileConfig">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_gym.reward_profile.RewardProfileConfig()
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** [BaseNeMoGymCLIConfig](/nemo-gym/nemo_gym/config_types#nemo_gym-config_types-BaseNeMoGymCLIConfig)

  <ParamField path="allow_partial_rollouts" type="bool" />

  <ParamField path="materialized_inputs_jsonl_fpath" type="str" />

  <ParamField path="rollouts_jsonl_fpath" type="str" />
</Indent>

<Anchor id="nemo_gym-reward_profile-RewardProfiler">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_gym.reward_profile.RewardProfiler()
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Anchor id="nemo_gym-reward_profile-RewardProfiler-_index_by_rollout_key">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler._index_by_rollout_key(
          rows: typing.List[typing.Dict[str, typing.Any]],
          name: str
      ) -> typing.Dict[typing.Tuple[int, int], typing.Dict[str, typing.Any]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-align_rows_and_results">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.align_rows_and_results(
          rows: typing.List[typing.Dict[str, typing.Any]],
          results: typing.List[typing.Dict[str, typing.Any]],
          allow_partial_rollouts: bool = False
      ) -> typing.List[typing.Tuple[typing.Dict[str, typing.Any], typing.Dict[str, typing.Any]]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-calculate_metrics_single_df">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.calculate_metrics_single_df(
          grouped_df: pandas.core.groupby.generic.DataFrameGroupBy
      ) -> typing.List[typing.Dict[str, typing.Any]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-describe_dataframe">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.describe_dataframe(
          df: pandas.DataFrame
      ) -> pandas.DataFrame
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-histogram">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.histogram(
          data: pandas.Series
      ) -> typing.Optional[wandb.Histogram]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-prepare_for_serialization">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.prepare_for_serialization(
          metrics: typing.List[typing.Dict]
      ) -> typing.List[typing.Dict]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Non-destructively cleans metrics output by RewardProfiler for downstream serialization.
  </Indent>

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-profile_completion_summary">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.profile_completion_summary(
          rows: typing.List[typing.Dict[str, typing.Any]],
          results: typing.List[typing.Dict[str, typing.Any]]
      ) -> typing.Dict[str, typing.Any]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-profile_from_data">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.profile_from_data(
          rows: typing.List[typing.Dict[str, typing.Any]],
          results: typing.List[typing.Dict[str, typing.Any]],
          allow_partial_rollouts: bool = False
      ) -> typing.Tuple[typing.List[typing.Dict[str, typing.Any]], typing.List[typing.Dict[str, typing.Any]]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-rollout_info_from_result">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.rollout_info_from_result(
          result: typing.Dict[str, typing.Any]
      ) -> typing.Dict[str, typing.Any]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_gym-reward_profile-RewardProfiler-write_to_disk">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_gym.reward_profile.RewardProfiler.write_to_disk(
          group_level_metrics: typing.List[typing.Dict[str, typing.Any]],
          agent_level_metrics: typing.List[typing.Dict[str, typing.Any]],
          base_output_fpath: pathlib.Path
      ) -> typing.Tuple[pathlib.Path, pathlib.Path]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />
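
  A rough sketch of the end-to-end flow; the file names, the return order of
  `profile_from_data`, and applying `prepare_for_serialization` to both outputs
  are assumptions read off the signatures:

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    import json
    from pathlib import Path

    from nemo_gym.reward_profile import RewardProfiler

    profiler = RewardProfiler()

    # Assumed file names, mirroring RewardProfileConfig's JSONL fields.
    with open("inputs.jsonl") as f:
        rows = [json.loads(line) for line in f]
    with open("rollouts.jsonl") as f:
        results = [json.loads(line) for line in f]

    # Assumed return order: (group-level metrics, agent-level metrics).
    group_metrics, agent_metrics = profiler.profile_from_data(rows, results)
    group_path, agent_path = profiler.write_to_disk(
        profiler.prepare_for_serialization(group_metrics),
        profiler.prepare_for_serialization(agent_metrics),
        base_output_fpath=Path("reward_profile"),
    )
    ```
  </CodeBlock>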
</Indent>

<Anchor id="nemo_gym-reward_profile-_group_by_task">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile._group_by_task(
        verify_responses: typing.List[typing.Dict[str, typing.Any]]
    ) -> typing.List[typing.List[typing.Dict[str, typing.Any]]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Group verify responses by task index, returning a list of per-task rollout lists.
</Indent>

<Anchor id="nemo_gym-reward_profile-_rollout_key">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile._rollout_key(
        row: typing.Dict[str, typing.Any]
    ) -> typing.Tuple[int, int]
    ```
  </CodeBlock>
</Anchor>

<Indent />

<Anchor id="nemo_gym-reward_profile-add_avg_sample_std_dev">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.add_avg_sample_std_dev(
        metrics: typing.Dict[str, typing.Any],
        all_score_dicts: typing.List[typing.List[typing.Dict[str, float]]],
        score_names: list,
        max_k: int
    ) -> None
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Add avg\_sample\_std\_dev statistics to an existing metrics dict.

  Computes the average of per-task standard deviations across k rollouts — a measure of
  within-task variance that complements the across-run variance (std\_dev\_across\_runs).

  Modifies `metrics` in place.
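
  Its arguments line up with the return values of `compute_pass_majority_metrics`,
  so the two can be chained directly:

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    from nemo_gym.reward_profile import (
        add_avg_sample_std_dev,
        compute_pass_majority_metrics,
    )

    # tasks: rollout dicts grouped per task, as elsewhere in this module.
    metrics, all_score_dicts, score_names, max_k = compute_pass_majority_metrics(tasks)
    add_avg_sample_std_dev(metrics, all_score_dicts, score_names, max_k)
    # metrics now also carries avg_sample_std_dev entries.
    ```
  </CodeBlock>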
</Indent>

<Anchor id="nemo_gym-reward_profile-compute_aggregate_metrics">
  <CodeBlock links={{"nemo_gym.config_types.AggregateMetrics":"/nemo-gym/nemo_gym/config_types#nemo_gym-config_types-AggregateMetrics"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.compute_aggregate_metrics(
        verify_responses: typing.List[typing.Dict[str, typing.Any]],
        compute_metrics_fn = None,
        get_key_metrics_fn = None
    ) -> nemo_gym.config_types.AggregateMetrics
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Shared aggregation logic for /aggregate\_metrics.

  RewardProfiler runs with defaults to produce baseline stats (mean/max/min/median/std)
  for both group-level (per-task) and agent-level metrics.
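
  A sketch of a direct call (`server` is a hypothetical object exposing the
  `AggregateMetricsMixin` hooks; both callables are optional):

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    from nemo_gym.reward_profile import compute_aggregate_metrics

    # verify_responses: list of verify-response dicts collected from rollouts.
    aggregate = compute_aggregate_metrics(
        verify_responses,
        compute_metrics_fn=server.compute_metrics,    # optional hook
        get_key_metrics_fn=server.get_key_metrics,    # optional hook
    )
    ```
  </CodeBlock>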
</Indent>

<Anchor id="nemo_gym-reward_profile-compute_pass_majority_metrics">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.compute_pass_majority_metrics(
        tasks: typing.List[typing.List[typing.Dict[str, typing.Any]]],
        score_fn: typing.Optional[typing.Any] = None,
        answer_key: typing.Optional[str] = None
    ) -> typing.Tuple[typing.Dict[str, typing.Any], typing.List[typing.List[typing.Dict[str, float]]], typing.List[str], int]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Compute pass\@k, majority\@k, no\_answer, and variance statistics from grouped task results.

  Shared utility for any resource server's compute\_metrics() override.

  **Parameters:**

  <ParamField path="tasks" type="List[List[Dict[str, Any]]]">
    tasks\[i] is a list of rollout dicts for task i.
  </ParamField>

  <ParamField path="score_fn" type="Optional[Any]" default="None">
    Callable(result\_dict) -> Dict\[str, float|bool] returning named scores.
    Defaults to `lambda r: &#123;"accuracy": r["reward"]&#125;`.
  </ParamField>

  <ParamField path="answer_key" type="Optional[str]" default="None">
    Field name for extracted answer (enables majority\@k and no\_answer).
    If None, majority\@k and no\_answer are skipped.
  </ParamField>

  **Returns:** `Tuple[Dict[str, Any], List[List[Dict[str, float]]], List[str], int]`

  A tuple of `(metrics, all_score_dicts, score_names, max_k)`.
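
  A small, self-contained example (the `answer` field name is illustrative):

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    from nemo_gym.reward_profile import compute_pass_majority_metrics

    # Two tasks with two rollouts each; every rollout dict carries "reward".
    tasks = [
        [{"reward": 1.0, "answer": "42"}, {"reward": 0.0, "answer": "41"}],
        [{"reward": 1.0, "answer": "7"}, {"reward": 1.0, "answer": "7"}],
    ]
    metrics, all_score_dicts, score_names, max_k = compute_pass_majority_metrics(
        tasks, answer_key="answer"
    )
    # metrics holds entries keyed like "pass@2/accuracy"; max_k is 2 here.
    ```
  </CodeBlock>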
</Indent>

<Anchor id="nemo_gym-reward_profile-compute_subset_metrics">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.compute_subset_metrics(
        tasks: typing.List[typing.List[typing.Dict[str, typing.Any]]],
        subset_key: str,
        score_fn: typing.Optional[typing.Any] = None,
        answer_key: typing.Optional[str] = None
    ) -> typing.Dict[str, typing.Any]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Group tasks by a field and compute pass\@k metrics per subset.

  Returns a flat dict with subset-prefixed keys, e.g. `"easy/pass@1/accuracy"`.
  The `per_sample_aggregate` key is omitted from each subset's metrics.

  **Parameters:**

  <ParamField path="tasks" type="List[List[Dict[str, Any]]]">
    tasks\[i] is a list of rollout dicts for task i.
  </ParamField>

  <ParamField path="subset_key" type="str">
    Field name in rollout dicts to group by (e.g. `"difficulty"`).
  </ParamField>

  <ParamField path="score_fn" type="Optional[Any]" default="None">
    Passed through to `compute_pass_majority_metrics`.
  </ParamField>

  <ParamField path="answer_key" type="Optional[str]" default="None">
    Passed through to `compute_pass_majority_metrics`.
  </ParamField>
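
  A sketch mirroring the example key above:

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    from nemo_gym.reward_profile import compute_subset_metrics

    tasks = [
        [{"reward": 1.0, "difficulty": "easy"}, {"reward": 0.0, "difficulty": "easy"}],
        [{"reward": 0.0, "difficulty": "hard"}, {"reward": 1.0, "difficulty": "hard"}],
    ]
    subset_metrics = compute_subset_metrics(tasks, subset_key="difficulty")
    # Flat keys such as "easy/pass@1/accuracy" and "hard/pass@1/accuracy".
    ```
  </CodeBlock>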
</Indent>

<Anchor id="nemo_gym-reward_profile-highest_k_metrics">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.highest_k_metrics(
        agent_metrics: typing.Dict[str, typing.Any],
        pattern: str,
        score_names: typing.Optional[typing.List[str]] = None,
        exclude_names: typing.Optional[typing.List[str]] = None
    ) -> typing.Dict[str, typing.Any]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Select the highest-k entries matching a metric pattern.

  Finds all keys matching `pattern` (with `&#123;k&#125;` as the k placeholder), determines the
  highest k value, and returns all entries at that k.

  Example:

  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    # Get highest-k pass@k for accuracy only
    highest_k_metrics(am, "pass@{k}", score_names=["accuracy"])
    # → {"pass@32/accuracy": 95.0}

    # Get highest-k pass@1[avg-of-k] for all scores except no_answer, without stats
    highest_k_metrics(am, "pass@1[avg-of-{k}]", exclude_names=["no_answer"])
    # → {"pass@1[avg-of-32]/accuracy": 94.5, "pass@1[avg-of-32]/symbolic_accuracy": 93.2}
    ```
  </CodeBlock>

  **Parameters:**

  <ParamField path="agent_metrics" type="Dict[str, Any]">
    Full agent metrics dict.
  </ParamField>

  <ParamField path="pattern" type="str">
    Pattern with `&#123;k&#125;` placeholder, e.g. `"pass@&#123;k&#125;"` or `"pass@1[avg-of-&#123;k&#125;]"`.
  </ParamField>

  <ParamField path="score_names" type="Optional[List[str]]" default="None">
    If provided, only return entries whose score name (after the last `/`)
    is in this list. Stat suffixes (std\_dev, std\_err, avg\_sample) are always excluded.
  </ParamField>

  <ParamField path="exclude_names" type="Optional[List[str]]" default="None">
    Score names to exclude (e.g. `["no_answer"]`). Applied after score\_names.
  </ParamField>

  **Returns:** `Dict[str, Any]`

  Dict of matching metrics at the highest k, e.g. `&#123;"pass@32/accuracy": 95.0&#125;`.
</Indent>

<Anchor id="nemo_gym-reward_profile-reward_profile">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_gym.reward_profile.reward_profile()
    ```
  </CodeBlock>
</Anchor>

<Indent />