---
layout: overview
slug: nemo-curator/nemo_curator/utils/performance_utils
title: nemo_curator.utils.performance_utils
---

## Module Contents

### Classes

| Name                                                                     | Description                                        |
| ------------------------------------------------------------------------ | -------------------------------------------------- |
| [`StagePerfStats`](#nemo_curator-utils-performance_utils-StagePerfStats) | Statistics for tracking stage performance metrics. |
| [`StageTimer`](#nemo_curator-utils-performance_utils-StageTimer)         | Tracker for stage performance stats.               |

### API

<Anchor id="nemo_curator-utils-performance_utils-StagePerfStats">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.performance_utils.StagePerfStats()
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Statistics for tracking stage performance metrics.

  Attributes:
  - `stage_name`: Name of the processing stage.
  - `process_time`: Total processing time in seconds.
  - `actor_idle_time`: Time the actor spent idle in seconds.
  - `input_data_size_mb`: Size of input data in megabytes.
  - `num_items_processed`: Number of items processed in this stage.
  - `custom_metrics`: Custom metrics to track.

  <ParamField path="actor_idle_time" type="float = 0.0" />

  <ParamField path="custom_metrics" type="dict[str, float] = attrs.field(factory=dict)" />

  <ParamField path="input_data_size_mb" type="float = 0.0" />

  <ParamField path="num_items_processed" type="int = 0" />

  <ParamField path="process_time" type="float = 0.0" />

  <ParamField path="stage_name" type="str" />

  <Anchor id="nemo_curator-utils-performance_utils-StagePerfStats-__add__">
    <CodeBlock links={{"nemo_curator.utils.performance_utils.StagePerfStats":"#nemo_curator-utils-performance_utils-StagePerfStats"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StagePerfStats.__add__(
          other: nemo_curator.utils.performance_utils.StagePerfStats
      ) -> nemo_curator.utils.performance_utils.StagePerfStats
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Add two StagePerfStats.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StagePerfStats-__radd__">
    <CodeBlock links={{"nemo_curator.utils.performance_utils.StagePerfStats":"#nemo_curator-utils-performance_utils-StagePerfStats"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StagePerfStats.__radd__(
          other: int | nemo_curator.utils.performance_utils.StagePerfStats
      ) -> nemo_curator.utils.performance_utils.StagePerfStats
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Add two `StagePerfStats` together. If `other` is 0, returns itself.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StagePerfStats-items">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StagePerfStats.items() -> list[tuple[str, float | int]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Returns `(metric_name, metric_value)` pairs.
    Entries in `custom_metrics` are flattened into the form `(custom.<metric_name>, metric_value)`.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StagePerfStats-reset">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StagePerfStats.reset() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Reset the stats.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StagePerfStats-to_dict">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StagePerfStats.to_dict() -> dict[str, float | int]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Convert the stats to a dictionary.
  </Indent>
</Indent>
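The addition and flattening behaviour described above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions, not the library's implementation: `PerfStatsSketch` is a hypothetical name, it uses `dataclasses` rather than `attrs`, and it assumes that numeric fields add up, that custom metrics are summed per key, and that `items()` omits `stage_name`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PerfStatsSketch:
    """Hypothetical stand-in for StagePerfStats (illustration only)."""

    stage_name: str
    process_time: float = 0.0
    actor_idle_time: float = 0.0
    input_data_size_mb: float = 0.0
    num_items_processed: int = 0
    custom_metrics: dict[str, float] = field(default_factory=dict)

    def __add__(self, other: PerfStatsSketch) -> PerfStatsSketch:
        # Assumption: custom metrics are merged by summing values per key.
        merged = dict(self.custom_metrics)
        for key, value in other.custom_metrics.items():
            merged[key] = merged.get(key, 0.0) + value
        return PerfStatsSketch(
            stage_name=self.stage_name,
            process_time=self.process_time + other.process_time,
            actor_idle_time=self.actor_idle_time + other.actor_idle_time,
            input_data_size_mb=self.input_data_size_mb + other.input_data_size_mb,
            num_items_processed=self.num_items_processed + other.num_items_processed,
            custom_metrics=merged,
        )

    def __radd__(self, other: int | PerfStatsSketch) -> PerfStatsSketch:
        # The built-in sum() starts from the integer 0, so 0 + stats
        # has to hand back the stats object unchanged.
        if isinstance(other, int) and other == 0:
            return self
        return self.__add__(other)

    def items(self) -> list[tuple[str, float | int]]:
        # Assumption: only numeric fields are reported; stage_name is skipped.
        pairs: list[tuple[str, float | int]] = [
            ("process_time", self.process_time),
            ("actor_idle_time", self.actor_idle_time),
            ("input_data_size_mb", self.input_data_size_mb),
            ("num_items_processed", self.num_items_processed),
        ]
        # Custom metrics are flattened under a "custom." prefix.
        pairs.extend((f"custom.{name}", value) for name, value in self.custom_metrics.items())
        return pairs


a = PerfStatsSketch("tokenize", process_time=1.5, num_items_processed=10,
                    custom_metrics={"cache_hits": 3.0})
b = PerfStatsSketch("tokenize", process_time=0.5, num_items_processed=5,
                    custom_metrics={"cache_hits": 1.0})

total = sum([a, b])  # works because of __radd__ handling the initial 0
print(dict(total.items()))
```

Handling `0` in `__radd__` is the usual way to make a custom type compatible with `sum()` without requiring callers to pass an explicit start value.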

<Anchor id="nemo_curator-utils-performance_utils-StageTimer">
  <CodeBlock links={{"nemo_curator.stages.base.ProcessingStage":"/nemo-curator/nemo_curator/stages/base#nemo_curator-stages-base-ProcessingStage"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.performance_utils.StageTimer(
        stage: nemo_curator.stages.base.ProcessingStage
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Tracker for stage performance stats.
  Tracks processing time and other metrics for each `process_data` call.

  <ParamField path="_last_active_time" type="= time.time()" />

  <ParamField path="_stage_name" type="= str(stage.name)" />

  <Anchor id="nemo_curator-utils-performance_utils-StageTimer-_reset">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StageTimer._reset() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Reset internal counters.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StageTimer-log_stats">
    <CodeBlock links={{"nemo_curator.utils.performance_utils.StagePerfStats":"#nemo_curator-utils-performance_utils-StagePerfStats"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StageTimer.log_stats(
          verbose: bool = False
      ) -> tuple[str, nemo_curator.utils.performance_utils.StagePerfStats]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Log the stats of the stage.

    Args:
    - `verbose`: Whether to log the stats verbosely.

    Returns: A tuple of the stage name and the stage performance stats.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StageTimer-reinit">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StageTimer.reinit(
          stage_input_size: int = 1
      ) -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Reinitialize the stage timer.

    Args:
    - `stage_input_size`: The size of the stage input.
  </Indent>

  <Anchor id="nemo_curator-utils-performance_utils-StageTimer-time_process">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.performance_utils.StageTimer.time_process(
          num_items: int = 1
      ) -> collections.abc.Generator[None, None, None]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Time the processing of the stage.

    Args:
    - `num_items`: The number of items being processed.
  </Indent>
</Indent>
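The `Generator[None, None, None]` return type of `time_process` suggests it is meant to be used as a context manager around the work being timed. The sketch below shows that general pattern with `contextlib.contextmanager`. `TimerSketch` and its public counters are hypothetical, and the way idle time is derived from `_last_active_time` is an assumption, not the library's confirmed logic.

```python
import time
from collections.abc import Generator
from contextlib import contextmanager


class TimerSketch:
    """Hypothetical stand-in for StageTimer (illustration only)."""

    def __init__(self, stage_name: str) -> None:
        self._stage_name = stage_name
        self._last_active_time = time.time()
        self.process_time = 0.0
        self.actor_idle_time = 0.0
        self.num_items_processed = 0

    @contextmanager
    def time_process(self, num_items: int = 1) -> Generator[None, None, None]:
        start = time.time()
        # Assumption: idle time is the gap since the previous timed block ended.
        self.actor_idle_time += start - self._last_active_time
        try:
            yield
        finally:
            # Record the elapsed time even if the timed block raises.
            end = time.time()
            self.process_time += end - start
            self.num_items_processed += num_items
            self._last_active_time = end


timer = TimerSketch("tokenize")
with timer.time_process(num_items=3):
    time.sleep(0.01)  # placeholder for the stage's real work

print(timer.num_items_processed, timer.process_time, timer.actor_idle_time)
```

Updating the counters in a `finally` clause keeps the statistics consistent when the wrapped work fails partway through.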
