---
layout: overview
slug: nemo-curator/nemo_curator/utils/performance_utils
title: nemo_curator.utils.performance_utils
---

## Module Contents

### Classes

| Name                                                                     | Description                                        |
| ------------------------------------------------------------------------ | -------------------------------------------------- |
| [`StagePerfStats`](#nemo_curator-utils-performance_utils-StagePerfStats) | Statistics for tracking stage performance metrics. |
| [`StageTimer`](#nemo_curator-utils-performance_utils-StageTimer)         | Tracker for stage performance stats.               |

### API

```python
class nemo_curator.utils.performance_utils.StagePerfStats()
```

Statistics for tracking stage performance metrics.

Attributes:

- stage\_name: Name of the processing stage.
- process\_time: Total processing time in seconds.
- actor\_idle\_time: Time the actor spent idle in seconds.
- input\_data\_size\_mb: Size of input data in megabytes.
- num\_items\_processed: Number of items processed in this stage.
- custom\_metrics: Custom metrics to track.

```python
nemo_curator.utils.performance_utils.StagePerfStats.__add__(
    other: nemo_curator.utils.performance_utils.StagePerfStats
) -> nemo_curator.utils.performance_utils.StagePerfStats
```

Add two StagePerfStats.

```python
nemo_curator.utils.performance_utils.StagePerfStats.__radd__(
    other: int | nemo_curator.utils.performance_utils.StagePerfStats
) -> nemo_curator.utils.performance_utils.StagePerfStats
```

Add two StagePerfStats together. If the other operand is 0, returns itself.

```python
nemo_curator.utils.performance_utils.StagePerfStats.items() -> list[tuple[str, float | int]]
```

Returns (metric\_name, metric\_value) pairs. Entries in custom\_metrics are flattened into the format (custom.\<metric\_name\>, metric\_value).

```python
nemo_curator.utils.performance_utils.StagePerfStats.reset() -> None
```

Reset the stats.

```python
nemo_curator.utils.performance_utils.StagePerfStats.to_dict() -> dict[str, float | int]
```

Convert the stats to a dictionary.
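
The `__radd__` rule for a zero operand is the usual way to make a class work with Python's built-in `sum()`, which starts from the integer 0. The sketch below illustrates that pattern, along with the `custom.` prefix flattening described for `items()`. `PerfStatsSketch` is a hypothetical stand-in written for this example, not the library class. Its field defaults, the choice to sum numeric fields and merge custom metrics key by key, and the set of fields returned by `items()` are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PerfStatsSketch:
    """Hypothetical stand-in mirroring the documented attributes; not the library class."""

    stage_name: str
    process_time: float = 0.0
    actor_idle_time: float = 0.0
    input_data_size_mb: float = 0.0
    num_items_processed: int = 0
    custom_metrics: dict = field(default_factory=dict)

    def __add__(self, other: "PerfStatsSketch") -> "PerfStatsSketch":
        # Assumption: numeric fields are summed and custom metrics are merged by key.
        merged = dict(self.custom_metrics)
        for name, value in other.custom_metrics.items():
            merged[name] = merged.get(name, 0.0) + value
        return PerfStatsSketch(
            stage_name=self.stage_name,
            process_time=self.process_time + other.process_time,
            actor_idle_time=self.actor_idle_time + other.actor_idle_time,
            input_data_size_mb=self.input_data_size_mb + other.input_data_size_mb,
            num_items_processed=self.num_items_processed + other.num_items_processed,
            custom_metrics=merged,
        )

    def __radd__(self, other):
        # sum() evaluates 0 + first_element, which dispatches here with other == 0.
        if isinstance(other, int) and other == 0:
            return self
        return self.__add__(other)

    def items(self) -> list:
        # Assumption: only numeric fields are reported; custom metrics get a "custom." prefix.
        pairs = [
            ("process_time", self.process_time),
            ("actor_idle_time", self.actor_idle_time),
            ("input_data_size_mb", self.input_data_size_mb),
            ("num_items_processed", self.num_items_processed),
        ]
        pairs.extend((f"custom.{name}", value) for name, value in self.custom_metrics.items())
        return pairs


# Aggregate two per-call records for the same (hypothetical) stage.
batches = [
    PerfStatsSketch("dedup", process_time=1.5, num_items_processed=10, custom_metrics={"cache_hits": 3.0}),
    PerfStatsSketch("dedup", process_time=2.5, num_items_processed=30, custom_metrics={"cache_hits": 1.0}),
]
total = sum(batches)
```

Without the zero case in `__radd__`, `sum(batches)` would raise a `TypeError` on its first step, because `int` does not know how to add a stats object.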

```python
class nemo_curator.utils.performance_utils.StageTimer(
    stage: nemo_curator.stages.base.ProcessingStage
)
```

Tracker for stage performance stats. Tracks processing time and other metrics for each process\_data call.

```python
nemo_curator.utils.performance_utils.StageTimer._reset() -> None
```

Reset internal counters.

```python
nemo_curator.utils.performance_utils.StageTimer.log_stats(
    verbose: bool = False
) -> tuple[str, nemo_curator.utils.performance_utils.StagePerfStats]
```

Log the stats of the stage.

Args:

- verbose: Whether to log the stats verbosely.

Returns: A tuple of the stage name and the stage performance stats.

```python
nemo_curator.utils.performance_utils.StageTimer.reinit(
    stage_input_size: int = 1
) -> None
```

Reinitialize the stage timer.

Args:

- stage: The stage to reinitialize the timer for.
- stage\_input\_size: The size of the stage input.

```python
nemo_curator.utils.performance_utils.StageTimer.time_process(
    num_items: int = 1
) -> collections.abc.Generator[None, None, None]
```

Time the processing of the stage.

Args:

- num\_items: The number of items being processed.
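
The `Generator[None, None, None]` return type of `time_process` is consistent with a generator wrapped by `contextlib.contextmanager`, although this page does not confirm that usage. Under that assumption, the sketch below shows how a per-call timer of this shape could accumulate elapsed time and item counts. `TimerSketch` and its attributes are hypothetical names made up for this example; the real `StageTimer` takes a `ProcessingStage` and may track more than this.

```python
import time
from collections.abc import Generator
from contextlib import contextmanager


class TimerSketch:
    """Hypothetical stand-in for a per-call stage timer; not the library class."""

    def __init__(self, stage_name: str) -> None:
        self.stage_name = stage_name
        self.process_time = 0.0
        self.num_items_processed = 0

    @contextmanager
    def time_process(self, num_items: int = 1) -> Generator[None, None, None]:
        # Measure wall-clock time spent inside the `with` body.
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the call even if the body raises.
            self.process_time += time.perf_counter() - start
            self.num_items_processed += num_items


timer = TimerSketch("tokenize")
with timer.time_process(num_items=3):
    tokens = [word.upper() for word in ["a", "b", "c"]]
```

Accumulating in a `finally` clause is a design choice of this sketch: it keeps the counters meaningful when a processing call fails partway through.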