utils.performance_utils#
Module Contents#
Classes#
StagePerfStats | Statistics for tracking stage performance metrics.
StageTimer | Tracker for stage performance stats. Tracks processing time and other metrics for each process_data call.
API#
- class utils.performance_utils.StagePerfStats#
Statistics for tracking stage performance metrics.
Attributes:
    stage_name: Name of the processing stage.
    process_time: Total processing time in seconds.
    actor_idle_time: Time the actor spent idle in seconds.
    input_data_size_mb: Size of input data in megabytes.
    num_items_processed: Number of items processed in this stage.
    custom_metrics: Custom metrics to track.
- actor_idle_time: float#
0.0
- custom_metrics: dict[str, float]#
‘field(…)’
- input_data_size_mb: float#
0.0
- items() list[tuple[str, float | int]] #
Returns (metric_name, metric_value) pairs. Entries in custom_metrics are flattened into the format (custom.<metric_name>, metric_value).
- num_items_processed: int#
0
- process_time: float#
0.0
- reset() None #
Reset the stats.
- stage_name: str#
None
- to_dict() dict[str, float | int] #
Convert the stats to a dictionary.
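The flattening that items() describes can be illustrated with a small stand-in dataclass. This is only a sketch under stated assumptions, not the nemo_curator implementation: the class name SketchStagePerfStats is hypothetical, the field names and defaults mirror the attributes listed above, and the choice to leave stage_name out of the returned pairs is an assumption based on the documented return type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchStagePerfStats:
    """Hypothetical stand-in that mirrors the documented attributes."""

    stage_name: str
    process_time: float = 0.0
    actor_idle_time: float = 0.0
    input_data_size_mb: float = 0.0
    num_items_processed: int = 0
    custom_metrics: dict[str, float] = field(default_factory=dict)

    def items(self) -> list[tuple[str, float | int]]:
        # Built-in numeric metrics come first (assumption: stage_name is skipped
        # because the documented values are float | int).
        pairs: list[tuple[str, float | int]] = [
            ("process_time", self.process_time),
            ("actor_idle_time", self.actor_idle_time),
            ("input_data_size_mb", self.input_data_size_mb),
            ("num_items_processed", self.num_items_processed),
        ]
        # Custom metrics are flattened under a "custom." prefix.
        pairs.extend(
            (f"custom.{name}", value) for name, value in self.custom_metrics.items()
        )
        return pairs


stats = SketchStagePerfStats(
    stage_name="demo_stage",
    process_time=1.5,
    num_items_processed=3,
    custom_metrics={"tokens": 10.0},
)
flat = dict(stats.items())
```

With this sketch, the custom metric "tokens" appears in flat under the key "custom.tokens", alongside the built-in metrics.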
- class utils.performance_utils.StageTimer(stage: nemo_curator.stages.base.ProcessingStage)#
Tracker for stage performance stats. Tracks processing time and other metrics for each process_data call.
Initialization
Initialize the stage timer. Args: stage: The stage to track.
- log_stats(*, verbose: bool = False)
Log the stats of the stage. Args: verbose: Whether to log the stats verbosely. Returns: A tuple of the stage name and the stage performance stats.
- reinit(stage_input_size: int = 1) None #
Reinitialize the stage timer. Args: stage_input_size: The size of the stage input.
- time_process(num_items: int = 1)
Time the processing of the stage. Args: num_items: The number of items being processed.
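To make the per-call timing pattern concrete, here is a minimal sketch of a timer with the same method names. It is not the StageTimer implementation. Everything in it is an assumption for illustration: the class name SketchStageTimer is hypothetical, it takes a plain stage name instead of a ProcessingStage, it treats time_process as a context manager (the reference above does not say how it is invoked), and it returns a plain dict from log_stats rather than a StagePerfStats.

```python
from __future__ import annotations

import time
from collections.abc import Iterator
from contextlib import contextmanager


class SketchStageTimer:
    """Hypothetical stand-in; accumulates time and item counts per timed call."""

    def __init__(self, stage_name: str) -> None:
        self.stage_name = stage_name
        self.reinit()

    def reinit(self, stage_input_size: int = 1) -> None:
        # Clear accumulated measurements before the next batch of work.
        self.stage_input_size = stage_input_size
        self.process_time = 0.0
        self.num_items_processed = 0

    @contextmanager
    def time_process(self, num_items: int = 1) -> Iterator[None]:
        # Assumption: used as `with timer.time_process(...):` around the work.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.process_time += time.perf_counter() - start
            self.num_items_processed += num_items

    def log_stats(self, *, verbose: bool = False) -> tuple[str, dict[str, float | int]]:
        collected: dict[str, float | int] = {
            "process_time": self.process_time,
            "num_items_processed": self.num_items_processed,
        }
        if verbose:
            print(f"{self.stage_name}: {collected}")
        return self.stage_name, collected


timer = SketchStageTimer("demo_stage")
with timer.time_process(num_items=4):
    sum(range(1000))  # placeholder for the stage's real work
name, collected = timer.log_stats()
```

Wrapping the timed work in try/finally means the elapsed time is still recorded if the work raises, which is one reasonable design choice for a tracker like this.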