utils.performance_utils

Module Contents

Classes

StagePerfStats

Statistics for tracking stage performance metrics.

StageTimer

Tracker for stage performance stats.

API

class utils.performance_utils.StagePerfStats

Statistics for tracking stage performance metrics.

Attributes:
    stage_name: Name of the processing stage.
    process_time: Total processing time, in seconds.
    actor_idle_time: Time the actor spent idle, in seconds.
    input_data_size_mb: Size of the input data, in megabytes.
    num_items_processed: Number of items processed in this stage.
    custom_metrics: Additional custom metrics to track, keyed by name.
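For orientation, the attributes above can be mirrored by a plain dataclass. `StagePerfStatsSketch` is a hypothetical stand-in and is not the library's source. In particular, the `custom_metrics` default is written as `field(default_factory=dict)`, which is an assumption because the documented default is elided.

```python
from dataclasses import dataclass, field


@dataclass
class StagePerfStatsSketch:
    """Hypothetical stand-in mirroring the documented StagePerfStats attributes."""

    stage_name: str
    process_time: float = 0.0        # total processing time, seconds
    actor_idle_time: float = 0.0     # time the actor spent idle, seconds
    input_data_size_mb: float = 0.0  # input data size, megabytes
    num_items_processed: int = 0     # items processed in this stage
    # Assumption: a fresh dict per instance; the real default is elided in the docs.
    custom_metrics: dict[str, float] = field(default_factory=dict)


stats = StagePerfStatsSketch(stage_name="example_stage")
stats.process_time += 1.25
stats.num_items_processed += 10
stats.custom_metrics["tokens"] = 512.0
```

Using a default factory rather than a shared `{}` literal means each instance gets its own dictionary. A bare mutable default is rejected by `dataclasses` in any case.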

actor_idle_time: float

0.0

custom_metrics: dict[str, float]

field(…)

input_data_size_mb: float

0.0

items() → list[tuple[str, float | int]]

Return (metric_name, metric_value) pairs. Entries in custom_metrics are flattened into the form (custom.<metric_name>, metric_value).
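The flattening rule can be shown with a standalone function. `flatten_stats` and its input dictionaries are hypothetical. The sketch assumes only what the docstring states: built-in metrics keep their names, and each custom metric is emitted under a `custom.` prefix.

```python
from typing import Union

Number = Union[float, int]


def flatten_stats(
    base_metrics: dict[str, Number],
    custom_metrics: dict[str, float],
) -> list[tuple[str, Number]]:
    """Return (metric_name, metric_value) pairs with custom metrics namespaced."""
    pairs: list[tuple[str, Number]] = list(base_metrics.items())
    # Prefix custom metrics with "custom." to keep them distinct from built-in names.
    pairs.extend((f"custom.{name}", value) for name, value in custom_metrics.items())
    return pairs


pairs = flatten_stats(
    {"process_time": 1.25, "num_items_processed": 10},
    {"tokens": 512.0},
)
# [('process_time', 1.25), ('num_items_processed', 10), ('custom.tokens', 512.0)]
```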

num_items_processed: int

0

process_time: float

0.0

reset() → None

Reset the stats.
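One generic way to reset a dataclass to its declared defaults is to walk `dataclasses.fields()`. The helper `reset_to_defaults` and the `Stats` class below are hypothetical. This page does not state whether `StagePerfStats.reset()` works this way or what it does with `stage_name`. In this sketch, fields without a default are left unchanged.

```python
from dataclasses import MISSING, dataclass, field, fields


def reset_to_defaults(obj) -> None:
    """Reset each dataclass field that declares a default; leave the others as-is."""
    for f in fields(obj):
        if f.default is not MISSING:
            setattr(obj, f.name, f.default)
        elif f.default_factory is not MISSING:
            # Call the factory so mutable defaults (e.g. dicts) are fresh objects.
            setattr(obj, f.name, f.default_factory())


@dataclass
class Stats:
    stage_name: str
    process_time: float = 0.0
    num_items_processed: int = 0
    custom_metrics: dict[str, float] = field(default_factory=dict)


s = Stats("demo", process_time=2.5, num_items_processed=4, custom_metrics={"x": 1.0})
reset_to_defaults(s)
```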

stage_name: str

None

to_dict() → dict[str, float | int]

Convert the stats to a dictionary.

class utils.performance_utils.StageTimer(stage: nemo_curator.stages.base.ProcessingStage)

Tracker for stage performance stats. Tracks processing time and other metrics for each process_data call.

Initialization

Initialize the stage timer.

Args:
    stage: The stage to track.

log_stats(*, verbose: bool = False) → tuple[str, utils.performance_utils.StagePerfStats]

Log the stats of the stage.

Args:
    verbose: Whether to log the stats verbosely.

Returns:
    A tuple of the stage name and the stage performance stats.
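The method returns a `(stage_name, stats)` tuple, so a caller that drives several stages can gather results into a mapping keyed by stage name. The sketch below is hypothetical throughout. It uses plain dictionaries as stand-ins for `StagePerfStats` objects, roughly the shape `to_dict()` is documented to return. It also sums values when a stage reports more than once, which is an illustrative policy rather than library behavior.

```python
from collections import defaultdict


def collect_stage_stats(
    results: list[tuple[str, dict[str, float]]],
) -> dict[str, dict[str, float]]:
    """Merge (stage_name, stats_dict) tuples, summing values per stage and metric."""
    # Values accumulate as floats because each counter starts at 0.0.
    merged = defaultdict(lambda: defaultdict(float))
    for stage_name, stats in results:
        for metric, value in stats.items():
            merged[stage_name][metric] += value
    # Convert the nested defaultdicts back to plain dicts for the caller.
    return {name: dict(metrics) for name, metrics in merged.items()}


summary = collect_stage_stats([
    ("reader", {"process_time": 1.5, "num_items_processed": 10}),
    ("reader", {"process_time": 0.5, "num_items_processed": 5}),
    ("writer", {"process_time": 2.0, "num_items_processed": 15}),
])
```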

reinit(stage_input_size: int = 1) → None

Reinitialize the stage timer.

Args:
    stage_input_size: The size of the stage input.

time_process(num_items: int = 1) → collections.abc.Generator[None, None, None]

Time the processing of the stage.

Args:
    num_items: The number of items being processed.
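The return type `Generator[None, None, None]` matches a generator wrapped by `contextlib.contextmanager`. That suggests `time_process` is entered with a `with` statement, though this is an inference from the signature only. The sketch below illustrates the pattern with a hypothetical `MiniTimer` that accumulates wall-clock time and item counts. It is not NeMo Curator's implementation.

```python
import time
from collections.abc import Generator
from contextlib import contextmanager


class MiniTimer:
    """Hypothetical timer that accumulates time and item counts per timed block."""

    def __init__(self) -> None:
        self.process_time = 0.0
        self.num_items_processed = 0

    @contextmanager
    def time_process(self, num_items: int = 1) -> Generator[None, None, None]:
        start = time.perf_counter()
        try:
            yield  # the body of the `with` block runs here
        finally:
            # Elapsed time is recorded even if the timed block raises.
            self.process_time += time.perf_counter() - start
        # Reached only when the block exits without an exception.
        self.num_items_processed += num_items


timer = MiniTimer()
with timer.time_process(num_items=3):
    time.sleep(0.01)
with timer.time_process():
    pass
```

Counting items only on successful exit keeps `num_items_processed` from including work that failed, while elapsed time still reflects the failed attempt. That split is a design choice made for this sketch.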