nemo_curator.utils.performance_utils


Module Contents

Classes

Name            Description
StagePerfStats  Statistics for tracking stage performance metrics.
StageTimer      Tracker for stage performance stats.

API

class nemo_curator.utils.performance_utils.StagePerfStats()

Statistics for tracking stage performance metrics.

Attributes:
  stage_name: Name of the processing stage.
  process_time: Total processing time in seconds.
  actor_idle_time: Time the actor spent idle, in seconds.
  input_data_size_mb: Size of the input data in megabytes.
  num_items_processed: Number of items processed in this stage.
  custom_metrics: Custom metrics to track.

actor_idle_time: float = 0.0
custom_metrics: dict[str, float] = attrs.field(factory=dict)
input_data_size_mb: float = 0.0
num_items_processed: int = 0
process_time: float = 0.0
stage_name: str

Add two StagePerfStats.

Add two StagePerfStats together. If the other operand is 0, return this instance unchanged.
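A stand-in class can illustrate the addition behavior. This is a minimal sketch under stated assumptions and is not the library's implementation. The class name `PerfStatsSketch`, the use of `dataclasses` in place of `attrs`, keeping the left operand's `stage_name`, and summing custom metrics key by key are all hypothetical. The `0` special case is what lets Python's built-in `sum()`, which starts from `0`, fold a list of stats objects. The sketch assumes this is implemented through `__radd__`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PerfStatsSketch:
    """Hypothetical stand-in for StagePerfStats (not the library class)."""

    stage_name: str
    process_time: float = 0.0
    actor_idle_time: float = 0.0
    input_data_size_mb: float = 0.0
    num_items_processed: int = 0
    custom_metrics: dict[str, float] = field(default_factory=dict)

    def __add__(self, other: PerfStatsSketch) -> PerfStatsSketch:
        if not isinstance(other, PerfStatsSketch):
            return NotImplemented
        # Assumption: custom metrics present in both operands are summed.
        merged = dict(self.custom_metrics)
        for key, value in other.custom_metrics.items():
            merged[key] = merged.get(key, 0.0) + value
        # Assumption: the left operand's stage name is kept.
        return PerfStatsSketch(
            stage_name=self.stage_name,
            process_time=self.process_time + other.process_time,
            actor_idle_time=self.actor_idle_time + other.actor_idle_time,
            input_data_size_mb=self.input_data_size_mb + other.input_data_size_mb,
            num_items_processed=self.num_items_processed + other.num_items_processed,
            custom_metrics=merged,
        )

    def __radd__(self, other: int) -> PerfStatsSketch:
        # sum() starts from 0, so `0 + stats` must return stats unchanged.
        if other == 0:
            return self
        return NotImplemented


parts = [
    PerfStatsSketch("tokenize", process_time=1.5, num_items_processed=10),
    PerfStatsSketch("tokenize", process_time=2.5, num_items_processed=30),
]
total = sum(parts)
print(total.process_time, total.num_items_processed)  # 4.0 40
```

Returning `NotImplemented` for any other operand lets Python raise its usual `TypeError` instead of silently producing a wrong result.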

nemo_curator.utils.performance_utils.StagePerfStats.items() -> list[tuple[str, float | int]]

Return (metric_name, metric_value) pairs. Entries in custom_metrics are flattened into the format (custom.<metric_name>, metric_value).
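The sketch below shows the flattening rule as a free function over plain dictionaries, not as the library method. The helper name `flatten_stats` is an assumption made for illustration, and so is the idea that the built-in numeric fields come before the custom ones.

```python
from __future__ import annotations


def flatten_stats(
    base: dict[str, float | int], custom_metrics: dict[str, float]
) -> list[tuple[str, float | int]]:
    """Hypothetical helper mirroring the StagePerfStats.items() flattening."""
    # Built-in numeric fields keep their own names.
    pairs: list[tuple[str, float | int]] = list(base.items())
    # Each custom metric becomes ("custom.<metric_name>", value).
    pairs.extend((f"custom.{name}", value) for name, value in custom_metrics.items())
    return pairs


pairs = flatten_stats(
    {"process_time": 2.0, "num_items_processed": 8},
    {"tokens_per_sec": 1200.0},
)
print(pairs)
# [('process_time', 2.0), ('num_items_processed', 8), ('custom.tokens_per_sec', 1200.0)]
```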

nemo_curator.utils.performance_utils.StagePerfStats.reset() -> None

Reset the stats.

nemo_curator.utils.performance_utils.StagePerfStats.to_dict() -> dict[str, float | int]

Convert the stats to a dictionary.

class nemo_curator.utils.performance_utils.StageTimer(
stage: nemo_curator.stages.base.ProcessingStage
)

Tracker for stage performance stats. Tracks processing time and other metrics for each process_data call.

_last_active_time = time.time()
_stage_name = str(stage.name)

nemo_curator.utils.performance_utils.StageTimer._reset() -> None

Reset internal counters.

nemo_curator.utils.performance_utils.StageTimer.log_stats(
verbose: bool = False
) -> tuple[str, nemo_curator.utils.performance_utils.StagePerfStats]

Log the stats of the stage.

Args:
  verbose: Whether to log the stats verbosely.

Returns:
  A tuple of the stage name and the stage performance stats.

nemo_curator.utils.performance_utils.StageTimer.reinit(
stage_input_size: int = 1
) -> None

Reinitialize the stage timer.

Args:
  stage_input_size: The size of the stage input.

nemo_curator.utils.performance_utils.StageTimer.time_process(
num_items: int = 1
) -> collections.abc.Generator[None, None, None]

Time the processing of the stage.

Args:
  num_items: The number of items being processed.
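The return type `Generator[None, None, None]` indicates that `time_process` is a generator-based context manager. It is presumably wrapped with `contextlib.contextmanager` and used as `with timer.time_process(n): ...`. The sketch below is a hypothetical stand-in, not the library code. The class `TimerSketch` is invented for illustration, and so is the accounting rule that idle time is the gap between the end of one timed block and the start of the next. That rule is only inferred from the `_last_active_time` attribute.

```python
from __future__ import annotations

import time
from collections.abc import Generator
from contextlib import contextmanager


class TimerSketch:
    """Hypothetical stand-in for StageTimer's timing logic."""

    def __init__(self, stage_name: str) -> None:
        self.stage_name = stage_name
        self.process_time = 0.0
        self.actor_idle_time = 0.0
        self.num_items_processed = 0
        self._last_active_time = time.time()

    @contextmanager
    def time_process(self, num_items: int = 1) -> Generator[None, None, None]:
        start = time.time()
        # Assumed rule: time since the previous block ended counts as idle.
        self.actor_idle_time += start - self._last_active_time
        try:
            yield
        finally:
            # Record elapsed time and item count even if the block raises.
            end = time.time()
            self.process_time += end - start
            self.num_items_processed += num_items
            self._last_active_time = end


timer = TimerSketch("dedup")
with timer.time_process(num_items=4):
    time.sleep(0.05)  # stand-in for real work
print(timer.num_items_processed)  # 4
```

The sketch uses `time.time()` to match the `_last_active_time` default listed for StageTimer. `time.perf_counter()` would be a more precise clock for measuring short durations.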