nemo_curator.tasks.utils
Module Contents
Classes
API
Utilities for aggregating stage performance metrics from tasks.
Example output format:
{
    "StageA": {"process_time": np.array([…]), "actor_idle_time": np.array([…]), "read_time_s": np.array([…]), …},
    "StageB": {"process_time": np.array([…]), …}
}
Return a mapping of pipeline name -> list of tasks from various input shapes.
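As a rough illustration of this normalization, the sketch below coerces three input shapes into one mapping. FakeTask, FakeRunResult, its pipeline_tasks attribute, the function name, and the empty-string key for unnamed task lists are all hypothetical stand-ins chosen for the example; they are not taken from the library.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class FakeTask:
    """Hypothetical stand-in for a Task; not the library's class."""
    task_id: str


@dataclass
class FakeRunResult:
    """Hypothetical stand-in for WorkflowRunResult (attribute name assumed)."""
    pipeline_tasks: dict[str, list[FakeTask]] = field(default_factory=dict)


def normalize_pipeline_tasks(tasks) -> dict[str, list[FakeTask]]:
    """Coerce the supported input shapes into pipeline_name -> list of tasks."""
    if tasks is None:
        return {}
    if isinstance(tasks, FakeRunResult):
        # Unwrap the result object to its underlying mapping.
        tasks = tasks.pipeline_tasks
    if isinstance(tasks, Mapping):
        return {name: list(items or []) for name, items in tasks.items()}
    # A bare iterable carries no pipeline name, so it is filed under "" (assumption).
    return {"": list(tasks)}


from_list = normalize_pipeline_tasks([FakeTask("a")])
from_dict = normalize_pipeline_tasks({"p": [FakeTask("a")]})
from_result = normalize_pipeline_tasks(FakeRunResult({"p": [FakeTask("b")]}))
```

Normalizing once at the boundary lets the downstream helpers iterate over a single shape instead of branching on the input type.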
Aggregate task metrics by computing mean/std/sum.
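A minimal sketch of a mean/std/sum reduction follows. It assumes the input is a stage -> metric -> values mapping like the example output format above, and that the result is keyed by "mean", "std", and "sum"; the function name, the output layout, and the choice of population standard deviation are assumptions made for the example, not confirmed library behavior.

```python
import statistics


def aggregate_metrics(stage_metrics):
    """Reduce stage -> metric -> values to stage -> metric -> {mean, std, sum}."""
    summary = {}
    for stage, metrics in stage_metrics.items():
        summary[stage] = {}
        for name, values in metrics.items():
            values = [float(v) for v in values]
            if not values:
                # Skip metrics with no samples rather than dividing by zero.
                continue
            summary[stage][name] = {
                "mean": statistics.fmean(values),
                # Population standard deviation (assumed; matches NumPy's default ddof=0).
                "std": statistics.pstdev(values),
                "sum": sum(values),
            }
    return summary


example = {"StageA": {"process_time": [1.0, 2.0, 3.0]}}
result = aggregate_metrics(example)
```

For this input, the mean of process_time is 2.0 and the sum is 6.0.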
Collect per-stage metric lists from tasks or workflow outputs.
The returned mapping aggregates both built-in StagePerfStats metrics and any custom_stats recorded by stages.
Parameters:
Iterable of tasks, a workflow result dictionary, or WorkflowRunResult.
Returns: dict[str, dict[str, np.ndarray[float]]]
Dict mapping stage_name -> metric_name -> NumPy array of numeric values, one entry per recorded sample.
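The collection step can be pictured with the sketch below. FakePerfStats, FakeTask, the stage_perf attribute, and the function name are hypothetical stand-ins, and the sketch returns plain Python lists instead of NumPy arrays to stay dependency-free.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakePerfStats:
    """Hypothetical stand-in for StagePerfStats with two built-in metrics."""
    stage_name: str
    process_time: float = 0.0
    actor_idle_time: float = 0.0
    custom_stats: dict[str, float] = field(default_factory=dict)


@dataclass
class FakeTask:
    """Hypothetical task holding one perf record per stage it passed through."""
    stage_perf: list[FakePerfStats] = field(default_factory=list)


def collect_stage_metrics(tasks):
    """Group built-in and custom metric values by stage name, then metric name."""
    collected = defaultdict(lambda: defaultdict(list))
    for task in tasks:
        for perf in task.stage_perf:
            bucket = collected[perf.stage_name]
            bucket["process_time"].append(perf.process_time)
            bucket["actor_idle_time"].append(perf.actor_idle_time)
            # Custom stats sit alongside the built-in metrics under the same stage.
            for name, value in perf.custom_stats.items():
                bucket[name].append(float(value))
    return {stage: dict(metrics) for stage, metrics in collected.items()}


tasks = [
    FakeTask([FakePerfStats("StageA", 1.5, 0.1, {"read_time_s": 0.4})]),
    FakeTask([FakePerfStats("StageA", 2.5, 0.2)]),
]
metrics = collect_stage_metrics(tasks)
```

Note that a custom stat recorded by only some tasks, such as read_time_s here, yields a shorter list than the built-in metrics for the same stage.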
Get an aggregated stat for stages matching a name prefix.
Sums the performance statistics from all stages whose names start with the given prefix across all tasks.
Parameters:
A list of Task objects, a WorkflowRunResult, or a mapping of pipeline_name -> list[Task]
Match stages whose name starts with this prefix.
The stat to extract (e.g., "num_items_processed", "process_time").
Returns: float
The aggregated stat value, or 0.0 if no matches found.
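The prefix-matching sum described above might look like the following sketch. The data classes, the stage_perf attribute, and the function name are hypothetical; only the behavior (sum over stages whose name starts with the prefix, 0.0 when nothing matches) comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class FakePerfStats:
    """Hypothetical stand-in for a per-stage performance record."""
    stage_name: str
    num_items_processed: int = 0
    process_time: float = 0.0


@dataclass
class FakeTask:
    """Hypothetical task holding one perf record per stage it passed through."""
    stage_perf: list[FakePerfStats] = field(default_factory=list)


def sum_stage_stat(tasks, stage_prefix, stat):
    """Sum `stat` over every perf record whose stage name starts with the prefix."""
    total = 0.0
    for task in tasks:
        for perf in task.stage_perf:
            if perf.stage_name.startswith(stage_prefix):
                # Unknown stat names contribute nothing (assumption).
                total += float(getattr(perf, stat, 0.0))
    return total


tasks = [
    FakeTask([FakePerfStats("ReaderStage", 10, 1.0), FakePerfStats("WriterStage", 7, 0.5)]),
    FakeTask([FakePerfStats("ReaderStage_shard", 5, 2.0)]),
]
reader_items = sum_stage_stat(tasks, "Reader", "num_items_processed")
no_match = sum_stage_stat(tasks, "Dedup", "num_items_processed")
```

Because matching is by prefix, "Reader" picks up both ReaderStage and ReaderStage_shard, so callers should choose a prefix specific enough to exclude unrelated stages.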