tasks.tasks#

Module Contents#

Classes#

Task

Abstract base class for tasks in the pipeline. A task represents a batch of data to be processed. Different modalities (text, audio, video) can implement their own task types.

Data#

API#

tasks.tasks.EmptyTask#

‘_EmptyTask(…)’

tasks.tasks.T#

‘TypeVar(…)’

class tasks.tasks.Task#

Bases: abc.ABC, typing.Generic[tasks.tasks.T]

Abstract base class for tasks in the pipeline. A task represents a batch of data to be processed. Different modalities (text, audio, video) can implement their own task types.

Attributes:

task_id: Unique identifier for this task.

dataset_name: Name of the dataset this task belongs to.

dataframe_attribute: Name of the attribute that contains the dataframe data. Used for input/output validation.

_stage_perf: List of performance stats, one per stage this task has passed through.

add_stage_perf(perf_stats: nemo_curator.utils.performance_utils.StagePerfStats) → None#

Add performance stats for a stage.

data: tasks.tasks.T#

The batch of data carried by this task. No default value.

dataset_name: str#

Name of the dataset this task belongs to. No default value.

abstract property num_items: int#

Get the number of items in this task.

task_id: str#

Unique identifier for this task. No default value.

abstractmethod validate() → bool#

Validate the task data.
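The sketch below illustrates how a modality-specific task might implement the two abstract members, num_items and validate(). It does not import the library. The Task class in it is a stand-in that mirrors only the members documented on this page, so the real class may differ in its constructor, storage, and validation rules. TextLinesTask, the list-backed _stage_perf, and the dict passed to add_stage_perf are all hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Task(ABC, Generic[T]):
    """Stand-in for the documented base class (not the library's code)."""

    task_id: str
    dataset_name: str
    data: T
    # Assumption: per-stage stats are kept in a plain list.
    _stage_perf: list[Any] = field(default_factory=list)

    def add_stage_perf(self, perf_stats: Any) -> None:
        """Record performance stats for one stage."""
        self._stage_perf.append(perf_stats)

    @property
    @abstractmethod
    def num_items(self) -> int:
        """Number of items in this task."""

    @abstractmethod
    def validate(self) -> bool:
        """Check that the task data is well formed."""


@dataclass
class TextLinesTask(Task[list[str]]):
    """Hypothetical text task whose data is a list of strings."""

    @property
    def num_items(self) -> int:
        return len(self.data)

    def validate(self) -> bool:
        # Hypothetical rule: every item must be a non-empty string.
        return all(isinstance(line, str) and line for line in self.data)


task = TextLinesTask(task_id="t-0", dataset_name="demo", data=["a", "b", "c"])
# A dict stands in for StagePerfStats, whose fields are not shown on this page.
task.add_stage_perf({"stage_name": "tokenize"})
print(task.num_items, task.validate())
```

Because num_items and validate() are abstract, instantiating the base class directly raises TypeError. Only subclasses that define both members can be constructed.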