nemo_curator.tasks.tasks

Module Contents

Classes

Name	Description
`Task`	Abstract base class for tasks in the pipeline.
`_EmptyTask`	Dummy task for testing.

Data

EmptyTask

T

API

class nemo_curator.tasks.tasks.Task(
    task_id: str,
    dataset_name: str,
    data: nemo_curator.tasks.tasks.T,
    _stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
    _metadata: dict[str, typing.Any] = dict()
)

Dataclass

Abstract

Bases: Generic[T]

Abstract base class for tasks in the pipeline. A task represents a batch of data to be processed. Different modalities (text, audio, video) can implement their own task types.

Attributes:
    task_id: Unique identifier for this task.
    dataset_name: Name of the dataset this task belongs to.
    dataframe_attribute: Name of the attribute that contains the dataframe data. It is used for input/output validation.
    _stage_perf: List of performance stats for the stages this task has passed through.

_metadata

dict[str, Any] = field(default_factory=dict)

_stage_perf

list[StagePerfStats] = field(default_factory=list)
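The constructor signature renders these two defaults as `list()` and `dict()`, while the attribute entries show `field(default_factory=...)`. The sketch below illustrates what `default_factory` does in a standard Python dataclass. `Record` is a hypothetical stand-in, not the real `Task`; only its two defaulted fields mirror the ones documented here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    # Hypothetical stand-in; only the two defaulted fields mirror Task.
    name: str
    _stage_perf: list[str] = field(default_factory=list)
    _metadata: dict[str, Any] = field(default_factory=dict)


first = Record("a")
second = Record("b")

# default_factory is called once per instance, so mutating one
# instance's containers leaves the other instance untouched.
first._stage_perf.append("stage-1")
first._metadata["source"] = "demo"

print(second._stage_perf)  # []
print(second._metadata)    # {}
```

A plain `= []` default would be rejected by `dataclasses` with a `ValueError`, which is why mutable defaults are declared through `default_factory`.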

_uuid

str

data

T

dataset_name

str

num_items

int

Get the number of items in this task.

task_id

str

nemo_curator.tasks.tasks.Task.__post_init__() -> None

Post-initialization hook.

nemo_curator.tasks.tasks.Task.__repr__() -> str

nemo_curator.tasks.tasks.Task.add_stage_perf(
    perf_stats: nemo_curator.utils.performance_utils.StagePerfStats
) -> None

Add performance stats for a stage.
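As a rough illustration of how per-stage stats might accumulate on a task, the sketch below uses two hypothetical stand-ins: `FakePerfStats` (its `stage_name` and `process_time` fields are assumptions, not the documented fields of `StagePerfStats`) and `MiniTask`, which mirrors only the `_stage_perf` field. It also assumes that `add_stage_perf` appends to the list, which this page does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class FakePerfStats:
    # Hypothetical stand-in for StagePerfStats; the real fields may differ.
    stage_name: str
    process_time: float


@dataclass
class MiniTask:
    # Hypothetical stand-in that mirrors only Task's _stage_perf field.
    task_id: str
    _stage_perf: list[FakePerfStats] = field(default_factory=list)

    def add_stage_perf(self, perf_stats: FakePerfStats) -> None:
        # Assumed behavior: append, so the list preserves stage order.
        self._stage_perf.append(perf_stats)


task = MiniTask(task_id="doc-0")
task.add_stage_perf(FakePerfStats("reader", 0.12))
task.add_stage_perf(FakePerfStats("filter", 0.03))

# Summaries can then be derived from the accumulated entries.
stage_names = [s.stage_name for s in task._stage_perf]
total_time = sum(s.process_time for s in task._stage_perf)
print(stage_names)
```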

nemo_curator.tasks.tasks.Task.validate() -> bool

abstract

Validate the task data.
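The interface above can be sketched as a concrete task type. To stay self-contained, the block redefines a stand-in `Task` that mirrors the documented signature rather than importing the library; it assumes `num_items` is an abstract property, and `LineBatch` with its validation rule is purely hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Task(ABC, Generic[T]):
    # Stand-in mirroring the documented signature; not the library's code.
    task_id: str
    dataset_name: str
    data: T
    _stage_perf: list[Any] = field(default_factory=list)
    _metadata: dict[str, Any] = field(default_factory=dict)

    @property
    @abstractmethod
    def num_items(self) -> int:
        """Get the number of items in this task."""

    @abstractmethod
    def validate(self) -> bool:
        """Validate the task data."""


@dataclass
class LineBatch(Task[list[str]]):
    # Hypothetical task type: a batch of text lines.

    @property
    def num_items(self) -> int:
        return len(self.data)

    def validate(self) -> bool:
        # Assumed rule for this sketch: a non-empty batch of strings.
        return bool(self.data) and all(isinstance(x, str) for x in self.data)


batch = LineBatch(task_id="batch-0", dataset_name="demo", data=["a", "b", "c"])
print(batch.num_items)   # 3
print(batch.validate())  # True
```

Because both members are abstract in this sketch, instantiating the stand-in `Task` directly raises `TypeError`; only subclasses that implement `num_items` and `validate` can be constructed.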

class nemo_curator.tasks.tasks._EmptyTask(
    task_id: str,
    dataset_name: str,
    data: nemo_curator.tasks.tasks.T,
    _stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
    _metadata: dict[str, typing.Any] = dict()
)

Dataclass

Bases: Task[None]

Dummy task for testing.

num_items

int

nemo_curator.tasks.tasks._EmptyTask.validate() -> bool

Validate the task data.

nemo_curator.tasks.tasks.EmptyTask = _EmptyTask(task_id='empty', dataset_name='empty', data=None)
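A module-level instance like this can be reused wherever a placeholder task is needed, for example in a test. The sketch below is a simplified stand-in that omits the `Task` base class; the return values of `num_items` and `validate` are assumptions (this page does not document them), and `describe` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _EmptyTask:
    # Stand-in without the Task base class, kept short for illustration.
    task_id: str
    dataset_name: str
    data: None
    _metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def num_items(self) -> int:
        return 0  # assumption: an empty task holds no items

    def validate(self) -> bool:
        return True  # assumption: there is nothing to validate


# Mirrors the documented module-level constant.
EmptyTask = _EmptyTask(task_id="empty", dataset_name="empty", data=None)


def describe(task: _EmptyTask) -> str:
    # Hypothetical helper that works on any object with these attributes.
    return f"{task.task_id}: {task.num_items} items"


summary = describe(EmptyTask)
print(summary)  # empty: 0 items
```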

nemo_curator.tasks.tasks.T = TypeVar('T')