nemo_curator.tasks.tasks
Module Contents
Classes
Data
API
Dataclass, Abstract
Bases: Generic[T]
Abstract base class for tasks in the pipeline. A task represents a batch of data to be processed. Different modalities (text, audio, video) can implement their own task types.
Attributes:
    task_id: Unique identifier for this task.
    dataset_name: Name of the dataset this task belongs to.
    dataframe_attribute: Name of the attribute that contains the dataframe data. Used for input/output validation.
    _stage_perf: List of performance stats for the stages this task has passed through.
_metadata
_stage_perf
_uuid
data
dataset_name
num_items
Get the number of items in this task.
task_id
Post-initialization hook.
Add performance stats for a stage.
abstract
Validate the task data.
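The members listed above can be illustrated with a minimal, self-contained sketch. It does not import nemo_curator and is not the library's implementation. The class names `SketchTask` and `LineBatch`, the method name `add_stage_perf`, the `bool` return type of `validate`, and the UUID assignment in the post-initialization hook are all assumptions made for illustration; only the attribute names and the one-line descriptions come from this page.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class SketchTask(ABC, Generic[T]):
    """Hypothetical stand-in for the abstract task base class."""

    task_id: str
    dataset_name: str
    data: T
    _stage_perf: List[Any] = field(default_factory=list)
    _metadata: Dict[str, Any] = field(default_factory=dict)
    _uuid: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Post-initialization hook. Assigning a UUID here is an assumption.
        self._uuid = str(uuid.uuid4())

    @property
    @abstractmethod
    def num_items(self) -> int:
        """Get the number of items in this task."""

    @abstractmethod
    def validate(self) -> bool:
        """Validate the task data (return type assumed)."""

    def add_stage_perf(self, perf_stats: Any) -> None:
        # Add performance stats for a stage (method name assumed).
        self._stage_perf.append(perf_stats)


@dataclass
class LineBatch(SketchTask[List[str]]):
    """Hypothetical text task whose data is a list of strings."""

    @property
    def num_items(self) -> int:
        return len(self.data)

    def validate(self) -> bool:
        # Every item must be a string for the batch to be valid.
        return all(isinstance(item, str) for item in self.data)


batch = LineBatch(task_id="batch-0", dataset_name="demo", data=["a", "b", "c"])
batch.add_stage_perf({"stage": "tokenize", "seconds": 0.12})
print(batch.num_items, batch.validate(), len(batch._stage_perf))  # prints: 3 True 1
```

Because `num_items` and `validate` are abstract in this sketch, instantiating `SketchTask` directly raises `TypeError`; each modality supplies its own subclass with concrete versions of both.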