nemo_curator.tasks.file_group

Module Contents

Classes

Name | Description
FileGroupTask | Task representing a group of files to be read.

API

class nemo_curator.tasks.file_group.FileGroupTask(
task_id: str,
dataset_name: str,
data: list[str] = list(),
_stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
_metadata: dict[str, typing.Any] = dict(),
reader_config: dict[str, typing.Any] = dict()
)
Dataclass

Bases: Task[list[str]]

Task representing a group of files to be read. This is created during the planning phase and passed to reader stages.

data: list[str] = field(default_factory=list)

num_items: int

Number of files in this group.

reader_config: dict[str, Any] = field(default_factory=dict)

nemo_curator.tasks.file_group.FileGroupTask.validate() -> bool

Validate the task data.
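
The fields above can be illustrated with a minimal sketch. It does not import NeMo Curator: FileGroupTaskSketch is a hypothetical stand-in dataclass that mirrors only the documented public fields, so the example runs on its own. Treating num_items as a property over data, the non-empty check inside validate(), and the "fields" key in reader_config are all assumptions made for illustration; the real class may behave differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FileGroupTaskSketch:
    """Hypothetical stand-in mirroring the documented fields; not the library class."""

    task_id: str
    dataset_name: str
    # default_factory gives every instance its own list and dict.
    data: list[str] = field(default_factory=list)
    reader_config: dict[str, Any] = field(default_factory=dict)

    @property
    def num_items(self) -> int:
        # Documented as "Number of files in this group."
        # Assumption: derived from the length of `data`.
        return len(self.data)

    def validate(self) -> bool:
        # Assumption: a group counts as valid when it lists at least one file.
        # The real validate() may apply different checks.
        return len(self.data) > 0


# A group of two files, as a planning step might produce it.
task = FileGroupTaskSketch(
    task_id="file_group_0",
    dataset_name="example_dataset",
    data=["part-000.jsonl", "part-001.jsonl"],
    reader_config={"fields": ["text"]},  # hypothetical reader option
)

# A group created with defaults only: empty data and empty reader_config.
empty = FileGroupTaskSketch(task_id="file_group_1", dataset_name="example_dataset")
```

Because data and reader_config use default_factory, two tasks built with defaults never share the same list or dict, which is why the documented signature shows list() and dict() rather than a shared mutable literal.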