stages.client_partitioning
Module Contents
Classes
ClientPartitioningStage | Stage that partitions input file paths from a client into FileGroupTasks.
API
- class stages.client_partitioning.ClientPartitioningStage
Bases: nemo_curator.stages.file_partitioning.FilePartitioningStage
Stage that partitions input file paths from a client into FileGroupTasks.
This stage runs as a dedicated processing stage (not on the driver) and creates file groups based on the partitioning strategy.
- input_list_json_path: str | None = None
- process(_: nemo_curator.tasks._EmptyTask)
Process the initial task to create file group tasks.
This stage expects a simple Task carrying file path information and outputs multiple FileGroupTasks for parallel processing.
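As a rough illustration of the partitioning step, the sketch below reads a JSON array of file paths and splits it into fixed-size groups. It does not use the NeMo Curator API: `FileGroup`, `load_paths`, `partition`, and the `files_per_group` parameter are hypothetical names, and the JSON layout (a flat array of path strings) is only an assumption about what `input_list_json_path` might point to.

```python
from __future__ import annotations

import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    """Hypothetical stand-in for a FileGroupTask: an ID plus a list of paths."""

    group_id: str
    paths: list[str] = field(default_factory=list)


def load_paths(input_list_json_path: str) -> list[str]:
    """Read a JSON array of file paths (assumed layout) and return it sorted."""
    with open(input_list_json_path, encoding="utf-8") as f:
        paths = json.load(f)
    if not isinstance(paths, list):
        raise ValueError("expected a JSON array of file paths")
    # Sorting keeps the grouping deterministic across runs.
    return sorted(str(p) for p in paths)


def partition(paths: list[str], files_per_group: int) -> list[FileGroup]:
    """Split paths into consecutive groups of at most files_per_group items."""
    if files_per_group < 1:
        raise ValueError("files_per_group must be at least 1")
    return [
        FileGroup(
            group_id=f"group_{start // files_per_group}",
            paths=paths[start : start + files_per_group],
        )
        for start in range(0, len(paths), files_per_group)
    ]


if __name__ == "__main__":
    # Write a small path list to a temporary JSON file, then partition it.
    with tempfile.TemporaryDirectory() as tmp_dir:
        list_path = os.path.join(tmp_dir, "inputs.json")
        with open(list_path, "w", encoding="utf-8") as f:
            json.dump([f"data/part_{i}.jsonl" for i in range(5)], f)
        for group in partition(load_paths(list_path), files_per_group=2):
            print(group.group_id, group.paths)
```

Each resulting group can be handed to a separate worker, which is the purpose of emitting multiple FileGroupTasks rather than one task holding every path.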
- setup(worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None)
Setup method called once before processing begins. Override this method to perform any initialization that should happen once per worker.
Args:
worker_metadata (WorkerMetadata, optional): Information about the worker (provided by some backends).
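The once-per-worker contract can be sketched without the library. Everything below is a hypothetical stand-in: the `WorkerMetadata` dataclass and its `worker_id` field, `BaseStage`, `CountingStage`, and `run_on_worker` are illustrative names, not the NeMo Curator classes. The sketch only shows the pattern of overriding `setup` to build per-worker state, tolerating a missing `worker_metadata`, and calling `setup` once before any item is processed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkerMetadata:
    """Hypothetical stand-in; the field name is an assumption."""

    worker_id: str = "worker-0"


class BaseStage:
    """Hypothetical minimal base class with a no-op setup hook."""

    def setup(self, worker_metadata: WorkerMetadata | None = None) -> None:
        pass


class CountingStage(BaseStage):
    """Builds a per-worker cache in setup() and uses it in process()."""

    def __init__(self) -> None:
        self.setup_calls = 0
        self.worker_id: str | None = None
        self.cache: dict[str, int] | None = None

    def setup(self, worker_metadata: WorkerMetadata | None = None) -> None:
        # Some backends pass no metadata, so None must be handled.
        self.setup_calls += 1
        self.worker_id = worker_metadata.worker_id if worker_metadata else None
        self.cache = {}

    def process(self, item: str) -> int:
        if self.cache is None:
            raise RuntimeError("setup() must be called before process()")
        # Count how many times this worker has seen the item.
        self.cache[item] = self.cache.get(item, 0) + 1
        return self.cache[item]


def run_on_worker(
    stage: CountingStage,
    items: list[str],
    worker_metadata: WorkerMetadata | None = None,
) -> list[int]:
    """Call setup() once, then process every item with the same stage."""
    stage.setup(worker_metadata)
    return [stage.process(item) for item in items]
```

Keeping expensive initialization in `setup` rather than in `process` means the cost is paid once per worker instead of once per task.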