backends.experimental.ray_data.executor
Module Contents
Classes
RayDataExecutor | Ray Data-based executor for pipeline execution.
API
- class backends.experimental.ray_data.executor.RayDataExecutor(config: dict[str, Any] | None = None)
Bases: nemo_curator.backends.base.BaseExecutor
Ray Data-based executor for pipeline execution.
This executor:
1. Executes setup on all nodes for all stages.
2. Converts the initial tasks to a Ray Data dataset.
3. Applies each stage as a Ray Data transformation (run as a task or an actor in map_batches).
4. Returns the final results as a list of tasks.
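The flow described above can be illustrated with a small pure-Python model. This is only a conceptual sketch, not the executor's implementation: it uses a plain list where the real executor uses a Ray Data dataset and map_batches, and the names ToyStage and toy_execute are hypothetical. Treating a missing task list as an empty list is also an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyStage:
    """Hypothetical stand-in for a processing stage (not a NeMo Curator class)."""

    name: str
    fn: Callable[[str], str]
    is_set_up: bool = False

    def setup(self) -> None:
        # A real stage might load a model or open a connection here.
        self.is_set_up = True

    def process(self, task: str) -> str:
        return self.fn(task)


def toy_execute(stages: list[ToyStage], initial_tasks: list[str] | None = None) -> list[str]:
    # Step 1: run setup for every stage before any data flows.
    for stage in stages:
        stage.setup()

    # Step 2: turn the initial tasks into a "dataset".
    # Assumption of this sketch: None is treated as an empty list.
    dataset = list(initial_tasks or [])

    # Step 3: apply each stage, in order, as a transformation over the dataset.
    for stage in stages:
        dataset = [stage.process(task) for task in dataset]

    # Step 4: hand the final results back as a plain list.
    return dataset


stages = [ToyStage("strip", str.strip), ToyStage("upper", str.upper)]
results = toy_execute(stages, ["  hello ", "world  "])
print(results)
```

The model keeps the same ordering as the list above: setup first, then one transformation per stage, then materialization of the results.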
- execute(stages: list[nemo_curator.stages.base.ProcessingStage], initial_tasks: list[nemo_curator.tasks.Task] | None = None) -> list[nemo_curator.tasks.Task]
Execute the pipeline stages using Ray Data.
Args:
stages (list[ProcessingStage]): List of processing stages to execute.
initial_tasks (list[Task], optional): Initial tasks to process (can be None for an empty start).
Returns:
list[Task]: List of final processed tasks.
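The calling contract of execute can be sketched at a call site as follows. The helper run_pipeline, the SupportsExecute protocol, and the EchoExecutor stand-in are all hypothetical names introduced for illustration; with NeMo Curator installed, a RayDataExecutor instance would presumably take the place of the stand-in. How the real executor behaves when initial_tasks is None is not specified here beyond "empty start", so the stand-in simply returns an empty list.

```python
from __future__ import annotations

from typing import Any, Protocol


class SupportsExecute(Protocol):
    """Structural type for any object exposing execute(stages, initial_tasks)."""

    def execute(
        self, stages: list[Any], initial_tasks: list[Any] | None = None
    ) -> list[Any]: ...


class EchoExecutor:
    """Hypothetical stand-in: ignores the stages and returns its input tasks."""

    def execute(self, stages: list[Any], initial_tasks: list[Any] | None = None) -> list[Any]:
        # Assumption of this stand-in: None means "no tasks".
        return list(initial_tasks or [])


def run_pipeline(
    executor: SupportsExecute,
    stages: list[Any],
    initial_tasks: list[Any] | None = None,
) -> list[Any]:
    # Pass the stages and the optional initial tasks straight through.
    final_tasks = executor.execute(stages, initial_tasks)
    # The documented return value is a list of tasks; fail loudly otherwise.
    if not isinstance(final_tasks, list):
        raise TypeError(f"expected a list of tasks, got {type(final_tasks).__name__}")
    return final_tasks


seeded = run_pipeline(EchoExecutor(), stages=[], initial_tasks=["task-a", "task-b"])
unseeded = run_pipeline(EchoExecutor(), stages=[])
print(seeded, unseeded)
```

Typing the helper against a protocol rather than a concrete class keeps the call site independent of which executor backend is in use.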