nemo_curator.backends.ray_data.executor
Module Contents
Classes
API
Bases: BaseExecutor
Ray Data-based executor for pipeline execution.
This executor:
- Executes setup on all nodes for all stages
- Converts the initial tasks to a Ray Data dataset
- Applies each stage as a Ray Data transformation (run as a task or an actor via map_batches)
- Returns final results as a list of tasks
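The four steps listed above can be illustrated with a minimal pure-Python sketch. It does not use Ray or NeMo Curator: the `Task` and `Stage` classes, the `run_pipeline` function, and the `{"item": ...}` row layout are all hypothetical stand-ins chosen for illustration, and a plain list plays the role of the dataset.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Hypothetical stand-in for a pipeline task; not the NeMo Curator class."""
    task_id: str
    data: list[int] = field(default_factory=list)


class Stage:
    """Hypothetical stage: one setup hook plus a per-task transform."""

    def __init__(self, name: str, fn: Callable[[Task], Task]) -> None:
        self.name = name
        self.fn = fn
        self.is_set_up = False

    def setup(self) -> None:
        # On a real cluster this would run on each node, e.g. to load a model.
        self.is_set_up = True

    def process(self, task: Task) -> Task:
        return self.fn(task)


def run_pipeline(stages: list[Stage], tasks: list[Task]) -> list[Task]:
    # Step 1: run setup for every stage before any data flows.
    for stage in stages:
        stage.setup()

    # Step 2: wrap the tasks in a dataset-like container (a list of rows here).
    dataset = [{"item": task} for task in tasks]

    # Step 3: apply each stage as a transformation over the whole dataset.
    for stage in stages:
        dataset = [{"item": stage.process(row["item"])} for row in dataset]

    # Step 4: unwrap the rows back into a list of tasks.
    return [row["item"] for row in dataset]


double = Stage("double", lambda t: Task(t.task_id, [x * 2 for x in t.data]))
increment = Stage("increment", lambda t: Task(t.task_id, [x + 1 for x in t.data]))

results = run_pipeline([double, increment], [Task("a", [1, 2]), Task("b", [3])])
print([(t.task_id, t.data) for t in results])  # [('a', [3, 5]), ('b', [7])]
```

In the actual executor, step 3 is delegated to Ray Data, so each stage can run in parallel across the cluster rather than in a local loop.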
Convert a Ray Data dataset back to a list of tasks.
Parameters:
dataset
Ray Data dataset containing Task objects
Returns: list[Task]
List of Task objects
Convert a list of tasks to a Ray Data dataset.
Parameters:
tasks
List of Task objects
Returns: Dataset
Ray Data dataset containing Task objects directly
Execute the pipeline stages using Ray Data.
Parameters:
stages
List of processing stages to execute
initial_tasks
Initial tasks to process (may be None to start the pipeline with no input tasks)
Returns: list[Task]
List of final processed tasks
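The "empty start" case can be sketched as follows. This is an assumption about one plausible way to handle `initial_tasks=None`, not the executor's confirmed behavior: a single placeholder task is substituted so that a source-style first stage runs once and produces the real tasks. The `EMPTY` placeholder, the dict-based tasks, and the `list_files` and `mark_read` stages are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical placeholder; the real executor may represent an empty start differently.
EMPTY = {"task_id": "empty", "data": None}

# In this sketch a stage maps one task to zero or more output tasks.
StageFn = Callable[[dict], list[dict]]


def execute(stages: list[StageFn], initial_tasks: Optional[list[dict]] = None) -> list[dict]:
    # Assumption: with no input, seed the pipeline with one placeholder task.
    tasks = initial_tasks if initial_tasks else [EMPTY]
    for stage in stages:
        next_tasks: list[dict] = []
        for task in tasks:
            next_tasks.extend(stage(task))
        tasks = next_tasks
    return tasks


def list_files(_task: dict) -> list[dict]:
    # Source stage: ignores its input and emits one task per (made-up) file name.
    return [{"task_id": f"file_{i}", "data": f"part-{i}.jsonl"} for i in range(3)]


def mark_read(task: dict) -> list[dict]:
    # Ordinary stage: transforms each task one-to-one.
    return [{**task, "data": task["data"].upper()}]


final = execute([list_files, mark_read])
print([t["data"] for t in final])  # ['PART-0.JSONL', 'PART-1.JSONL', 'PART-2.JSONL']
```

Seeding with a placeholder keeps the stage loop uniform: the first stage needs no special case for missing input.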