*** layout: overview slug: nemo-curator/nemo\_curator/backends/ray\_data/executor title: nemo\_curator.backends.ray\_data.executor ------------------------------------------------ ## Module Contents ### Classes | Name | Description | | ----------------------------------------------------------------------------- | ----------------------------------------------- | | [`RayDataExecutor`](#nemo_curator-backends-ray_data-executor-RayDataExecutor) | Ray Data-based executor for pipeline execution. | ### API ```python class nemo_curator.backends.ray_data.executor.RayDataExecutor( config: dict[str, typing.Any] | None = None, ignore_head_node: bool = False ) ``` **Bases:** [BaseExecutor](/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-BaseExecutor) Ray Data-based executor for pipeline execution. This executor: 1. Executes setup on all nodes for all stages 2. Converts initial tasks to Ray Data dataset 3. Applies each stage as a Ray Data transformation (as a task or actor in map\_batches) 4. Returns final results as a list of tasks ```python nemo_curator.backends.ray_data.executor.RayDataExecutor._dataset_to_tasks( dataset: ray.data.Dataset ) -> list[nemo_curator.tasks.Task] ``` Convert Ray Data dataset back to list of tasks. **Parameters:** Ray Data dataset containing Task objects **Returns:** `list[Task]` List of Task objects ```python nemo_curator.backends.ray_data.executor.RayDataExecutor._tasks_to_dataset( tasks: list[nemo_curator.tasks.Task] ) -> ray.data.Dataset ``` Convert list of tasks to Ray Data dataset. **Parameters:** List of Task objects **Returns:** `Dataset` Ray Data dataset containing Task objects directly ```python nemo_curator.backends.ray_data.executor.RayDataExecutor.execute( stages: list[nemo_curator.stages.base.ProcessingStage], initial_tasks: list[nemo_curator.tasks.Task] | None = None ) -> list[nemo_curator.tasks.Task] ``` Execute the pipeline stages using Ray Data. **Parameters:** List of processing stages to execute Initial tasks to process (can be None for empty start) **Returns:** `list[Task]` list\[Task]: List of final processed tasks