---
layout: overview
slug: nemo-curator/nemo_curator/backends/ray_data/executor
title: nemo_curator.backends.ray_data.executor
---

## Module Contents

### Classes

| Name                                                                          | Description                                     |
| ----------------------------------------------------------------------------- | ----------------------------------------------- |
| [`RayDataExecutor`](#nemo_curator-backends-ray_data-executor-RayDataExecutor) | Ray Data-based executor for pipeline execution. |

### API

<Anchor id="nemo_curator-backends-ray_data-executor-RayDataExecutor">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.backends.ray_data.executor.RayDataExecutor(
        config: dict[str, typing.Any] | None = None,
        ignore_head_node: bool = False
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** [BaseExecutor](/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-BaseExecutor)

  Ray Data-based executor for pipeline execution.

  This executor:

  1. Runs setup for every stage on all nodes
  2. Converts the initial tasks to a Ray Data dataset
  3. Applies each stage as a Ray Data transformation (as a task or an actor in `map_batches`)
  4. Returns the final results as a list of tasks
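
  The four steps above can be sketched in plain Python, with a list standing in for the Ray Data dataset. This is only an illustration of the control flow: `Task`, `MapStage`, `run_pipeline`, and `batch_size` are hypothetical stand-ins, not the NeMo Curator or Ray API.

  ```python
  from dataclasses import dataclass
  from typing import Callable


  @dataclass
  class Task:
      """Hypothetical stand-in for nemo_curator.tasks.Task."""
      task_id: str
      data: str


  class MapStage:
      """Hypothetical stage: setup() runs once, process_batch() maps a batch."""

      def __init__(self, fn: Callable[[str], str]) -> None:
          self.fn = fn
          self.ready = False

      def setup(self) -> None:
          self.ready = True

      def process_batch(self, batch: list[Task]) -> list[Task]:
          assert self.ready, "setup() must run before processing"
          return [Task(t.task_id, self.fn(t.data)) for t in batch]


  def run_pipeline(
      stages: list[MapStage], initial_tasks: list[Task], batch_size: int = 2
  ) -> list[Task]:
      # Step 1: run setup for every stage before any data flows.
      for stage in stages:
          stage.setup()
      # Step 2: a plain list stands in for the Ray Data dataset.
      dataset = list(initial_tasks)
      # Step 3: apply each stage in order, one batch at a time
      # (the role map_batches plays in Ray Data).
      for stage in stages:
          transformed: list[Task] = []
          for start in range(0, len(dataset), batch_size):
              batch = dataset[start:start + batch_size]
              transformed.extend(stage.process_batch(batch))
          dataset = transformed
      # Step 4: hand the results back as a list of tasks.
      return dataset


  results = run_pipeline(
      [MapStage(str.upper), MapStage(lambda s: s + "!")],
      [Task("a", "foo"), Task("b", "bar"), Task("c", "baz")],
  )
  ```

  In the sketch the stages run sequentially in one process; the real executor delegates scheduling and parallelism to Ray Data.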

  <Anchor id="nemo_curator-backends-ray_data-executor-RayDataExecutor-_dataset_to_tasks">
    <CodeBlock links={{"nemo_curator.tasks.Task":"/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.backends.ray_data.executor.RayDataExecutor._dataset_to_tasks(
          dataset: ray.data.Dataset
      ) -> list[nemo_curator.tasks.Task]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Convert a Ray Data dataset back to a list of tasks.

    **Parameters:**

    <ParamField path="dataset" type="Dataset">
      Ray Data dataset containing Task objects
    </ParamField>

    **Returns:** `list[Task]`

    List of Task objects
  </Indent>

  <Anchor id="nemo_curator-backends-ray_data-executor-RayDataExecutor-_tasks_to_dataset">
    <CodeBlock links={{"nemo_curator.tasks.Task":"/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.backends.ray_data.executor.RayDataExecutor._tasks_to_dataset(
          tasks: list[nemo_curator.tasks.Task]
      ) -> ray.data.Dataset
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Convert a list of tasks to a Ray Data dataset.

    **Parameters:**

    <ParamField path="tasks" type="list[Task]">
      List of Task objects
    </ParamField>

    **Returns:** `Dataset`

    Ray Data dataset containing Task objects directly
  </Indent>
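
  The round trip that the two conversion helpers above describe can be sketched without Ray. The sketch assumes one task per row, stored under an `item` key, which is the layout Ray Data's `from_items` uses for non-dict objects; whether `RayDataExecutor` relies on that exact layout is an assumption. `Task`, `tasks_to_rows`, and `rows_to_tasks` are hypothetical names.

  ```python
  from dataclasses import dataclass


  @dataclass
  class Task:
      """Hypothetical stand-in for nemo_curator.tasks.Task."""
      task_id: str


  def tasks_to_rows(tasks: list[Task]) -> list[dict[str, Task]]:
      # One row per task; the task object itself is the cell value.
      return [{"item": task} for task in tasks]


  def rows_to_tasks(rows: list[dict[str, Task]]) -> list[Task]:
      # Unwrap each row to recover the task objects in their original order.
      return [row["item"] for row in rows]


  tasks = [Task("t0"), Task("t1")]
  round_tripped = rows_to_tasks(tasks_to_rows(tasks))
  ```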

  <Anchor id="nemo_curator-backends-ray_data-executor-RayDataExecutor-execute">
    <CodeBlock links={{"nemo_curator.stages.base.ProcessingStage":"/nemo-curator/nemo_curator/stages/base#nemo_curator-stages-base-ProcessingStage","nemo_curator.tasks.Task":"/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.backends.ray_data.executor.RayDataExecutor.execute(
          stages: list[nemo_curator.stages.base.ProcessingStage],
          initial_tasks: list[nemo_curator.tasks.Task] | None = None
      ) -> list[nemo_curator.tasks.Task]
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Execute the pipeline stages using Ray Data.

    **Parameters:**

    <ParamField path="stages" type="list[ProcessingStage]">
      List of processing stages to execute
    </ParamField>

    <ParamField path="initial_tasks" type="list[Task]" default="None">
      Initial tasks to process. Pass `None` to start the pipeline with no initial tasks.
    </ParamField>

    **Returns:** `list[Task]`

    List of final processed tasks
  </Indent>
</Indent>
