> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/curator/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/curator/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/curator/_mcp/server.

# nemo_curator.backends.ray_data.executor

## Module Contents

### Classes

| Name                                                                          | Description                                     |
| ----------------------------------------------------------------------------- | ----------------------------------------------- |
| [`RayDataExecutor`](#nemo_curator-backends-ray_data-executor-RayDataExecutor) | Ray Data-based executor for pipeline execution. |

### API

```python
class nemo_curator.backends.ray_data.executor.RayDataExecutor(
    config: dict[str, typing.Any] | None = None,
    ignore_head_node: bool = False
)
```

**Bases:** [BaseExecutor](/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-BaseExecutor)

Ray Data-based executor for pipeline execution.

This executor:

1. Executes setup on all nodes for all stages
2. Converts initial tasks to Ray Data dataset
3. Applies each stage as a Ray Data transformation (as a task or actor in map\_batches)
4. Returns final results as a list of tasks

```python
nemo_curator.backends.ray_data.executor.RayDataExecutor._dataset_to_tasks(
    dataset: ray.data.Dataset
) -> list[nemo_curator.tasks.Task]
```

Convert Ray Data dataset back to list of tasks.

**Parameters:**

Ray Data dataset containing Task objects

**Returns:** `list[Task]`

List of Task objects

```python
nemo_curator.backends.ray_data.executor.RayDataExecutor._tasks_to_dataset(
    tasks: list[nemo_curator.tasks.Task]
) -> ray.data.Dataset
```

Convert list of tasks to Ray Data dataset.

**Parameters:**

List of Task objects

**Returns:** `Dataset`

Ray Data dataset containing Task objects directly

```python
nemo_curator.backends.ray_data.executor.RayDataExecutor.execute(
    stages: list[nemo_curator.stages.base.ProcessingStage],
    initial_tasks: list[nemo_curator.tasks.Task] | None = None
) -> list[nemo_curator.tasks.Task]
```

Execute the pipeline stages using Ray Data.

**Parameters:**

List of processing stages to execute

Initial tasks to process (can be None for empty start)

**Returns:** `list[Task]`

list\[Task]: List of final processed tasks