For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI Reference
DocumentationAPI Reference
  • API Reference
    • Overview
        • Nemo Curator
          • Backends
            • Base
            • Internal
            • Ray Actor Pool
            • Ray Data
              • Adapter
              • Executor
              • Utils
            • Utils
            • Xenna
          • Config
          • Core
          • Metrics
          • Models
          • Package Info
          • Pipeline
          • Stages
          • Tasks
          • Utils
    • Pipeline
    • ProcessingStage
    • CompositeStage
    • Resources
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNeMo Curator
On this page
  • Module Contents
  • Classes
  • API
API ReferenceFull Library ReferenceNemo CuratorNemo CuratorBackendsRay Data

nemo_curator.backends.ray_data.executor

||View as Markdown|
Previous

nemo_curator.backends.ray_data.adapter

Next

nemo_curator.backends.ray_data.utils

Module Contents

Classes

NameDescription
RayDataExecutorRay Data-based executor for pipeline execution.

API

class nemo_curator.backends.ray_data.executor.RayDataExecutor(
config: dict[str, typing.Any] | None = None,
ignore_head_node: bool = False
)

Bases: BaseExecutor

Ray Data-based executor for pipeline execution.

This executor:

  1. Executes setup on all nodes for all stages
  2. Converts initial tasks to Ray Data dataset
  3. Applies each stage as a Ray Data transformation (as a task or actor in map_batches)
  4. Returns final results as a list of tasks
nemo_curator.backends.ray_data.executor.RayDataExecutor._dataset_to_tasks(
dataset: ray.data.Dataset
) -> list[nemo_curator.tasks.Task]

Convert Ray Data dataset back to list of tasks.

Parameters:

dataset
Dataset

Ray Data dataset containing Task objects

Returns: list[Task]

List of Task objects

nemo_curator.backends.ray_data.executor.RayDataExecutor._tasks_to_dataset(
tasks: list[nemo_curator.tasks.Task]
) -> ray.data.Dataset

Convert list of tasks to Ray Data dataset.

Parameters:

tasks
list[Task]

List of Task objects

Returns: Dataset

Ray Data dataset containing Task objects directly

nemo_curator.backends.ray_data.executor.RayDataExecutor.execute(
stages: list[nemo_curator.stages.base.ProcessingStage],
initial_tasks: list[nemo_curator.tasks.Task] | None = None
) -> list[nemo_curator.tasks.Task]

Execute the pipeline stages using Ray Data.

Parameters:

stages
list[ProcessingStage]

List of processing stages to execute

initial_tasks
list[Task]Defaults to None

Initial tasks to process (can be None for empty start)

Returns: list[Task]

list[Task]: List of final processed tasks