backends.experimental.ray_actor_pool.executor

Module Contents

Classes

RayActorPoolExecutor

Ray-based executor using ActorPool for better resource management.

API

class backends.experimental.ray_actor_pool.executor.RayActorPoolExecutor(config: dict | None = None)

Bases: nemo_curator.backends.base.BaseExecutor

Ray-based executor using ActorPool for better resource management.

This executor:

  1. Creates a pool of actors per stage using Ray’s ActorPool

  2. Uses map_unordered for better load balancing and fault tolerance

  3. Lets Ray handle object ownership and garbage collection automatically

  4. Provides better backpressure management through ActorPool
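The pool-per-stage pattern described in the list above can be sketched without Ray. The block below is only an analogy built on the standard library's `concurrent.futures`, not the executor's actual implementation: a thread pool stands in for Ray's ActorPool, and `as_completed` stands in for `map_unordered`. The names `run_stage`, `run_pipeline`, `add_one`, and `double` are hypothetical, and treating a missing `initial_tasks` as an empty list is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_stage(fn, tasks, num_workers=2):
    """Run one stage with its own worker pool (stand-in for a per-stage ActorPool)."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fn, task) for task in tasks]
        # Collect results in completion order, not submission order,
        # which mirrors the idea behind an unordered map.
        return [future.result() for future in as_completed(futures)]


def run_pipeline(stage_fns, initial_tasks=None):
    """Feed the output of each stage into the next one."""
    # Assumption: no initial tasks means starting from an empty list.
    tasks = list(initial_tasks or [])
    for fn in stage_fns:
        tasks = run_stage(fn, tasks)
    return tasks


def add_one(x):
    return x + 1


def double(x):
    return x * 2


final_tasks = run_pipeline([add_one, double], [1, 2, 3])
# Result order is not guaranteed, so sort before displaying.
print(sorted(final_tasks))  # [4, 6, 8]
```

Because results come back in completion order, a slow task does not hold up results from tasks that finish sooner. The trade-off is that callers cannot rely on the output order matching the input order.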


execute(
    stages: list[nemo_curator.stages.base.ProcessingStage],
    initial_tasks: list[nemo_curator.tasks.Task] | None = None,
) → list[nemo_curator.tasks.Task]

Execute the pipeline stages using ActorPool.

Args:
    stages: List of processing stages to execute.
    initial_tasks: Initial tasks to process (can be None for an empty start).

Returns:
    List of final processed tasks.