nemo_curator.backends.experimental.ray_actor_pool.executor
nemo_curator.backends.experimental.ray_actor_pool.executor
Module Contents
Classes
Functions
Data
API
Bases: BaseExecutor
Ray-based executor using ActorPool for better resource management.
This executor:
- Creates a pool of actors per stage using Ray’s ActorPool
- Uses map_unordered for better load balancing and fault tolerance
- Lets Ray handle object ownership and garbage collection automatically
- Provides better backpressure management through ActorPool
Clean up actors in the pool.
Clean up a list of actors.
Create an ActorPool for a specific stage.
Create a RAFT ActorPool for a specific stage.
Create a RapidsMPFShuffling Actors and setup UCXX communication for a specific stage.
Execute an LSH stage with band iteration.
Parameters:
The LSH stage to execute
Input tasks to process
Returns: list[Task]
List of output tasks from all band iterations
Generate task batches from a list of tasks. Args: tasks: List of Task objects to process batch_size: The size of the batch num_output_tasks: The number of output tasks to generate. Either batch_size or num_output_tasks must be provided but not both. Returns: List of task batches
Process Shuffle through the actors. Args: actors: The actors to use for processing tasks: List of Task objects to process band_range: Band range for LSH shuffle Returns: List of processed Task objects
Process tasks through the actor pool.
Parameters:
The ActorPool to use for processing
The processing stage (for logging/context, unused)
List of Task objects to process
Returns: list[Task]
List of processed Task objects
Execute the pipeline stages using ActorPool.
Parameters:
List of processing stages to execute
Initial tasks to process (can be None for empty start)
Returns: list[Task]
List of final processed tasks