nemo_curator.backends.ray_data.adapter
Bases: BaseStageAdapter
Adapts ProcessingStage to Ray Data operations.
This adapter converts stages to work with Ray Data datasets by:
Get the batch size for this stage.
Internal method that handles the actual batch processing logic.
Parameters:
Dictionary with arrays/lists representing a batch of Task objects
Returns: dict[str, Any]
Dictionary with arrays/lists representing processed Task objects
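The batch-in, batch-out contract described above can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the library's actual code: the Task dataclass, UppercaseStage, the process_batch_internal helper, and the assumption that Task objects sit under a single "item" column.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Task:
    """Hypothetical stand-in for a pipeline Task object."""

    task_id: str
    data: str


class UppercaseStage:
    """Hypothetical stage that maps a list of tasks to processed tasks."""

    def process_batch(self, tasks: list[Task]) -> list[Task]:
        return [Task(t.task_id, t.data.upper()) for t in tasks]


def process_batch_internal(stage: UppercaseStage, batch: dict[str, Any]) -> dict[str, Any]:
    # Assumption: the Task objects are stored under one "item" column.
    tasks = list(batch["item"])
    results = stage.process_batch(tasks)
    # Return the same dict-of-lists layout so a following stage can consume it.
    return {"item": results}


batch = {"item": [Task("a", "hello"), Task("b", "world")]}
out = process_batch_internal(UppercaseStage(), batch)
```

Keeping the input and output in the same dictionary layout is what lets several adapted stages be chained one after another.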
Process a Ray Data dataset through this stage.
Parameters:
Ray Data dataset containing Task objects
Returns: Dataset
Processed Ray Data dataset
Create a StageProcessor class with the proper stage name for display.
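One plausible way to give a processor class a stage-specific display name is to build a subclass dynamically with Python's built-in type(). The sketch below is only an illustration of that technique; the StageProcessor body, the create_named_processor helper, and PassThroughStage are hypothetical and not taken from the library.

```python
from typing import Any


class StageProcessor:
    """Hypothetical callable class that applies a stage to each batch."""

    def __init__(self, stage: Any) -> None:
        self.stage = stage

    def __call__(self, batch: dict[str, Any]) -> dict[str, Any]:
        return self.stage.process_batch(batch)


def create_named_processor(stage_name: str) -> type:
    # type(name, bases, namespace) creates a subclass whose __name__ and
    # __qualname__ are both set to the given stage name.
    return type(stage_name, (StageProcessor,), {})


class PassThroughStage:
    """Hypothetical stage that returns each batch unchanged."""

    def process_batch(self, batch: dict[str, Any]) -> dict[str, Any]:
        return batch


ProcessorCls = create_named_processor("PassThroughStage")
processor = ProcessorCls(PassThroughStage())
```

Any tool that labels an operation by the class name would then show the stage name instead of a generic StageProcessor.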
Create a named Ray Data stage adapter function.
The result is a standalone function that wraps the stage's processing logic and carries a clean name, without the class qualification.
Parameters:
Processing stage to adapt
Returns: Callable[[dict[str, Any]], dict[str, Any]]
A function that can be used directly with Ray Data’s map_batches
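The idea of a cleanly named wrapper function can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: create_named_adapter, DropEmptyStage, and its name attribute are hypothetical, and the "item" column is assumed.

```python
from typing import Any, Callable


class DropEmptyStage:
    """Hypothetical stage that removes empty strings from a batch."""

    name = "drop_empty"

    def process_batch(self, batch: dict[str, Any]) -> dict[str, Any]:
        return {"item": [x for x in batch["item"] if x]}


def create_named_adapter(stage: Any) -> Callable[[dict[str, Any]], dict[str, Any]]:
    def adapter(batch: dict[str, Any]) -> dict[str, Any]:
        return stage.process_batch(batch)

    # Without this, __qualname__ would be "create_named_adapter.<locals>.adapter".
    adapter.__name__ = stage.name
    adapter.__qualname__ = stage.name
    return adapter


fn = create_named_adapter(DropEmptyStage())
result = fn({"item": ["a", "", "b"]})
```

A function built this way takes a batch dictionary and returns one, which matches the Callable[[dict[str, Any]], dict[str, Any]] return type documented above, so it could be passed to a batch-mapping call such as Ray Data's map_batches.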