ProcessingStage
The ProcessingStage class is the base class for all data processing stages in NeMo Curator. Each stage defines a single step in a data curation pipeline.
Import
Class Definition
Abstract Methods
inputs()
Define stage input requirements.
outputs()
Define what the stage produces.
process()
Process a single task.
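As a rough illustration of how the three abstract methods fit together, the sketch below uses a hypothetical stand-in base class, `SketchStage`, rather than the real `ProcessingStage`. The return types (lists of field names, dict-based tasks) and the `WordCountStage` example are assumptions made for illustration and are not taken from the NeMo Curator API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class SketchStage(ABC):
    """Hypothetical stand-in for a stage base class (not the NeMo Curator class)."""

    @abstractmethod
    def inputs(self) -> list[str]:
        """Names of the fields this stage needs on an incoming task (assumed shape)."""

    @abstractmethod
    def outputs(self) -> list[str]:
        """Names of the fields this stage adds to the task (assumed shape)."""

    @abstractmethod
    def process(self, task: dict[str, Any]) -> dict[str, Any]:
        """Transform a single task and return the result."""


class WordCountStage(SketchStage):
    """Illustrative custom stage: counts whitespace-separated words in `text`."""

    def inputs(self) -> list[str]:
        return ["text"]

    def outputs(self) -> list[str]:
        return ["word_count"]

    def process(self, task: dict[str, Any]) -> dict[str, Any]:
        # Fail early if the task lacks a declared input field.
        missing = [key for key in self.inputs() if key not in task]
        if missing:
            raise KeyError(f"task is missing required fields: {missing}")
        # Return a new dict so the incoming task is left unmodified.
        return {**task, "word_count": len(task["text"].split())}


result = WordCountStage().process({"text": "curate all the data"})
```

Because all three methods are abstract, instantiating `SketchStage` directly raises `TypeError`; a subclass has to implement each of them before it can be used.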
Optional Lifecycle Methods
setup_on_node()
Node-level initialization (e.g., download models).
setup()
Worker-level initialization (e.g., load models).
teardown()
Cleanup after processing.
process_batch()
Vectorized batch processing for better performance.
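One plausible way these hooks relate to each other is sketched below. The class `LifecycleSketch` and the driver `run_stage` are hypothetical: the call order, the method signatures, and the batching logic are assumptions for illustration, and the real executor decides when and where each hook runs.

```python
from __future__ import annotations


class LifecycleSketch:
    """Hypothetical stage that records which lifecycle hooks were called."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.scale: int | None = None

    def setup_on_node(self) -> None:
        # Node-level work, e.g. downloading a model once per machine.
        self.events.append("setup_on_node")

    def setup(self) -> None:
        # Worker-level work, e.g. loading the model into memory.
        self.events.append("setup")
        self.scale = 2

    def teardown(self) -> None:
        # Release whatever setup() acquired.
        self.events.append("teardown")
        self.scale = None

    def process(self, task: int) -> int:
        return task * self.scale

    def process_batch(self, tasks: list[int]) -> list[int]:
        # Default batch behaviour in this sketch: apply process() per task.
        # A real stage could override this with a vectorized implementation.
        self.events.append(f"process_batch({len(tasks)})")
        return [self.process(task) for task in tasks]


def run_stage(stage: LifecycleSketch, tasks: list[int], batch_size: int = 2) -> list[int]:
    """Toy single-process driver (assumed order: node setup, worker setup, batches, teardown)."""
    stage.setup_on_node()
    stage.setup()
    try:
        results: list[int] = []
        for start in range(0, len(tasks), batch_size):
            results.extend(stage.process_batch(tasks[start:start + batch_size]))
        return results
    finally:
        # Teardown runs even if a batch raises.
        stage.teardown()


stage = LifecycleSketch()
doubled = run_stage(stage, [1, 2, 3])
```

In this sketch, expensive initialization lives in `setup()` rather than `__init__`, so the object stays cheap to construct and ship to a worker before any model is loaded.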
Creating Custom Stages
Configuration with with_()
Stages can be configured using the with_() method:
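The general pattern behind a `with_()`-style method can be sketched as follows. `ConfigurableSketch` is a hypothetical stand-in, and the option names `name` and `batch_size` are assumptions; this sketch returns a modified copy, which may or may not match how the real method behaves.

```python
from __future__ import annotations

import copy
from typing import Any


class ConfigurableSketch:
    """Hypothetical stage with two illustrative options: name and batch_size."""

    def __init__(self, name: str = "stage", batch_size: int = 1) -> None:
        self.name = name
        self.batch_size = batch_size

    def with_(self, **overrides: Any) -> "ConfigurableSketch":
        # Reject options this stage does not define, to catch typos early.
        unknown = set(overrides) - set(vars(self))
        if unknown:
            raise TypeError(f"unknown option(s): {sorted(unknown)}")
        # Copy first so the original stage keeps its configuration.
        configured = copy.copy(self)
        for key, value in overrides.items():
            setattr(configured, key, value)
        return configured


base = ConfigurableSketch()
tuned = base.with_(name="dedup", batch_size=64)
```

Returning the configured stage lets the call be chained inline when assembling a pipeline, for example `ConfigurableSketch().with_(batch_size=64)`.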