stages.function_decorators
Example
from nemo_curator.stages.resources import Resources
from nemo_curator.stages.function_decorators import processing_stage

# SampleTask is not defined in this snippet; it is assumed to be a Task subclass
# whose data attribute is a pandas DataFrame with a "sentence" column.
@processing_stage(name="WordCountStage", resources=Resources(cpus=1.0), batch_size=1)
def word_count(task: SampleTask) -> SampleTask:
    # Add a "word_count" column to the task's DataFrame
    task.data["word_count"] = task.data["sentence"].str.split().str.len()
    return task
The variable word_count now holds an instance of a concrete ProcessingStage subclass that can be added directly to a nemo_curator.pipeline.Pipeline like so:
from nemo_curator.pipeline import Pipeline
pipeline = Pipeline(...)
# Add read stage, etc.
pipeline.add_stage(...)
# Add the WordCountStage stage
pipeline.add_stage(word_count)
result = pipeline.run(...)
Utility decorators for creating ProcessingStage instances from simple functions.
This module provides a processing_stage decorator that turns a plain Python function into a concrete nemo_curator.stages.base.ProcessingStage.
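The mechanism described above can be illustrated without NeMo Curator. The sketch below is a hypothetical, simplified analogue, not the library's implementation: mini_stage is a keyword-only decorator factory that defines a subclass of the abstract MiniStage base class, makes its process method call the wrapped function, and returns an instance of that subclass. This is why, in the example above, the decorated name refers to a stage object rather than a function.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class MiniStage(ABC):
    """Hypothetical stand-in for an abstract processing-stage base class."""

    name: str = "MiniStage"

    @abstractmethod
    def process(self, task: Any) -> Any:
        """Transform one task and return the result."""


def mini_stage(*, name: str) -> Callable[[Callable[[Any], Any]], MiniStage]:
    """Hypothetical keyword-only decorator factory (not NeMo Curator's API)."""

    def decorator(fn: Callable[[Any], Any]) -> MiniStage:
        # Define a concrete subclass whose process() delegates to the wrapped function.
        class _FunctionStage(MiniStage):
            def process(self, task: Any) -> Any:
                return fn(task)

        # Give the generated class the requested name.
        _FunctionStage.__name__ = name
        _FunctionStage.__qualname__ = name
        _FunctionStage.name = name
        # Return an instance, not the class or the original function.
        return _FunctionStage()

    return decorator


@mini_stage(name="UpperStage")
def upper(task: str) -> str:
    return task.upper()


print(type(upper).__name__, isinstance(upper, MiniStage))  # UpperStage True
print(upper.process("hello"))  # HELLO
```

Returning an instance rather than the generated class means the decorated name can be handed directly to code that expects a stage object.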
Module Contents
Functions
processing_stage: Decorator that converts a function into a ProcessingStage.
Data
TIn
TOut
API
- stages.function_decorators.TIn = TypeVar(…)
- stages.function_decorators.TOut = TypeVar(…)
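The module defines two type variables, TIn and TOut; their exact TypeVar(…) arguments are not shown here. A common use for such a pair is to let a generic wrapper carry the input and output types of a wrapped function through to static type checkers. The sketch below shows that pattern with unbounded stand-in TypeVars and a hypothetical TypedStage class; it is an assumption about intent, not the module's actual definitions.

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

# Unbounded stand-ins; the module's real TypeVar(...) arguments are not shown.
TIn = TypeVar("TIn")
TOut = TypeVar("TOut")


class TypedStage(Generic[TIn, TOut]):
    """Hypothetical wrapper that records the wrapped function's input/output types."""

    def __init__(self, fn: Callable[[TIn], TOut | list[TOut]]) -> None:
        self._fn = fn

    def process(self, task: TIn) -> TOut | list[TOut]:
        # Delegate to the wrapped function; the result may be one item or a list.
        return self._fn(task)


def count_chars(text: str) -> int:
    return len(text)


# A static type checker can infer TypedStage[str, int] for this object.
stage = TypedStage(count_chars)
print(stage.process("curator"))  # 7
```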
- stages.function_decorators.processing_stage(
      *,
      name: str,
      resources: nemo_curator.stages.resources.Resources | dict[str, float] | None = None,
      batch_size: int | None = None,
  )
Decorator that converts a function into a ProcessingStage.
Parameters
- name: The name assigned to the resulting stage (ProcessingStage.name).
- resources: Optional nemo_curator.stages.resources.Resources or dict[str, float] describing the required compute resources. If None, a default of Resources() is used.
- batch_size: Optional batch size for the stage. None means no explicit batch size (the executor decides).
The decorated function must:
- Accept exactly one positional argument: a Task instance (or subclass).
- Return either a single Task instance or a list of tasks.
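The parameter and function rules above can be sketched as three small helpers. All names here are hypothetical: FakeResources stands in for nemo_curator.stages.resources.Resources (its cpus and gpus fields are assumptions), and the real decorator may normalize resources, check signatures, and handle return values differently. The sketch maps None to a default object and a dict to keyword arguments, accepts only functions with exactly one positional parameter, and turns either a single returned value or a list into a list.

```python
from __future__ import annotations

import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeResources:
    """Hypothetical stand-in for nemo_curator.stages.resources.Resources."""

    cpus: float = 1.0
    gpus: float = 0.0


def normalize_resources(resources: FakeResources | dict[str, float] | None) -> FakeResources:
    if resources is None:
        # None falls back to a default-constructed resources object.
        return FakeResources()
    if isinstance(resources, dict):
        # Dict keys are treated as field names; unknown keys raise TypeError.
        return FakeResources(**resources)
    return resources


def check_single_positional(fn: Callable[..., Any]) -> None:
    """Reject functions that do not take exactly one positional parameter."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = inspect.signature(fn).parameters.values()
    positional = [p for p in params if p.kind in positional_kinds]
    if len(positional) != 1:
        raise TypeError(f"{fn.__name__} must accept exactly one positional argument")


def as_task_list(result: Any) -> list[Any]:
    """Normalize a single returned value or a list of values to a list."""
    return list(result) if isinstance(result, list) else [result]


def split_words(task: str) -> list[str]:
    return task.split()


check_single_positional(split_words)       # passes silently
print(as_task_list(split_words("a b c")))  # ['a', 'b', 'c']
print(as_task_list("single"))              # ['single']
print(normalize_resources({"cpus": 2.0}))  # FakeResources(cpus=2.0, gpus=0.0)
```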