nemo_curator.pipeline.pipeline

Module Contents

Classes

Name        Description
Pipeline    User-facing pipeline definition for composing processing stages.

API

class nemo_curator.pipeline.pipeline.Pipeline(
name: str,
description: str | None = None,
stages: list[nemo_curator.stages.base.ProcessingStage] | None = None,
config: dict[str, typing.Any] | None = None
)

User-facing pipeline definition for composing processing stages.

config
dict[str, Any] = config or {}

stages
list[ProcessingStage] = stages or []

nemo_curator.pipeline.pipeline.Pipeline.__repr__() -> str

String representation of the pipeline.

nemo_curator.pipeline.pipeline.Pipeline._decompose_stages(
stages: list[nemo_curator.stages.base.ProcessingStage | nemo_curator.stages.base.CompositeStage]
) -> tuple[list[nemo_curator.stages.base.ProcessingStage], dict[str, list[str]]]

Decompose composite stages into execution stages.

Parameters:

stages
list[ProcessingStage | CompositeStage]

List of stages that may include composite stages

Returns: tuple[list[ProcessingStage], dict[str, list[str]]]

Tuple of (execution stages, decomposition info dict)

Raises:

  • TypeError: If a composite stage is decomposed into another composite stage
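
The flattening behaviour described above can be illustrated with a small, self-contained sketch. The `Stage` and `Composite` classes are hypothetical stand-ins written for this example, not the library's `ProcessingStage` and `CompositeStage`. The `decompose()` hook and the choice to key the info dict by composite stage name are assumptions; the real method may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Stand-in for an execution stage (hypothetical, not ProcessingStage)."""
    name: str


@dataclass
class Composite:
    """Stand-in for a composite stage that expands into sub-stages."""
    name: str
    parts: list = field(default_factory=list)

    def decompose(self) -> list:
        # Assumed hook: return the stages this composite expands into.
        return list(self.parts)


def decompose_stages(stages: list) -> tuple[list[Stage], dict[str, list[str]]]:
    """Flatten composites one level; reject composites nested in composites."""
    execution_stages: list[Stage] = []
    decomposition_info: dict[str, list[str]] = {}
    for stage in stages:
        if isinstance(stage, Composite):
            sub_stages = stage.decompose()
            for sub in sub_stages:
                if isinstance(sub, Composite):
                    raise TypeError(
                        f"Composite stage '{stage.name}' decomposed into "
                        f"another composite stage '{sub.name}'"
                    )
            execution_stages.extend(sub_stages)
            # Record which execution stages came from this composite.
            decomposition_info[stage.name] = [s.name for s in sub_stages]
        else:
            execution_stages.append(stage)
    return execution_stages, decomposition_info


flat, info = decompose_stages(
    [Stage("read"), Composite("clean", [Stage("dedupe"), Stage("filter")])]
)
print([s.name for s in flat])  # ['read', 'dedupe', 'filter']
print(info)                    # {'clean': ['dedupe', 'filter']}
```

In this sketch, plain stages pass through unchanged and only composites contribute entries to the info dict.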

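nemo_curator.pipeline.pipeline.Pipeline.add_stage(
stage: nemo_curator.stages.base.ProcessingStage
) -> nemo_curator.pipeline.pipeline.Pipeline

Add a stage to the pipeline.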

Parameters:

stage
ProcessingStage

Processing stage to add

Returns: Pipeline

Self (Pipeline) for method chaining
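
Because the method returns the pipeline itself, successive calls can be chained in one expression. The sketch below shows the pattern with a hypothetical stand-in: `MiniPipeline` and its `add_stage` method are written for this example and use plain strings in place of `ProcessingStage` objects, so it runs without the library installed.

```python
class MiniPipeline:
    """Hypothetical stand-in that mimics the documented 'returns self' contract."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: list[str] = []

    def add_stage(self, stage: str) -> "MiniPipeline":
        self.stages.append(stage)
        return self  # returning self is what makes chaining possible


pipeline = (
    MiniPipeline("demo")
    .add_stage("read")
    .add_stage("filter")
    .add_stage("write")
)
print(pipeline.stages)  # ['read', 'filter', 'write']
```

Stages are kept in the order in which they were added.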

nemo_curator.pipeline.pipeline.Pipeline.build() -> None

Build an execution plan from the pipeline.

Raises:

  • ValueError: If the pipeline has no stages

nemo_curator.pipeline.pipeline.Pipeline.describe() -> str

Get a detailed description of the pipeline stages and their requirements.

nemo_curator.pipeline.pipeline.Pipeline.run(
executor: nemo_curator.backends.base.BaseExecutor | None = None,
initial_tasks: list[nemo_curator.tasks.Task] | None = None
) -> list[nemo_curator.tasks.Task] | None

Run the pipeline.

Parameters:

executor
BaseExecutor, defaults to None

Executor to use

initial_tasks
list[Task], defaults to None

Initial tasks to start the pipeline with.

Returns: list[Task] | None

List of tasks, or None