nemo_curator.backends.base


Module Contents

Classes

| Name | Description |
| --- | --- |
| BaseExecutor | Executor for a pipeline. |
| BaseStageAdapter | Adapts ProcessingStage to an execution backend, if needed. |
| NodeInfo | Generic node information for setup_on_node calls across backends. |
| WorkerMetadata | Generic worker metadata for setup_on_node calls across backends. |

API

class nemo_curator.backends.base.BaseExecutor(
config: dict[str, typing.Any] | None = None,
ignore_head_node: bool = False
)
Abstract

Executor for a pipeline.

config = config or {}

nemo_curator.backends.base.BaseExecutor.execute(
stages: list[nemo_curator.stages.base.ProcessingStage],
initial_tasks: list[nemo_curator.tasks.Task] | None = None
) -> None
abstract

Execute the pipeline.
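
As a rough illustration of the abstract `execute` contract, the sketch below uses a stand-in base class that only mirrors the signature documented here, so it runs without the library installed. `SketchExecutor`, `InlineExecutor`, the `last_output` attribute, and the use of plain callables as stages are all assumptions made for this example; they are not part of `nemo_curator`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class SketchExecutor(ABC):
    """Stand-in that mirrors the documented BaseExecutor signature; not the real class."""

    def __init__(self, config: Optional[dict[str, Any]] = None, ignore_head_node: bool = False):
        self.config = config or {}  # matches the documented `config or {}` default
        self.ignore_head_node = ignore_head_node  # assumption: kept as an attribute

    @abstractmethod
    def execute(self, stages: list, initial_tasks: Optional[list] = None) -> None:
        """Execute the pipeline."""


class InlineExecutor(SketchExecutor):
    """Hypothetical backend: runs every stage in order, in the current process."""

    def execute(self, stages: list[Callable[[list], list]], initial_tasks: Optional[list] = None) -> None:
        tasks = list(initial_tasks or [])
        for stage in stages:
            tasks = stage(tasks)  # stages are plain callables here, for illustration only
        self.last_output = tasks  # hypothetical attribute; execute() itself returns None


executor = InlineExecutor(config={"name": "demo"})
executor.execute([lambda ts: [t + 1 for t in ts], lambda ts: [t * 2 for t in ts]], [1, 2, 3])
print(executor.last_output)  # [4, 6, 8]
```

Because `execute` is documented as returning `None`, the sketch keeps results on the executor instance rather than returning them; a real backend may surface results differently.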

class nemo_curator.backends.base.BaseStageAdapter(
stage: nemo_curator.stages.base.ProcessingStage
)

Adapts ProcessingStage to an execution backend, if needed.

nemo_curator.backends.base.BaseStageAdapter.process_batch(
tasks: list[nemo_curator.tasks.Task]
) -> list[nemo_curator.tasks.Task]

Process a batch of tasks.

Parameters:

tasks
list[Task]

List of tasks to process

Returns: list[Task]

List of processed tasks

nemo_curator.backends.base.BaseStageAdapter.setup(
worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Set up the stage once per actor.

Parameters:

worker_metadata
WorkerMetadata, defaults to None

Information about the worker

nemo_curator.backends.base.BaseStageAdapter.setup_on_node(
node_info: nemo_curator.backends.base.NodeInfo | None = None,
worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Set up the stage on a node.

Parameters:

node_info
NodeInfo, defaults to None

Information about the node

worker_metadata
WorkerMetadata, defaults to None

Information about the worker

nemo_curator.backends.base.BaseStageAdapter.teardown() -> None

Tear down the stage once per actor.
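
The four adapter methods above suggest a lifecycle, which the following minimal sketch tries to make concrete. It assumes the adapter simply forwards each call to the wrapped stage and that a backend calls the hooks in the order node setup, per-actor setup, batch processing, teardown; neither assumption is confirmed by this page. `UppercaseStage` and `SketchStageAdapter` are hypothetical stand-ins, not library classes.

```python
from typing import Any


class UppercaseStage:
    """Hypothetical stage that records which lifecycle hooks were called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def setup_on_node(self, node_info: Any = None, worker_metadata: Any = None) -> None:
        self.calls.append("setup_on_node")

    def setup(self, worker_metadata: Any = None) -> None:
        self.calls.append("setup")

    def process_batch(self, tasks: list[str]) -> list[str]:
        self.calls.append("process_batch")
        return [t.upper() for t in tasks]

    def teardown(self) -> None:
        self.calls.append("teardown")


class SketchStageAdapter:
    """Stand-in adapter: forwards each documented method to the wrapped stage (assumed)."""

    def __init__(self, stage: UppercaseStage) -> None:
        self.stage = stage

    def setup_on_node(self, node_info: Any = None, worker_metadata: Any = None) -> None:
        self.stage.setup_on_node(node_info, worker_metadata)

    def setup(self, worker_metadata: Any = None) -> None:
        self.stage.setup(worker_metadata)

    def process_batch(self, tasks: list[str]) -> list[str]:
        return self.stage.process_batch(tasks)

    def teardown(self) -> None:
        self.stage.teardown()


# Assumed call order for one node hosting one actor.
adapter = SketchStageAdapter(UppercaseStage())
adapter.setup_on_node()               # node-level preparation
adapter.setup()                       # per-actor preparation
result = adapter.process_batch(["a", "b"])
adapter.teardown()                    # per-actor cleanup
print(result)               # ['A', 'B']
print(adapter.stage.calls)  # ['setup_on_node', 'setup', 'process_batch', 'teardown']
```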

class nemo_curator.backends.base.NodeInfo(
node_id: str = ''
)
Dataclass

Generic node information for setup_on_node calls across backends. Simplified to match Xenna’s structure.

node_id: str = ''

class nemo_curator.backends.base.WorkerMetadata(
worker_id: str = '',
allocation: typing.Any = None
)
Dataclass

Generic worker metadata for setup_on_node calls across backends. Simplified to match Xenna’s structure. The allocation field can contain backend-specific allocation information.

worker_id: str = ''
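
To show how these two dataclasses might be populated, here is a small sketch with stand-in definitions that copy only the fields and defaults listed above. The dictionary passed as `allocation` is a made-up payload; the shape of real backend-specific allocation data is not described on this page.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class NodeInfo:
    """Stand-in mirroring the documented NodeInfo fields."""
    node_id: str = ""


@dataclass
class WorkerMetadata:
    """Stand-in mirroring the documented WorkerMetadata fields."""
    worker_id: str = ""
    allocation: Any = None  # backend-specific; left untyped as in the documented signature


# Defaults apply when a backend has nothing to report.
default_worker = WorkerMetadata()

# Hypothetical values a backend could pass to setup_on_node.
node = NodeInfo(node_id="node-0")
worker = WorkerMetadata(worker_id="worker-3", allocation={"gpus": [0, 1]})

print(asdict(node))    # {'node_id': 'node-0'}
print(asdict(worker))  # {'worker_id': 'worker-3', 'allocation': {'gpus': [0, 1]}}
```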