nemo_curator.backends.base

Module Contents

Classes

Name	Description
`BaseExecutor`	Executor for a pipeline.
`BaseStageAdapter`	Adapts ProcessingStage to an execution backend, if needed.
`NodeInfo`	Generic node information for setup_on_node calls across backends.
`WorkerMetadata`	Generic worker metadata for setup_on_node calls across backends.

API

class nemo_curator.backends.base.BaseExecutor(
    config: dict[str, typing.Any] | None = None,
    ignore_head_node: bool = False
)

Abstract

Executor for a pipeline.

config

= config or {}

ignore_head_node

= ignore_head_node or ignore_ray_head_node()
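The two attribute defaults above are ordinary Python `or` fallbacks. The sketch below mirrors them in a hypothetical stand-in class (`SketchExecutor`, not the library's `BaseExecutor`). It assumes a stub `ignore_ray_head_node()` that always returns False, because this section does not document what the real helper checks.

```python
from typing import Any, Optional


def ignore_ray_head_node() -> bool:
    # Hypothetical stub: the real helper's logic is not documented here.
    return False


class SketchExecutor:
    """Hypothetical stand-in mirroring the documented BaseExecutor defaults."""

    def __init__(self, config: Optional[dict[str, Any]] = None, ignore_head_node: bool = False) -> None:
        # None (or an empty dict) falls back to a fresh empty dict.
        self.config = config or {}
        # An explicit True wins; otherwise defer to the helper.
        self.ignore_head_node = ignore_head_node or ignore_ray_head_node()


default_executor = SketchExecutor()
custom_executor = SketchExecutor(config={"batch_size": 4}, ignore_head_node=True)
```

Because of the `or`, passing `ignore_head_node=False` does not force the flag off: if the helper returns True, the attribute is still True.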

nemo_curator.backends.base.BaseExecutor.execute(
    stages: list[nemo_curator.stages.base.ProcessingStage],
    initial_tasks: list[nemo_curator.tasks.Task] | None = None
) -> None

abstract

Execute the pipeline.
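A backend subclasses the executor and implements `execute()`. The following is a minimal sketch of a toy in-process backend, assuming hypothetical stand-ins for `Task` and `ProcessingStage` (here a `Stage` whose `process()` maps one task to a list of tasks); it is not the library's implementation. Since `execute()` is documented to return `None`, the sketch keeps the final tasks on the instance, which is purely a design choice for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Task:
    """Hypothetical stand-in for nemo_curator.tasks.Task."""
    data: Any


class Stage:
    """Hypothetical stand-in for ProcessingStage: one task in, a list of tasks out."""

    def __init__(self, fn) -> None:
        self.fn = fn

    def process(self, task: Task) -> list[Task]:
        return self.fn(task)


class Executor(ABC):
    """Stand-in mirroring BaseExecutor's abstract interface."""

    def __init__(self, config: Optional[dict[str, Any]] = None) -> None:
        self.config = config or {}

    @abstractmethod
    def execute(self, stages: list[Stage], initial_tasks: Optional[list[Task]] = None) -> None:
        """Execute the pipeline."""


class SequentialExecutor(Executor):
    """Toy backend: runs every stage in-process, one after another."""

    def __init__(self, config: Optional[dict[str, Any]] = None) -> None:
        super().__init__(config)
        self.results: list[Task] = []

    def execute(self, stages: list[Stage], initial_tasks: Optional[list[Task]] = None) -> None:
        tasks = list(initial_tasks or [])
        for stage in stages:
            # The output of each stage becomes the input of the next one.
            next_tasks: list[Task] = []
            for task in tasks:
                next_tasks.extend(stage.process(task))
            tasks = next_tasks
        # execute() returns None, so store the final tasks on the instance.
        self.results = tasks


executor = SequentialExecutor()
executor.execute(
    [Stage(lambda t: [Task(t.data * 2)]), Stage(lambda t: [Task(t.data + 1)])],
    [Task(1), Task(2)],
)
```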

class nemo_curator.backends.base.BaseStageAdapter(
    stage: nemo_curator.stages.base.ProcessingStage
)

Adapts ProcessingStage to an execution backend, if needed.

nemo_curator.backends.base.BaseStageAdapter.process_batch(
    tasks: list[nemo_curator.tasks.Task]
) -> list[nemo_curator.tasks.Task]

Process a batch of tasks.

Parameters:

tasks

list[Task]

List of tasks to process

Returns: list[Task]

list[Task]: List of processed tasks
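An adapter wraps a stage and exposes `process_batch()` to the backend. A minimal sketch, assuming hypothetical stand-ins (`Task`, `UppercaseStage`, `SketchStageAdapter`) and assuming the wrapped stage offers a per-task `process()` method; the real adapter's delegation logic is not shown in this section.

```python
from typing import Any


class Task:
    """Hypothetical stand-in for nemo_curator.tasks.Task."""

    def __init__(self, data: Any) -> None:
        self.data = data


class UppercaseStage:
    """Hypothetical stage: returns one output task per input task."""

    def process(self, task: Task) -> Task:
        return Task(task.data.upper())


class SketchStageAdapter:
    """Stand-in for BaseStageAdapter: wraps a stage and exposes process_batch()."""

    def __init__(self, stage) -> None:
        self.stage = stage

    def process_batch(self, tasks: list[Task]) -> list[Task]:
        # Apply the wrapped stage to each task, preserving input order.
        return [self.stage.process(task) for task in tasks]


adapter = SketchStageAdapter(UppercaseStage())
processed = adapter.process_batch([Task("a"), Task("b")])
```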

nemo_curator.backends.base.BaseStageAdapter.setup(
    worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Set up the stage once per actor.

Parameters:

worker_metadata

WorkerMetadata, defaults to None

Information about the worker

nemo_curator.backends.base.BaseStageAdapter.setup_on_node(
    node_info: nemo_curator.backends.base.NodeInfo | None = None,
    worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Set up the stage on a node.

Parameters:

node_info

NodeInfo, defaults to None

Information about the node

worker_metadata

WorkerMetadata, defaults to None

Information about the worker
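One way a backend might use `setup_on_node()` is to run node-wide preparation only once per distinct node, even when several workers share that node. The sketch below assumes that calling pattern; `CachingAdapter` and `start_workers` are hypothetical, and the dataclasses only re-declare the documented fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class NodeInfo:
    # Re-declares the documented NodeInfo field.
    node_id: str = ""


@dataclass
class WorkerMetadata:
    # Re-declares the documented WorkerMetadata fields.
    worker_id: str = ""
    allocation: Any = None


class CachingAdapter:
    """Hypothetical adapter that records each node it prepared."""

    def __init__(self) -> None:
        self.prepared_nodes: list[str] = []

    def setup_on_node(self, node_info: Optional[NodeInfo] = None,
                      worker_metadata: Optional[WorkerMetadata] = None) -> None:
        # Hypothetical node-wide preparation (for example, staging files on
        # local disk); this sketch only records the node id.
        node_id = node_info.node_id if node_info else ""
        self.prepared_nodes.append(node_id)


def start_workers(adapter: CachingAdapter, placements: list[tuple[str, str]]) -> None:
    """Hypothetical driver: call setup_on_node once for each distinct node."""
    seen: set[str] = set()
    for node_id, worker_id in placements:
        if node_id not in seen:
            seen.add(node_id)
            adapter.setup_on_node(NodeInfo(node_id), WorkerMetadata(worker_id))


node_adapter = CachingAdapter()
start_workers(node_adapter, [("n0", "w0"), ("n0", "w1"), ("n1", "w2")])
```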

nemo_curator.backends.base.BaseStageAdapter.teardown() -> None

Tear down the stage once per actor.
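Taken together, `setup()` and `teardown()` bracket an actor's work: acquire per-actor resources once, process batches, then release them. A minimal sketch of that lifecycle using a hypothetical `LifecycleAdapter` (plain strings stand in for tasks, and a dict stands in for an expensive resource such as a loaded model):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WorkerMetadata:
    # Re-declares the documented WorkerMetadata fields.
    worker_id: str = ""
    allocation: Any = None


class LifecycleAdapter:
    """Hypothetical adapter: acquire in setup(), use in process_batch(), release in teardown()."""

    def __init__(self) -> None:
        self.resource: Optional[dict] = None
        self.events: list[str] = []

    def setup(self, worker_metadata: Optional[WorkerMetadata] = None) -> None:
        worker_id = worker_metadata.worker_id if worker_metadata else "unknown"
        # Stand-in for an expensive per-actor resource.
        self.resource = {"owner": worker_id}
        self.events.append(f"setup:{worker_id}")

    def process_batch(self, tasks: list[str]) -> list[str]:
        if self.resource is None:
            raise RuntimeError("setup() must run before process_batch()")
        return [f"{self.resource['owner']}:{t}" for t in tasks]

    def teardown(self) -> None:
        # Release the per-actor resource.
        self.resource = None
        self.events.append("teardown")


actor = LifecycleAdapter()
actor.setup(WorkerMetadata(worker_id="w0"))
batch_out = actor.process_batch(["a", "b"])
actor.teardown()
```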

class nemo_curator.backends.base.NodeInfo(
    node_id: str = ''
)

Dataclass

Generic node information for setup_on_node calls across backends. Simplified to match Xenna’s structure.

node_id

str = ''

class nemo_curator.backends.base.WorkerMetadata(
    worker_id: str = '',
    allocation: typing.Any = None
)

Dataclass

Generic worker metadata for setup_on_node calls across backends. Simplified to match Xenna’s structure. The allocation field can contain backend-specific allocation information.

worker_id

str = ''

allocation

typing.Any = None
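Both classes are plain dataclasses, so they can be built with keyword arguments and compared or converted like any other dataclass. The sketch below re-declares the documented fields for illustration only; the `{"gpus": [0, 1]}` allocation value is a hypothetical example of backend-specific data, not a format any backend is known to use.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class NodeInfo:
    """Re-declaration of the documented field, for illustration only."""
    node_id: str = ""


@dataclass
class WorkerMetadata:
    """Re-declaration of the documented fields, for illustration only."""
    worker_id: str = ""
    allocation: Any = None


# Defaults: empty ids and no allocation.
default_worker = WorkerMetadata()

# A backend could store its own allocation object here (hypothetical shape).
gpu_worker = WorkerMetadata(worker_id="worker-3", allocation={"gpus": [0, 1]})
node = NodeInfo(node_id="node-a")
```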
