modules.add_id

Module Contents

Classes

AddId

Base class for all NeMo Curator modules.

API

class modules.add_id.AddId(
id_field: str,
id_prefix: str = 'doc_id',
start_index: int | None = None,
)

Bases: nemo_curator.modules.base.BaseModule

Base class for all NeMo Curator modules.

Validates that the data lives on the correct device for each module.

Initialization

Constructs a Module.

Args:
    input_backend (Literal["pandas", "cudf", "any"]): The backend the input dataframe must be on for the module to work.
    name (str, Optional): The name of the module. If None, defaults to self.__class__.__name__.
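The backend check and the default name described above can be illustrated with a minimal, library-free sketch. Everything here is hypothetical: SketchModule, GpuOnlyModule, and check_backend are stand-in names chosen for illustration and are not part of NeMo Curator. The sketch only assumes that a module records which backend it needs and rejects datasets on a different one.

```python
from typing import Literal, Optional

Backend = Literal["pandas", "cudf", "any"]


class SketchModule:
    """Hypothetical stand-in for a base module (not the NeMo Curator class)."""

    def __init__(self, input_backend: Backend, name: Optional[str] = None) -> None:
        if input_backend not in ("pandas", "cudf", "any"):
            raise ValueError(f"Unsupported input_backend: {input_backend!r}")
        self.input_backend = input_backend
        # Fall back to the class name when no explicit name is given.
        self.name = name if name is not None else self.__class__.__name__

    def check_backend(self, dataset_backend: str) -> None:
        """Raise ValueError if the dataset backend does not match the module's."""
        if self.input_backend != "any" and dataset_backend != self.input_backend:
            raise ValueError(
                f"{self.name} requires a {self.input_backend} dataset, "
                f"got {dataset_backend}"
            )


class GpuOnlyModule(SketchModule):
    """Hypothetical subclass that only accepts cudf-backed data."""

    def __init__(self) -> None:
        super().__init__(input_backend="cudf")


module = GpuOnlyModule()
module.check_backend("cudf")  # Passes silently; "pandas" would raise ValueError.
```

In this sketch a subclass declares its backend once in the constructor, so the check stays in one place instead of being repeated in every module.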

call(
dataset: nemo_curator.datasets.DocumentDataset,
) → nemo_curator.datasets.DocumentDataset

Performs an arbitrary operation on a dataset.

Args:
    dataset (DocumentDataset): The dataset to operate on.
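Judging by the class name and the constructor parameters (id_field, id_prefix, start_index), this module presumably writes an identifier into the id_field column of each document. The sketch below shows one way such parameters could interact, using plain Python dictionaries instead of a DocumentDataset. The add_ids function, the prefix-number ID format, and the treatment of a missing start_index as zero are all assumptions made for illustration. The actual module may format IDs differently and may handle start_index=None in another way.

```python
from typing import Dict, List, Optional


def add_ids(
    records: List[Dict[str, str]],
    id_field: str,
    id_prefix: str = "doc_id",
    start_index: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Return copies of the records with a prefixed sequential ID in id_field."""
    # Assumption of this sketch: a missing start_index means "count from zero".
    first = 0 if start_index is None else start_index
    return [
        # Assumed format: prefix, a dash, then a zero-padded 10-digit counter.
        {**record, id_field: f"{id_prefix}-{first + offset:010d}"}
        for offset, record in enumerate(records)
    ]


docs = [{"text": "first document"}, {"text": "second document"}]
with_ids = add_ids(docs, id_field="id", start_index=5)
```

With these inputs the first record receives the ID doc_id-0000000005 and the second doc_id-0000000006, while the original docs list is left unchanged.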