modules.add_id
Module Contents
Classes
AddId | Base class for all NeMo Curator modules.
API
class modules.add_id.AddId(id_field: str, id_prefix: str = 'doc_id', start_index: int | None = None)
Bases: nemo_curator.modules.base.BaseModule
Base class for all NeMo Curator modules.
Handles validating that data lives on the correct device for each module.
Initialization
Constructs a Module.
Args:
  input_backend (Literal["pandas", "cudf", "any"]): The backend the input dataframe must be on for the module to work.
  name (str, Optional): The name of the module. If None, defaults to self.__class__.__name__.
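The constructor arguments described above can be illustrated with a minimal, library-free sketch. `SketchModule`, `SketchAddId`, and the `accepts` method are hypothetical names invented for this illustration. They are not part of NeMo Curator, and the real base class may validate backends differently.

```python
from typing import Literal, Optional

Backend = Literal["pandas", "cudf", "any"]


class SketchModule:
    """Hypothetical stand-in for a base module (not the NeMo Curator class)."""

    def __init__(self, input_backend: Backend, name: Optional[str] = None) -> None:
        if input_backend not in ("pandas", "cudf", "any"):
            raise ValueError(f"Unsupported input_backend: {input_backend!r}")
        self.input_backend = input_backend
        # When no name is given, fall back to the class name.
        self.name = name if name is not None else self.__class__.__name__

    def accepts(self, dataset_backend: str) -> bool:
        """Return True if a dataset on `dataset_backend` may be processed (assumed rule)."""
        return self.input_backend == "any" or self.input_backend == dataset_backend


class SketchAddId(SketchModule):
    """Hypothetical subclass, used only to show the default name."""


module = SketchAddId(input_backend="any")
print(module.name)
print(SketchModule(input_backend="pandas").accepts("cudf"))
```

Here the subclass inherits the default name from its own class, and a module declared with the "any" backend accepts a dataset on either backend.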
__call__(dataset: nemo_curator.datasets.DocumentDataset)
Performs an arbitrary operation on a dataset.
Args:
  dataset (DocumentDataset): The dataset to operate on.
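To make the roles of `id_field`, `id_prefix`, and `start_index` concrete, the following is a rough pure-Python sketch of sequential ID assignment over a list of records. The helper `add_ids` is hypothetical, and the ID format (prefix, hyphen, ten-digit zero-padded index) is an assumption made for illustration. It does not model a `DocumentDataset` or the case where `start_index` is None.

```python
from typing import Any, Dict, Iterable, List


def add_ids(
    records: Iterable[Dict[str, Any]],
    id_field: str,
    id_prefix: str = "doc_id",
    start_index: int = 0,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return copies of `records` with an ID stored under `id_field`."""
    out: List[Dict[str, Any]] = []
    for offset, record in enumerate(records):
        new_record = dict(record)  # copy so the input records are left unchanged
        # Assumed format: "<prefix>-<zero-padded running index>".
        new_record[id_field] = f"{id_prefix}-{start_index + offset:010d}"
        out.append(new_record)
    return out


docs = [{"text": "first document"}, {"text": "second document"}]
with_ids = add_ids(docs, id_field="id", start_index=5)
for doc in with_ids:
    print(doc["id"], doc["text"])
```

With the real class, the equivalent step would be to construct `AddId` with these three arguments and call the resulting module on a `DocumentDataset`.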