classifiers.base
Module Contents
Classes
DistributedDataClassifier | Abstract class for running multi-node multi-GPU data classification
API
class classifiers.base.DistributedDataClassifier(
    model: str,
    labels: list[str] | None,
    filter_by: list[str] | None,
    batch_size: int,
    out_dim: int | None,
    pred_column: str | list[str],
    max_chars: int,
    device_type: str,
    autocast: bool,
)
Bases: nemo_curator.modules.base.BaseModule
Abstract class for running multi-node multi-GPU data classification
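The constructor parameters suggest the per-document flow a concrete subclass carries out. The sketch below is a pure-Python illustration under stated assumptions, not the library's implementation: it assumes that `max_chars` truncates each text before inference, that `batch_size` sets how many texts are scored at once, that predictions are written to `pred_column` (a single column here, although the signature also accepts a list), and that `filter_by`, when set, keeps only documents whose predicted label is listed. `classify_documents`, `score_batch` and `toy_scorer` are hypothetical names, and `model`, `out_dim`, `device_type` and `autocast` are not modeled.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a model forward pass: one label per input text.
ScoreFn = Callable[[list[str]], list[str]]


def classify_documents(
    docs: list[dict],
    score_batch: ScoreFn,
    batch_size: int,
    max_chars: int,
    pred_column: str = "pred",
    filter_by: Optional[list[str]] = None,
    text_field: str = "text",
) -> list[dict]:
    """Sketch of truncate -> batch -> predict -> label -> optional filter."""
    labeled = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start : start + batch_size]
        # Assumption: each text is cut to max_chars characters before inference.
        texts = [d[text_field][:max_chars] for d in batch]
        preds = score_batch(texts)
        for doc, pred in zip(batch, preds):
            labeled.append({**doc, pred_column: pred})
    if filter_by is not None:
        # Assumption: filter_by keeps only documents with a listed label.
        labeled = [d for d in labeled if d[pred_column] in filter_by]
    return labeled


def toy_scorer(texts: list[str]) -> list[str]:
    # Toy rule: texts of 10+ characters (after truncation) are "High".
    return ["High" if len(t) >= 10 else "Low" for t in texts]


docs = [{"text": "short"}, {"text": "a much longer document"}, {"text": "tiny"}]
result = classify_documents(docs, toy_scorer, batch_size=2, max_chars=12, filter_by=["High"])
print(result)  # [{'text': 'a much longer document', 'pred': 'High'}]
```

Because truncation happens before scoring, a small `max_chars` changes what the model sees: with `max_chars=3` every text above would be scored as "Low".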
Initialization
Constructs a Module
Args:
  input_backend (Literal["pandas", "cudf", "any"]): The backend the input dataframe must be on for the module to work.
  name (str, Optional): The name of the module. If None, defaults to self.__class__.__name__.
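A minimal sketch of the documented constructor arguments, assuming the constructor only needs to validate `input_backend` against the three allowed values and fall back to the class name when `name` is None. `SketchModule` and `ToyClassifier` are hypothetical classes, not part of nemo_curator.

```python
from typing import Literal, Optional, get_args

Backend = Literal["pandas", "cudf", "any"]


class SketchModule:
    """Hypothetical stand-in mirroring the documented BaseModule arguments."""

    def __init__(self, input_backend: Backend, name: Optional[str] = None) -> None:
        # Reject anything outside the documented Literal values.
        if input_backend not in get_args(Backend):
            raise ValueError(f"Unsupported input_backend: {input_backend!r}")
        self.input_backend = input_backend
        # As documented: if name is None, use the concrete class's name.
        self.name = name if name is not None else self.__class__.__name__


class ToyClassifier(SketchModule):
    pass


print(ToyClassifier("cudf").name)            # ToyClassifier
print(ToyClassifier("any", name="qc").name)  # qc
```

Using `self.__class__.__name__` means each subclass gets its own default name without overriding the constructor.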
call(dataset: nemo_curator.datasets.DocumentDataset)
Performs an arbitrary operation on a dataset
Args:
  dataset (DocumentDataset): The dataset to operate on.
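Subclasses define their operation by implementing `call`. The sketch below assumes a common pattern in which invoking the module instance first checks the dataset's backend and then delegates to `call`; treat that dispatch step, the `ToyDataset` container and `UppercaseModule` as hypothetical illustrations rather than nemo_curator's own code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for DocumentDataset: records plus a backend tag."""
    records: list[dict] = field(default_factory=list)
    backend: str = "pandas"


class SketchModule(ABC):
    def __init__(self, input_backend: str) -> None:
        self.input_backend = input_backend

    @abstractmethod
    def call(self, dataset: ToyDataset) -> ToyDataset:
        """Performs an arbitrary operation on a dataset."""

    def __call__(self, dataset: ToyDataset) -> ToyDataset:
        # Assumed dispatch: reject datasets on the wrong backend, then delegate.
        if self.input_backend != "any" and dataset.backend != self.input_backend:
            raise ValueError(
                f"{type(self).__name__} expects {self.input_backend}, got {dataset.backend}"
            )
        return self.call(dataset)


class UppercaseModule(SketchModule):
    """Hypothetical module whose call() upper-cases every text field."""

    def __init__(self) -> None:
        super().__init__(input_backend="pandas")

    def call(self, dataset: ToyDataset) -> ToyDataset:
        records = [{**r, "text": r["text"].upper()} for r in dataset.records]
        return ToyDataset(records=records, backend=dataset.backend)


out = UppercaseModule()(ToyDataset(records=[{"text": "hello"}]))
print(out.records)  # [{'text': 'HELLO'}]
```

Keeping validation in the shared dispatch step means each subclass's `call` only has to express its own transformation.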
get_labels() → list[str]
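`get_labels()` returns the label names as a list of strings. One plausible use, sketched here under the assumption that the list order matches the order of the model's output scores, is mapping a score vector to its label with an argmax. `top_label` and the example `labels` and scores are hypothetical.

```python
def top_label(labels: list[str], scores: list[float]) -> str:
    """Return the label whose score is highest (argmax over the scores)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index]


labels = ["High", "Medium", "Low"]  # hypothetical get_labels() result
print(top_label(labels, [0.1, 0.7, 0.2]))  # Medium
```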