nemo_curator.stages.text.classifiers.utils


Module Contents

Classes

Name                Description
SortByLengthStage   Stage that sorts the input data by the length of the input tokens.

Data

DEBERTA_TOKENIZER_PADDING_SIDE

API

class nemo_curator.stages.text.classifiers.utils.SortByLengthStage()

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Stage that sorts the input data by the length of the input tokens.

name = 'sort_by_length_stage'
nemo_curator.stages.text.classifiers.utils.SortByLengthStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.classifiers.utils.SortByLengthStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.classifiers.utils.SortByLengthStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch
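Sorting a batch by token length typically groups sequences of similar length, so that later mini-batches need less padding. The sketch below shows the idea on plain Python rows. It is not the stage's actual implementation. It assumes that each row carries a hypothetical `attention_mask` column (1 = real token, 0 = padding) and that a row's length is the sum of that mask. The real stage operates on a `DocumentBatch`, and its column names may differ.

```python
from typing import Any

# Hypothetical column name (assumption); the real stage's columns may differ.
ATTENTION_MASK_COLUMN = "attention_mask"


def sort_rows_by_token_length(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return rows ordered by ascending token count.

    A row's token count is the number of non-padding positions, i.e. the
    sum of its attention mask. Python's sorted() is stable, so rows with
    equal length keep their original relative order.
    """
    return sorted(rows, key=lambda row: sum(row[ATTENTION_MASK_COLUMN]))


rows = [
    {"id": "a", "attention_mask": [1, 1, 1, 1, 0]},  # 4 tokens
    {"id": "b", "attention_mask": [1, 1, 0, 0, 0]},  # 2 tokens
    {"id": "c", "attention_mask": [1, 1, 1, 0, 0]},  # 3 tokens
]
print([row["id"] for row in sort_rows_by_token_length(rows)])  # ['b', 'c', 'a']
```

Because the sort is stable, documents of equal length keep their input order. This makes the reordering deterministic for a given batch.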
nemo_curator.stages.text.classifiers.utils.DEBERTA_TOKENIZER_PADDING_SIDE = 'right'
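The value `'right'` presumably selects the side on which a DeBERTa tokenizer appends padding tokens (Hugging Face tokenizers expose a `padding_side` attribute for this). How the module uses the constant is an assumption here. The sketch below shows only what right padding means compared with left padding. The helper `pad_sequences`, the pad id `0`, and the token ids are hypothetical and used for illustration.

```python
PADDING_SIDE = "right"  # mirrors DEBERTA_TOKENIZER_PADDING_SIDE


def pad_sequences(
    sequences: list[list[int]], pad_id: int = 0, side: str = PADDING_SIDE
) -> tuple[list[list[int]], list[list[int]]]:
    """Pad token-id lists to a common length (hypothetical helper).

    Returns (input_ids, attention_mask). With side="right", pad tokens are
    appended after the real tokens; with side="left", they are prepended.
    """
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    max_len = max((len(seq) for seq in sequences), default=0)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "right":
            input_ids.append(seq + [pad_id] * n_pad)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
        else:
            input_ids.append([pad_id] * n_pad + seq)
            attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = pad_sequences([[101, 7, 102], [101, 102]])
print(ids)   # [[101, 7, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

With right padding, every sequence starts at position 0 and the padding collects at the end. The attention mask records which positions hold real tokens.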