nemo_automodel.components.datasets.llm.seq_cls

Module Contents

Classes

Name        Description
GLUE_MRPC   GLUE MRPC dataset (sentence pair classification).

API

class nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC(
tokenizer,
split: str = 'train',
num_samples_limit: typing.Optional[int] = None,
trust_remote_code: bool = True,
max_length: typing.Optional[int] = 256
)

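GLUE MRPC (Microsoft Research Paraphrase Corpus) dataset for sentence-pair classification. Each example pairs two sentences with a binary label that indicates whether they are paraphrases.

Produces tokenized inputs that encode sentence1 and sentence2 together as a pair, using the provided tokenizer.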
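To show what a sentence-pair encoding typically looks like, here is a minimal sketch. It assumes a BERT-style layout ([CLS] sentence1 [SEP] sentence2 [SEP], with token_type_ids marking the segment), longest-first truncation, and right-padding to max_length. The word-hashing tokenizer, the special-token ids, and the names toy_encode and encode_pair are hypothetical. The real layout is whatever the tokenizer passed to GLUE_MRPC produces, and this is not the library's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical special-token ids (BERT-style values, for illustration only).
CLS, SEP, PAD = 101, 102, 0


def toy_encode(text: str) -> List[int]:
    # Hypothetical stand-in for a real tokenizer: one fake id per whitespace word,
    # always in [1000, 1999] so it never collides with the special ids above.
    return [1000 + sum(map(ord, w)) % 1000 for w in text.lower().split()]


def encode_pair(sentence1: str, sentence2: str,
                max_length: Optional[int] = 256) -> Dict[str, List[int]]:
    a, b = toy_encode(sentence1), toy_encode(sentence2)
    if max_length is not None:
        if max_length < 3:
            raise ValueError("max_length must leave room for 3 special tokens")
        # Longest-first truncation: drop a token from the longer segment
        # until both segments plus [CLS]/[SEP]/[SEP] fit in max_length.
        while len(a) + len(b) > max_length - 3:
            if len(a) >= len(b):
                a.pop()
            else:
                b.pop()
    input_ids = [CLS] + a + [SEP] + b + [SEP]
    # Segment ids: 0 for "[CLS] sentence1 [SEP]", 1 for "sentence2 [SEP]".
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(input_ids)
    if max_length is not None:
        # Assumed right-padding so every example has exactly max_length positions.
        pad = max_length - len(input_ids)
        input_ids += [PAD] * pad
        token_type_ids += [0] * pad
        attention_mask += [0] * pad
    return {"input_ids": input_ids, "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}


enc = encode_pair("The cat sat", "A cat was sitting", max_length=8)
print(enc["input_ids"])
```

Truncating the longer segment first keeps some of both sentences, which matters for paraphrase detection because the label depends on comparing the two.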

nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC.dataset
nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC.__getitem__(
idx
)
nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC.__len__()
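GLUE_MRPC exposes the map-style dataset protocol (__len__ and __getitem__), so it can be indexed directly or, presumably, handed to a PyTorch DataLoader. The sketch below mimics that interface with a hypothetical ToyPairDataset over in-memory rows that use the GLUE MRPC column names (sentence1, sentence2, label). Several details are assumptions, not documented behavior of GLUE_MRPC: eager tokenization, num_samples_limit keeping the first N rows, and the "labels" output key.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class ToyPairDataset:
    """Hypothetical map-style dataset mirroring the GLUE_MRPC constructor
    arguments (tokenizer, num_samples_limit, max_length); not the library's code."""

    def __init__(self, rows: Sequence[Dict[str, Any]],
                 tokenizer: Callable[..., Dict[str, Any]],
                 num_samples_limit: Optional[int] = None,
                 max_length: Optional[int] = 256):
        if num_samples_limit is not None:
            rows = rows[:num_samples_limit]  # assumed: keep the first N rows
        # Assumed eager tokenization: encode every pair once so that
        # __getitem__ is a cheap list lookup.
        self.dataset: List[Dict[str, Any]] = []
        for r in rows:
            features = tokenizer(r["sentence1"], r["sentence2"], max_length=max_length)
            features["labels"] = r["label"]  # 1 = paraphrase, 0 = not a paraphrase
            self.dataset.append(features)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        return self.dataset[idx]


def whitespace_pair_tokenizer(s1: str, s2: str, max_length: Optional[int] = None) -> Dict[str, Any]:
    # Hypothetical tokenizer: one "id" per word (its length), truncated to max_length.
    ids = [len(w) for w in (s1 + " [SEP] " + s2).split()]
    if max_length is not None:
        ids = ids[:max_length]
    return {"input_ids": ids}


rows = [
    {"sentence1": "He left early.", "sentence2": "He departed early.", "label": 1},
    {"sentence1": "Sales rose.", "sentence2": "It rained.", "label": 0},
    {"sentence1": "A b c.", "sentence2": "A b c.", "label": 1},
]
ds = ToyPairDataset(rows, whitespace_pair_tokenizer, num_samples_limit=2, max_length=4)
print(len(ds))  # 2
print(ds[0])
```

With num_samples_limit=2, only the first two rows are kept, so len(ds) is 2 and ds[1] is the last valid index.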