nemo_automodel.components.datasets.llm.seq_cls

Module Contents

Classes

GLUE_MRPC

GLUE MRPC dataset (sentence pair classification).

API

class nemo_automodel.components.datasets.llm.seq_cls.GLUE_MRPC(
tokenizer,
*,
split: str = 'train',
num_samples_limit: Optional[int] = None,
trust_remote_code: bool = True,
max_length: Optional[int] = 256,
)

GLUE MRPC dataset (sentence pair classification).

Produces tokenized inputs with both sentence1 and sentence2 using the provided tokenizer.

Initialization

__len__()
__getitem__(idx)
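
Example

The sketch below illustrates the general shape of a map-style sentence pair dataset like the one described above. It is not the GLUE_MRPC implementation: ToyPairTokenizer, ToyPairDataset, the in-memory rows, and the output keys input_ids, attention_mask, and labels are all hypothetical names chosen for illustration, and the truncation behaviour for max_length is an assumption. Only the parameter names num_samples_limit and max_length are taken from the signature above.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class ToyPairTokenizer:
    """Hypothetical stand-in for a real tokenizer: whitespace tokens -> integer ids."""

    def __init__(self) -> None:
        # Ids 0 and 1 are reserved for illustrative start and separator markers.
        self.vocab: Dict[str, int] = {"[CLS]": 0, "[SEP]": 1}

    def _ids(self, text: str) -> List[int]:
        # Give each previously unseen lowercase token the next free id.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.lower().split()]

    def __call__(self, first: str, second: str, max_length: Optional[int] = None) -> Dict[str, List[int]]:
        # Encode both sentences into one sequence: [CLS] first [SEP] second [SEP].
        ids = [0] + self._ids(first) + [1] + self._ids(second) + [1]
        if max_length is not None:
            ids = ids[:max_length]  # assumed behaviour: simple right truncation
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


class ToyPairDataset:
    """Illustrative dataset over (sentence1, sentence2, label) rows."""

    def __init__(
        self,
        tokenizer: ToyPairTokenizer,
        rows: Sequence[Tuple[str, str, int]],
        *,
        num_samples_limit: Optional[int] = None,
        max_length: Optional[int] = 256,
    ) -> None:
        self.tokenizer = tokenizer
        # Keep only the first num_samples_limit rows when a limit is given.
        self.rows = list(rows) if num_samples_limit is None else list(rows[:num_samples_limit])
        self.max_length = max_length

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> dict:
        sentence1, sentence2, label = self.rows[idx]
        item: dict = self.tokenizer(sentence1, sentence2, max_length=self.max_length)
        item["labels"] = label
        return item


# Made-up rows in the style of a paraphrase task (1 = paraphrase, 0 = not).
rows = [
    ("The cat sat .", "A cat was sitting .", 1),
    ("He left early .", "She arrived late .", 0),
    ("Prices rose .", "Prices went up .", 1),
]
dataset = ToyPairDataset(ToyPairTokenizer(), rows, num_samples_limit=2, max_length=8)
example = dataset[0]
```

Here len(dataset) reflects the sample limit rather than the number of available rows, and each item carries a single encoded sequence covering both sentences plus its label. With the real class, the tokenizer argument would typically be a pretrained tokenizer object instead of the toy one used here.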