nemo_automodel.components.datasets.llm.mock_seq_cls

Module Contents

Classes

MockSequenceClassificationDataset

Mock dataset for sequence classification functional tests.

API

class nemo_automodel.components.datasets.llm.mock_seq_cls.MockSequenceClassificationDataset(
*,
num_samples: int = 64,
num_labels: int = 2,
vocab_size: int = 256,
max_seq_len: int = 32,
seed: int = 0,
tokenizer=None,
)

Bases: torch.utils.data.Dataset

Mock dataset for sequence classification functional tests.

Generates random token sequences with binary labels. Does not require a tokenizer or network access.
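The docstring only says that samples are random token sequences with labels. The sketch below shows one way such a dataset can work. It is an illustration under stated assumptions, not the library's implementation. The class name `SketchSeqClsDataset`, the returned keys (`input_ids`, `attention_mask`, `labels`), the variable sequence length in `[1, max_seq_len]`, and labels drawn from `[0, num_labels)` are all hypothetical. The `tokenizer` argument is omitted. To stay dependency-free, the sketch uses only the standard library. It implements the map-style `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` subclasses provide and mirrors the keyword-only constructor of the real class.

```python
import random
from typing import Dict, List


class SketchSeqClsDataset:
    """Hypothetical stand-in for MockSequenceClassificationDataset (not the library code).

    Follows the map-style dataset protocol (__len__ / __getitem__) that
    torch.utils.data.Dataset subclasses implement, using only the stdlib.
    """

    def __init__(
        self,
        *,  # keyword-only, like the real constructor signature
        num_samples: int = 64,
        num_labels: int = 2,
        vocab_size: int = 256,
        max_seq_len: int = 32,
        seed: int = 0,
    ) -> None:
        rng = random.Random(seed)  # seeded RNG: the same seed yields the same samples
        self.samples: List[Dict[str, object]] = []
        for _ in range(num_samples):
            # Assumption: variable-length sequences without padding.
            length = rng.randint(1, max_seq_len)
            input_ids = [rng.randrange(vocab_size) for _ in range(length)]
            self.samples.append(
                {
                    "input_ids": input_ids,
                    "attention_mask": [1] * length,  # every position is a real token
                    "labels": rng.randrange(num_labels),  # assumed label range [0, num_labels)
                }
            )

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        return self.samples[idx]


ds = SketchSeqClsDataset(num_samples=4, seed=0)
print(len(ds))  # → 4
sample = ds[0]  # dict with input_ids, attention_mask, labels
```

Precomputing all samples from a single seeded generator keeps `ds[i]` stable across calls and runs. This reproducibility is what a functional test needs, and it requires no tokenizer or network access.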

Initialization

__len__()
__getitem__(idx)