nemo_automodel.components.datasets.llm.mock_seq_cls

Module Contents

Classes

| Name | Description |
| --- | --- |
| MockSequenceClassificationDataset | Mock dataset for sequence classification functional tests. |

API

class nemo_automodel.components.datasets.llm.mock_seq_cls.MockSequenceClassificationDataset(
num_samples: int = 64,
num_labels: int = 2,
vocab_size: int = 256,
max_seq_len: int = 32,
seed: int = 0,
tokenizer = None
)

Bases: Dataset

Mock dataset for sequence classification functional tests.

Generates random token sequences with binary labels. Does not require a tokenizer or network access.

samples = []
nemo_automodel.components.datasets.llm.mock_seq_cls.MockSequenceClassificationDataset.__getitem__(idx)
nemo_automodel.components.datasets.llm.mock_seq_cls.MockSequenceClassificationDataset.__len__()
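
To make the behavior described above concrete, here is a minimal sketch of how a seeded mock dataset with this constructor signature could be built using only the standard library. It is not the nemo_automodel implementation: the class name `MockSeqClsSketch`, the sample keys (`input_ids`, `attention_mask`, `labels`), the fixed sequence length, and the handling of `tokenizer` are all assumptions made for illustration. The real class also derives from `Dataset`, which this stand-in omits so that it runs without extra dependencies.

```python
import random


class MockSeqClsSketch:
    """Hypothetical stand-in for illustration; NOT the library class."""

    def __init__(self, num_samples=64, num_labels=2, vocab_size=256,
                 max_seq_len=32, seed=0, tokenizer=None):
        # Assumption: tokenizer is accepted only to mirror the documented
        # signature and is ignored, since the tokens are random integers.
        rng = random.Random(seed)  # local RNG: same seed gives same samples
        self.samples = []
        for _ in range(num_samples):
            # Assumption: every sequence is padded-free and exactly
            # max_seq_len tokens long, with ids drawn from [0, vocab_size).
            input_ids = [rng.randrange(vocab_size) for _ in range(max_seq_len)]
            self.samples.append({
                "input_ids": input_ids,
                "attention_mask": [1] * max_seq_len,
                "labels": rng.randrange(num_labels),  # class id in [0, num_labels)
            })

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


# Build a small dataset and inspect one sample.
ds = MockSeqClsSketch(num_samples=8, seed=0)
first = ds[0]
print(len(ds), len(first["input_ids"]), first["labels"])
```

Because the samples are generated once in the constructor from a seeded generator, two instances created with the same arguments hold identical data, which is the property that makes such a dataset convenient for functional tests.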