nemo_automodel.components.datasets.llm.mock
Module Contents
Functions
Data
API
Build a dataset in which each example is a single variable-length sentence.
Returns:
- a Hugging Face Dataset with the fields:
  - input_ids: Sequence(int64)
  - attention_mask: Sequence(int8)
  - labels: Sequence(int64)
  - position_ids: Sequence(int64)
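The per-example layout described above can be illustrated with a minimal, dependency-free sketch. The function name build_sentence_examples and its parameters are hypothetical, not part of this module. The sketch assumes that ids 0 and 1 are reserved for <pad> and <eos>, that each sentence ends with <eos>, and that labels simply mirror input_ids. The real implementation may shift, mask, or otherwise derive labels differently.

```python
import random
from typing import Dict, List


def build_sentence_examples(
    num_examples: int,
    vocab_size: int = 100,
    max_len: int = 16,
    eos_id: int = 1,
    seed: int = 0,
) -> List[Dict[str, List[int]]]:
    """Hypothetical sketch: one variable-length sentence per example."""
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        length = rng.randint(1, max_len)
        # Ids 0 and 1 are assumed reserved (<pad>, <eos>), so content tokens come from 2..vocab_size-1
        ids = [rng.randrange(2, vocab_size) for _ in range(length)] + [eos_id]
        examples.append(
            {
                "input_ids": ids,
                "attention_mask": [1] * len(ids),  # no padding inside a single example
                "labels": list(ids),  # assumption: labels mirror input_ids
                "position_ids": list(range(len(ids))),  # 0, 1, ..., len-1
            }
        )
    return examples
```

If the datasets library is installed, a list of records like this can be wrapped with datasets.Dataset.from_list. Matching the int8 dtype listed for attention_mask would require passing an explicit Features schema, because plain Python ints are inferred as int64.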
Sentence generator with Gaussian length control.
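Gaussian length control can be sketched as follows: draw a real-valued length from a normal distribution, round it, and clip it to a valid range. The name gaussian_length_sentences and the parameters mean_len, std_len, min_len, and max_len are hypothetical. The module's actual generator may use different defaults, sampling, or clipping rules.

```python
import random
from typing import Iterator, List


def gaussian_length_sentences(
    vocab: List[str],
    mean_len: float = 12.0,
    std_len: float = 4.0,
    min_len: int = 1,
    max_len: int = 32,
    seed: int = 0,
) -> Iterator[List[str]]:
    """Hypothetical sketch: yield sentences whose lengths follow a clipped Gaussian."""
    rng = random.Random(seed)
    while True:
        # Sample a length, round it to an int, then clip it into [min_len, max_len]
        length = round(rng.gauss(mean_len, std_len))
        length = max(min_len, min(max_len, length))
        # Fill the sentence with tokens drawn uniformly from the given vocabulary
        yield [rng.choice(vocab) for _ in range(length)]
```

The generator is infinite, so callers take as many sentences as they need, for example with next() or itertools.islice. Pass only content tokens in vocab, because special tokens such as <pad> would otherwise be sampled as well.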
Build a trivial vocabulary: index 0 is <pad>, index 1 is <eos>, and the remaining entries are tok_i tokens.
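A minimal sketch of such a vocabulary as a token-to-id mapping. The function name build_trivial_vocab and the vocab_size parameter are hypothetical. The sketch also assumes that the suffix in tok_i equals the token's own index, starting at 2. The original may number these tokens differently.

```python
from typing import Dict, List


def build_trivial_vocab(vocab_size: int) -> Dict[str, int]:
    """Hypothetical sketch: 0 -> <pad>, 1 -> <eos>, i -> tok_i for the rest."""
    if vocab_size < 2:
        raise ValueError("vocab_size must leave room for <pad> and <eos>")
    # Special tokens first, then placeholder tokens named after their index
    tokens: List[str] = ["<pad>", "<eos>"] + [f"tok_{i}" for i in range(2, vocab_size)]
    return {tok: idx for idx, tok in enumerate(tokens)}
```

For example, build_trivial_vocab(5) maps <pad> to 0, <eos> to 1, and tok_2, tok_3, tok_4 to 2, 3, 4.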