nemo_automodel.datasets.llm.mock
Module Contents
Functions
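| make_vocab | Build a trivial vocab; index 0 and index 1 are special tokens, rest = tok_i. |
| gen_sentence_ids | Sentence generator with Gaussian length control. |
| build_unpacked_dataset | Build a dataset where each example is one sentence (variable length). |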
API
- nemo_automodel.datasets.llm.mock.make_vocab(vocab_size: int = 100)
Build a trivial vocab; index 0 and index 1 are reserved for special tokens, and the remaining indices map to tok_i.
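The vocabulary layout described above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `make_vocab_sketch`, the special-token names `<pad>` and `<eos>`, and the choice to number `tok_i` by its own index are all assumptions made for illustration.

```python
def make_vocab_sketch(vocab_size=100):
    """Hypothetical stand-in for make_vocab: a token -> index mapping."""
    # Assumption: the two reserved entries are named "<pad>" and "<eos>".
    vocab = {"<pad>": 0, "<eos>": 1}
    # Assumption: every remaining index i gets a placeholder token "tok_i".
    for i in range(2, vocab_size):
        vocab[f"tok_{i}"] = i
    return vocab


vocab = make_vocab_sketch(8)
for token, index in vocab.items():
    print(index, token)
```

Under these assumptions the mapping always has exactly `vocab_size` entries, with ordinary tokens starting at index 2.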
- nemo_automodel.datasets.llm.mock.gen_sentence_ids(vocab, mean_len: float, std_len: float, max_len: int)
Sentence generator with Gaussian length control.
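One plausible reading of "Gaussian length control" is that the sentence length is drawn from a normal distribution and then clamped to a valid range. The sketch below shows that idea with the standard library only. The name `gen_sentence_ids_sketch`, the extra `rng` parameter, the clamping to `[1, max_len]`, and the exclusion of indices 0 and 1 from sampling are assumptions, not confirmed behavior of `gen_sentence_ids`.

```python
import random


def gen_sentence_ids_sketch(vocab, mean_len, std_len, max_len, rng=random):
    """Hypothetical sketch: sample a sentence as a list of token ids."""
    # Draw a length from N(mean_len, std_len) and round to an integer.
    length = int(round(rng.gauss(mean_len, std_len)))
    # Assumption: clamp so every sentence has between 1 and max_len tokens.
    length = max(1, min(max_len, length))
    # Assumption: indices 0 and 1 are special tokens and are never sampled.
    content_ids = [i for i in vocab.values() if i >= 2]
    return [rng.choice(content_ids) for _ in range(length)]


# Toy vocabulary with two placeholder special tokens (names are hypothetical).
toy_vocab = {"<special_0>": 0, "<special_1>": 1}
toy_vocab.update({f"tok_{i}": i for i in range(2, 100)})

rng = random.Random(0)  # seeded generator for reproducible output
sentence = gen_sentence_ids_sketch(toy_vocab, mean_len=20.0, std_len=6.0,
                                   max_len=64, rng=rng)
print(len(sentence), sentence[:5])
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.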
- nemo_automodel.datasets.llm.mock.build_unpacked_dataset(*, num_sentences: int = 10, mean_len: float = 20.0, std_len: float = 6.0, vocab_size: int = 100, max_sentence_len: int = 64, seed: int = 0)
Build a dataset where each example is one sentence (variable length).
- Returns:
a HuggingFace Dataset with the fields input_ids: Sequence(int64), attention_mask: Sequence(int8), labels: Sequence(int64), position_ids: Sequence(int64)
- Return type:
HuggingFace Dataset
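To make the four fields concrete, here is a plain-Python sketch of what the rows of such a dataset might look like. It builds a list of dicts rather than a HuggingFace Dataset so that it runs without extra dependencies. The helper name `build_rows_sketch` is hypothetical, and several details are assumptions: labels are a copy of `input_ids`, the attention mask is all ones because nothing is padded, `position_ids` count from 0, and lengths are clamped to `[1, max_sentence_len]`.

```python
import random


def build_rows_sketch(*, num_sentences=10, mean_len=20.0, std_len=6.0,
                      vocab_size=100, max_sentence_len=64, seed=0):
    """Hypothetical sketch: one variable-length sentence per row."""
    rng = random.Random(seed)  # a fixed seed gives reproducible rows
    rows = []
    for _ in range(num_sentences):
        # Gaussian length, clamped to [1, max_sentence_len] (assumption).
        n = int(round(rng.gauss(mean_len, std_len)))
        n = max(1, min(max_sentence_len, n))
        # Sample ordinary token ids, skipping indices 0 and 1 (assumption).
        ids = [rng.randrange(2, vocab_size) for _ in range(n)]
        rows.append({
            "input_ids": ids,
            "attention_mask": [1] * n,       # no padding in unpacked rows
            "labels": list(ids),             # assumption: labels mirror inputs
            "position_ids": list(range(n)),  # 0, 1, ..., n - 1
        })
    return rows


rows = build_rows_sketch(num_sentences=3)
for row in rows:
    print(len(row["input_ids"]), row["position_ids"][:4])
```

If the `datasets` package is installed, a list of dicts like this can be wrapped with `datasets.Dataset.from_list(rows)`, although the column dtypes would then be inferred rather than fixed to the ones listed above.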