nemo_automodel.components.datasets.llm.mock_packed
Module Contents
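Functions

| Function | Description |
| --- | --- |
| make_vocab | Build a trivial vocab; index 0=<pad>, 1=<eos>, rest = word_i. |
| gen_sentence_ids | Sentence generator with Gaussian length control. |
| flush_block | Flush helper (build position_ids that reset after <eos>). |
| build_packed_dataset | Dataset builder. |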
API
- nemo_automodel.components.datasets.llm.mock_packed.make_vocab(vocab_size: int = 100)
Build a trivial vocab; index 0=<pad>, 1=<eos>, rest = word_i.
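A minimal sketch of what such a vocabulary could look like, assuming it is a plain dict from token string to id and that the two reserved entries are <pad> (id 0) and <eos> (id 1). make_vocab_sketch is a hypothetical name used for illustration, not the module's own implementation, and the word_i numbering (here i equals the id) is also an assumption.

```python
def make_vocab_sketch(vocab_size: int = 100) -> dict:
    """Hypothetical stand-in for make_vocab: token string -> integer id."""
    if vocab_size < 2:
        raise ValueError("vocab_size must leave room for the two reserved tokens")
    # Assumption: ids 0 and 1 are reserved for padding and end-of-sentence.
    vocab = {"<pad>": 0, "<eos>": 1}
    # Every remaining id i gets a filler word named word_i.
    for i in range(2, vocab_size):
        vocab[f"word_{i}"] = i
    return vocab


vocab = make_vocab_sketch(5)
print(vocab)  # {'<pad>': 0, '<eos>': 1, 'word_2': 2, 'word_3': 3, 'word_4': 4}
```

Keeping the special tokens at fixed low ids means later helpers can recognize padding and sentence boundaries by id alone, without consulting the vocab.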
- nemo_automodel.components.datasets.llm.mock_packed.gen_sentence_ids(vocab, mean_len: float, std_len: float, max_len: int)
Sentence generator with Gaussian length control.
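Gaussian length control can be sketched as follows: draw a length from a normal distribution with mean mean_len and standard deviation std_len, round it, clamp it to [1, max_len], and fill that many positions with random word ids. gen_sentence_ids_sketch is hypothetical; it assumes vocab is a dict from token to id, that ids 0 and 1 are reserved, that each sentence ends with the <eos> id, and that a random.Random instance is passed in for reproducibility (the documented signature has no rng parameter).

```python
import random


def gen_sentence_ids_sketch(vocab, mean_len, std_len, max_len, rng):
    """Hypothetical stand-in for gen_sentence_ids: one sentence of token ids."""
    # Sample a length from a normal distribution (mean mean_len, standard
    # deviation std_len), round it, and clamp it to [1, max_len].
    length = max(1, min(max_len, round(rng.gauss(mean_len, std_len))))
    # Assumption: ids 0 and 1 are reserved, so words are drawn from ids >= 2.
    word_ids = [i for i in vocab.values() if i >= 2]
    body = [rng.choice(word_ids) for _ in range(length)]
    # Assumption: the sentence ends with the <eos> id, not counted in max_len.
    return body + [vocab["<eos>"]]


vocab = {"<pad>": 0, "<eos>": 1, **{f"word_{i}": i for i in range(2, 100)}}
rng = random.Random(0)
sent = gen_sentence_ids_sketch(vocab, mean_len=20.0, std_len=6.0, max_len=64, rng=rng)
print(len(sent), sent[-1])
```

Clamping rather than resampling keeps the generator simple and always terminating, at the cost of slightly piling probability mass at 1 and max_len when std_len is large.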
- nemo_automodel.components.datasets.llm.mock_packed.flush_block(block, block_size)
Flush helper (build position_ids that reset after <eos>).
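A minimal sketch of a flush step, assuming a block is a list of token ids no longer than block_size, that it is right-padded with id 0, and that id 1 marks the end of a sentence. flush_block_sketch is hypothetical; it shows how position_ids can restart at 0 on the token after each end-of-sentence id, so that every packed sentence sees positions counting from zero.

```python
PAD_ID = 0  # assumption: matches index 0 (<pad>) of the vocabulary
EOS_ID = 1  # assumption: matches index 1 (<eos>) of the vocabulary


def flush_block_sketch(block, block_size):
    """Hypothetical stand-in for flush_block: pad a packed block and build
    position_ids that restart from 0 after every <eos>."""
    if len(block) > block_size:
        raise ValueError("block is longer than block_size")
    input_ids = block + [PAD_ID] * (block_size - len(block))
    position_ids = []
    pos = 0
    for tok in input_ids:
        position_ids.append(pos)
        # The token after an <eos> starts a new sequence, so its position is 0.
        pos = 0 if tok == EOS_ID else pos + 1
    return {"input_ids": input_ids, "position_ids": position_ids}


out = flush_block_sketch([5, 7, 1, 9, 1], block_size=7)
print(out["position_ids"])  # [0, 1, 2, 0, 1, 0, 1]
```

How padding positions are numbered (here they simply form their own run), and whether labels or an attention mask are also returned, are assumptions of this sketch.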
- nemo_automodel.components.datasets.llm.mock_packed.build_packed_dataset(*, num_blocks: int = 10, block_size: int = 128, mean_len: float = 20.0, std_len: float = 6.0, vocab_size: int = 100, max_sentence_len: int = 64, seed: int = 0, tokenizer=None)
Dataset builder.
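A packing builder can be sketched as a greedy loop: generate Gaussian-length sentences from a seeded random.Random, append them to the current block while they fit, and flush the block (pad plus reset position_ids) when the next sentence would overflow block_size. build_packed_dataset_sketch and _flush_sketch are hypothetical; the sketch mirrors the keyword-only parameters in the signature above except tokenizer, which it ignores, assumes ids 0 and 1 are <pad> and <eos>, and returns a plain list of dicts rather than whatever dataset type the real builder produces.

```python
import random


def _flush_sketch(block, block_size, pad_id, eos_id):
    """Pad a packed block and build position_ids that restart after <eos>."""
    input_ids = block + [pad_id] * (block_size - len(block))
    position_ids, pos = [], 0
    for tok in input_ids:
        position_ids.append(pos)
        pos = 0 if tok == eos_id else pos + 1
    return {"input_ids": input_ids, "position_ids": position_ids}


def build_packed_dataset_sketch(
    *,
    num_blocks: int = 10,
    block_size: int = 128,
    mean_len: float = 20.0,
    std_len: float = 6.0,
    vocab_size: int = 100,
    max_sentence_len: int = 64,
    seed: int = 0,
):
    """Hypothetical stand-in for build_packed_dataset (no tokenizer support)."""
    pad_id, eos_id = 0, 1  # assumption: ids 0 and 1 are <pad> and <eos>
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    blocks, current = [], []
    while len(blocks) < num_blocks:
        # Gaussian-length sentence body, clamped to [1, max_sentence_len].
        n = max(1, min(max_sentence_len, round(rng.gauss(mean_len, std_len))))
        sentence = [rng.randrange(2, vocab_size) for _ in range(n)] + [eos_id]
        sentence = sentence[:block_size]  # assumption: truncate overlong sentences
        # Greedy packing: close the current block when the sentence won't fit.
        if len(current) + len(sentence) > block_size:
            blocks.append(_flush_sketch(current, block_size, pad_id, eos_id))
            current = []
        current.extend(sentence)
    return blocks


ds = build_packed_dataset_sketch(num_blocks=3, block_size=32, seed=0)
print(len(ds), len(ds[0]["input_ids"]))  # 3 32
```

Greedy packing into a single open block is the simplest policy; it never splits a sentence across blocks, so a block is flushed with some padding whenever the next sentence does not fit.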