# nemo_automodel.components.datasets.llm.mock_packed

## Module Contents

### Functions

| Name                                                                                               | Description                                                      |
| -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| [`build_packed_dataset`](#nemo_automodel-components-datasets-llm-mock_packed-build_packed_dataset) | Build a mock dataset of packed, fixed-size token blocks.         |
| [`flush_block`](#nemo_automodel-components-datasets-llm-mock_packed-flush_block)                   | Flush helper (build position\_ids that reset after \<eos>).      |
| [`gen_sentence_ids`](#nemo_automodel-components-datasets-llm-mock_packed-gen_sentence_ids)         | Sentence generator with Gaussian length control.                 |
| [`make_vocab`](#nemo_automodel-components-datasets-llm-mock_packed-make_vocab)                     | Build a trivial vocab; index 0=\<pad>, 1=\<eos>, rest = word\_i. |

### Data

[`ds`](#nemo_automodel-components-datasets-llm-mock_packed-ds)

### API

```python
nemo_automodel.components.datasets.llm.mock_packed.build_packed_dataset(
    num_blocks: int = 10,
    block_size: int = 128,
    mean_len: float = 20.0,
    std_len: float = 6.0,
    vocab_size: int = 100,
    max_sentence_len: int = 64,
    seed: int = 0,
    tokenizer = None
)
```

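Build a mock packed dataset: `num_blocks` blocks of `block_size` tokens each, filled with synthetic sentences whose lengths follow a Gaussian with mean `mean_len` and standard deviation `std_len`, capped at `max_sentence_len`.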
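To illustrate what "packed" means here, the following is a minimal, standalone sketch of greedy sequence packing. It is not the library's implementation: the helper name `pack_sentences`, the choice to start a new block when a sentence does not fit, the truncation of over-long sentences, and padding with id 0 are all assumptions made for illustration.

```python
def pack_sentences(sentences, block_size, pad_id=0):
    """Greedily pack token-id sentences into fixed-size blocks (hypothetical helper)."""
    blocks, current = [], []
    for sent in sentences:
        # Assumption: a sentence longer than one block is truncated to fit.
        sent = sent[:block_size]
        if len(current) + len(sent) > block_size:
            # Not enough room left: pad the current block and start a new one.
            blocks.append(current + [pad_id] * (block_size - len(current)))
            current = []
        current.extend(sent)
    if current:
        # Pad and emit the final, partially filled block.
        blocks.append(current + [pad_id] * (block_size - len(current)))
    return blocks


blocks = pack_sentences([[5, 6, 1], [7, 8, 9, 1], [4, 1]], block_size=8)
print(blocks)  # [[5, 6, 1, 7, 8, 9, 1, 0], [4, 1, 0, 0, 0, 0, 0, 0]]
```

Every returned block has exactly `block_size` entries, which is the property a packed dataset relies on for batching without per-sample padding logic.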

```python
nemo_automodel.components.datasets.llm.mock_packed.flush_block(
    block,
    block_size
)
```

Flush helper (build position\_ids that reset after \<eos>).
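The idea of position ids that reset after `<eos>` can be sketched as follows. This is an illustrative standalone function, not the body of `flush_block`: the name `position_ids_with_reset` is hypothetical, and it assumes `<eos>` has id 1 (as stated for `make_vocab`) and that the `<eos>` token keeps its in-sentence position while the next token restarts at 0.

```python
def position_ids_with_reset(input_ids, eos_id=1):
    """Return position ids that restart at 0 after each EOS token (hypothetical helper)."""
    position_ids = []
    pos = 0
    for tok in input_ids:
        position_ids.append(pos)
        # The EOS token keeps its own position; the token after it restarts at 0.
        pos = 0 if tok == eos_id else pos + 1
    return position_ids


packed = [5, 6, 7, 1, 8, 9, 1, 4]
print(position_ids_with_reset(packed))  # [0, 1, 2, 3, 0, 1, 2, 0]
```

Resetting positions this way lets each sentence inside a packed block see the same positional encoding it would have had as a standalone sequence.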

```python
nemo_automodel.components.datasets.llm.mock_packed.gen_sentence_ids(
    vocab,
    mean_len: float,
    std_len: float,
    max_len: int
)
```

Sentence generator with Gaussian length control.
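A minimal sketch of Gaussian length control, using only the standard library. The function below is hypothetical and may differ from `gen_sentence_ids` in details: it assumes the sampled length is rounded and clipped to `[1, max_len]`, that `max_len` excludes the trailing `<eos>`, and that word ids are drawn uniformly.

```python
import random


def sample_sentence_ids(word_ids, mean_len, std_len, max_len, eos_id=1, rng=random):
    """Sample a sentence with a clipped-Gaussian length, ending with EOS (hypothetical helper)."""
    # Draw a length from N(mean_len, std_len), round it, and clip to [1, max_len].
    length = int(round(rng.gauss(mean_len, std_len)))
    length = max(1, min(max_len, length))
    # Fill with uniformly random word ids, then terminate with EOS.
    return [rng.choice(word_ids) for _ in range(length)] + [eos_id]


rng = random.Random(0)  # seeded generator for reproducible samples
sentence = sample_sentence_ids(list(range(2, 100)), mean_len=20.0, std_len=6.0, max_len=64, rng=rng)
print(len(sentence), sentence[-1])
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching global random state.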

```python
nemo_automodel.components.datasets.llm.mock_packed.make_vocab(
    vocab_size: int = 100
)
```

Build a trivial vocab; index 0=\<pad>, 1=\<eos>, rest = word\_i.
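The vocabulary layout described above might look like the sketch below. Whether `make_vocab` returns a token-to-id `dict` or another structure is not stated here, so the `dict` return type and the helper name `toy_vocab` are assumptions.

```python
def toy_vocab(vocab_size=100):
    """Build a toy token-to-id mapping: 0=<pad>, 1=<eos>, rest = word_i (hypothetical helper)."""
    vocab = {"<pad>": 0, "<eos>": 1}
    # Remaining ids map to placeholder tokens named after their index.
    for i in range(2, vocab_size):
        vocab[f"word_{i}"] = i
    return vocab


vocab = toy_vocab(5)
print(vocab)  # {'<pad>': 0, '<eos>': 1, 'word_2': 2, 'word_3': 3, 'word_4': 4}
```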

```python
nemo_automodel.components.datasets.llm.mock_packed.ds = build_packed_dataset(num_blocks=3, block_size=32, mean_len=10, std_len=3, vocab_...
```