# nemo_automodel.components.datasets.llm.mock

## Module Contents

### Functions

| Name                                                                                            | Description                                                           |
| ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`build_unpacked_dataset`](#nemo_automodel-components-datasets-llm-mock-build_unpacked_dataset) | Build a dataset where each example is one sentence (variable length). |
| [`gen_sentence_ids`](#nemo_automodel-components-datasets-llm-mock-gen_sentence_ids)             | Sentence generator with Gaussian length control.                      |
| [`make_vocab`](#nemo_automodel-components-datasets-llm-mock-make_vocab)                         | Build a trivial vocab; index 0=\<pad>, 1=\<eos>, rest = tok\_i.       |

### Data

[`ds`](#nemo_automodel-components-datasets-llm-mock-ds)

### API

```python
nemo_automodel.components.datasets.llm.mock.build_unpacked_dataset(
    num_sentences: int = 10,
    mean_len: float = 20.0,
    std_len: float = 6.0,
    vocab_size: int = 100,
    max_sentence_len: int = 64,
    seed: int = 0,
    tokenizer = None
)
```

Build a dataset where each example is one sentence (variable length).

**Returns:**

* A Hugging Face `Dataset` with the following fields:
  * `input_ids`: `Sequence(int64)`
  * `attention_mask`: `Sequence(int8)`
  * `labels`: `Sequence(int64)`
  * `position_ids`: `Sequence(int64)`
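
The layout of one unpacked example can be sketched in plain Python as follows. This is a minimal sketch, not the module's implementation: the helpers `make_record` and `to_columns` are hypothetical. It assumes that the attention mask is all ones (an unpacked example holds a single sentence with no padding), that `labels` mirrors `input_ids`, and that `position_ids` restart at 0 for every example. The column-oriented dict it produces is the shape that `datasets.Dataset.from_dict` accepts.

```python
from typing import Dict, List

FIELDS = ["input_ids", "attention_mask", "labels", "position_ids"]


def make_record(sentence_ids: List[int]) -> Dict[str, List[int]]:
    """Hypothetical helper: build one unpacked example from a sentence's token IDs.

    Assumptions (not confirmed by the source): no padding, so the mask is all
    ones; labels copy input_ids (any next-token shift is left to the loss);
    position IDs restart at 0 for each example.
    """
    n = len(sentence_ids)
    return {
        "input_ids": list(sentence_ids),
        "attention_mask": [1] * n,
        "labels": list(sentence_ids),
        "position_ids": list(range(n)),
    }


def to_columns(records: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Convert row records into column lists (the layout Dataset.from_dict takes)."""
    return {key: [record[key] for record in records] for key in FIELDS}


# Two variable-length "sentences"; token ID 1 plays the role of <eos>.
columns = to_columns([make_record([5, 9, 3, 1]), make_record([7, 1])])
print([len(ids) for ids in columns["input_ids"]])  # [4, 2]
```

Because each example keeps its own length, no two rows need to match in size; batching code downstream would have to pad or pack them.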

```python
nemo_automodel.components.datasets.llm.mock.gen_sentence_ids(
    vocab,
    mean_len: float,
    std_len: float,
    max_len: int
)
```

Sentence generator with Gaussian length control.
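
Gaussian length control can be sketched as drawing a target length from a normal distribution and clipping it to a valid range. This is a hypothetical sketch, not the library function: `gen_sentence_ids_sketch` takes an explicit `random.Random` instance that the documented signature does not have. It assumes index 0 is `<pad>` and index 1 is `<eos>` (as `make_vocab` documents), that ordinary tokens are drawn uniformly from the remaining indices, and that every sentence ends with `<eos>`.

```python
import random
from typing import List


def gen_sentence_ids_sketch(vocab: List[str], mean_len: float, std_len: float,
                            max_len: int, rng: random.Random) -> List[int]:
    """Hypothetical sketch of a sentence with Gaussian-distributed length.

    Assumptions: 0 = <pad>, 1 = <eos>, content tokens are uniform over
    indices 2..len(vocab)-1, and each sentence is terminated by <eos>.
    """
    # Sample a target length from N(mean_len, std_len) and round it.
    length = int(round(rng.gauss(mean_len, std_len)))
    # Assumed clipping: at least one content token plus <eos>, at most max_len.
    length = max(2, min(length, max_len))
    # Fill every position except the last with non-special token IDs.
    body = [rng.randrange(2, len(vocab)) for _ in range(length - 1)]
    return body + [1]  # terminate with <eos>


rng = random.Random(0)
vocab = ["<pad>", "<eos>"] + [f"tok_{i}" for i in range(2, 100)]
sentences = [gen_sentence_ids_sketch(vocab, 20.0, 6.0, 64, rng) for _ in range(1000)]
lengths = [len(s) for s in sentences]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

Clipping keeps the distribution roughly Gaussian around `mean_len` while guaranteeing that no sentence exceeds `max_len`, which matters when the same data feeds a model with a fixed context size.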

```python
nemo_automodel.components.datasets.llm.mock.make_vocab(
    vocab_size: int = 100
)
```

Build a trivial vocab; index 0=\<pad>, 1=\<eos>, rest = tok\_i.
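
The vocabulary layout described above can be sketched as a list whose position is the token ID. This is a minimal sketch under stated assumptions: `make_vocab_sketch` is hypothetical, it assumes `vocab_size` counts the two special tokens, and it assumes the `i` in `tok_i` equals the token's own index. The real function may number the ordinary tokens differently.

```python
from typing import Dict, List


def make_vocab_sketch(vocab_size: int = 100) -> List[str]:
    """Hypothetical sketch: index 0 = <pad>, 1 = <eos>, remaining slots = tok_i.

    Assumes vocab_size includes both special tokens and that i is the
    token's own index.
    """
    if vocab_size < 2:
        raise ValueError("vocab_size must leave room for <pad> and <eos>")
    return ["<pad>", "<eos>"] + [f"tok_{i}" for i in range(2, vocab_size)]


vocab = make_vocab_sketch(5)
# Reverse mapping from token string to integer ID.
token_to_id: Dict[str, int] = {tok: idx for idx, tok in enumerate(vocab)}
print(vocab)  # ['<pad>', '<eos>', 'tok_2', 'tok_3', 'tok_4']
```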

```python
nemo_automodel.components.datasets.llm.mock.ds = build_unpacked_dataset(num_sentences=5, mean_len=12.0, std_len=3.0, vocab_size=5...
```