# nemo_automodel.components.datasets.llm.mock_prefix_tree

Deterministic mock shared-prefix rollout data for prefix-tree smoke runs.

## Module Contents

### Functions

| Name                                                                                                                | Description                                                              |
| ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`build_mock_rollout_dataset`](#nemo_automodel-components-datasets-llm-mock_prefix_tree-build_mock_rollout_dataset) | Build a deterministic mock shared-prefix rollout dataset for smoke runs. |

### API

```python
nemo_automodel.components.datasets.llm.mock_prefix_tree.build_mock_rollout_dataset(
    num_groups: int = 16,
    completions_per_group: int = 4,
    prompt_len: int = 32,
    completion_len: int = 16,
    vocab_size: int = 1024,
    seed: int = 0
) -> list[dict]
```

Build a deterministic mock shared-prefix rollout dataset for smoke runs.

Each group consists of one shared prompt and `completions_per_group`
completions, in the `{"prompt_ids", "completions"}` schema consumed by
`prefix_tree_collate_fn`. Token ids are drawn at random from `[2, vocab_size)`;
this is a smoke test for the pipeline, not a quality dataset.
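The contract described above (one prompt per group, `completions_per_group` completions each, ids in `[2, vocab_size)`, output fixed by `seed`) can be sketched in plain Python. This is a hypothetical re-creation for illustration, not the module's implementation. `sketch_mock_rollout_dataset` is an invented name, and the use of Python's `random.Random` is an assumption, so the real function's token ids will likely differ even for the same seed.

```python
import random


def sketch_mock_rollout_dataset(
    num_groups: int = 16,
    completions_per_group: int = 4,
    prompt_len: int = 32,
    completion_len: int = 16,
    vocab_size: int = 1024,
    seed: int = 0,
) -> list[dict]:
    """Hypothetical re-creation of the documented behavior, not the library code."""
    # A dedicated, seeded generator makes the output depend only on the arguments.
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_groups):
        # One shared prompt per group; ids drawn from [2, vocab_size).
        prompt_ids = [rng.randrange(2, vocab_size) for _ in range(prompt_len)]
        # Several completions (leaves) that all share the same prompt.
        completions = [
            [rng.randrange(2, vocab_size) for _ in range(completion_len)]
            for _ in range(completions_per_group)
        ]
        dataset.append({"prompt_ids": prompt_ids, "completions": completions})
    return dataset


data = sketch_mock_rollout_dataset(
    num_groups=2, completions_per_group=3, prompt_len=5, completion_len=4
)
```

Calling it twice with the same arguments returns identical lists, which is the property a smoke run relies on when comparing results across runs.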

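**Parameters:**

- **num_groups** (`int`, default `16`): Number of rollout groups.
- **completions_per_group** (`int`, default `4`): Completions (leaves) sharing each prompt.
- **prompt_len** (`int`, default `32`): Shared prompt length per group, in tokens.
- **completion_len** (`int`, default `16`): Length of each completion, in tokens.
- **vocab_size** (`int`, default `1024`): Exclusive upper bound for random token ids.
- **seed** (`int`, default `0`): RNG seed for reproducibility.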
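To get a feel for the data volume these parameters produce, the arithmetic below compares the token count of a flat layout (every completion stored with its own copy of the prompt) against a prefix-tree layout. It assumes the prefix-tree path stores each shared prompt once per group; `token_counts` is a hypothetical helper, not part of the module.

```python
def token_counts(
    num_groups: int = 16,
    completions_per_group: int = 4,
    prompt_len: int = 32,
    completion_len: int = 16,
) -> tuple[int, int]:
    # Flat layout: each completion is a separate prompt + completion sequence.
    flat = num_groups * completions_per_group * (prompt_len + completion_len)
    # Prefix-tree layout (assumed): one prompt per group plus all its completions.
    tree = num_groups * (prompt_len + completions_per_group * completion_len)
    return flat, tree


flat, tree = token_counts()
print(flat, tree)  # 3072 1536
```

With the defaults, sharing the prompt halves the token count; the saving grows as `prompt_len` or `completions_per_group` increases relative to `completion_len`.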

**Returns:** `list[dict]`

A list of `&#123;"prompt_ids": list[int], "completions": list[list[int]]&#125;`.