# nemo_automodel.components.datasets.llm.eagle3

Data helpers for minimal EAGLE-3 training.

## Module Contents

### Functions

| Name                                                                                                                      | Description                                                                     |
| ------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| [`_broadcast_cached_ids`](#nemo_automodel-components-datasets-llm-eagle3-_broadcast_cached_ids)                           | Rank 0 loads (and validates) the cached ids; broadcast the result to all ranks. |
| [`_expected_draft_vocab_size`](#nemo_automodel-components-datasets-llm-eagle3-_expected_draft_vocab_size)                 | Return how many ids `build_eagle3_token_mapping` yields for this config.        |
| [`_pack_collate`](#nemo_automodel-components-datasets-llm-eagle3-_pack_collate)                                           | Collate packed rows; ragged `seq_lens` is 0-padded to `[B, max_docs]`.          |
| [`_stack_batch`](#nemo_automodel-components-datasets-llm-eagle3-_stack_batch)                                             | Stack a batch of pre-padded unshifted chat samples.                             |
| [`build_eagle3_dataloader`](#nemo_automodel-components-datasets-llm-eagle3-build_eagle3_dataloader)                       | Build a dataloader backed by the repo's chat formatting utilities.              |
| [`build_eagle3_token_mapping`](#nemo_automodel-components-datasets-llm-eagle3-build_eagle3_token_mapping)                 | Build draft-vocab mapping tensors from supervised token frequency.              |
| [`build_packed_eagle3_dataset`](#nemo_automodel-components-datasets-llm-eagle3-build_packed_eagle3_dataset)               | Greedily pack variable-length chat samples into rows of `packed_sequence_size`. |
| [`load_eagle3_token_mapping`](#nemo_automodel-components-datasets-llm-eagle3-load_eagle3_token_mapping)                   | Load a cached draft-vocab mapping, or `None` if absent / incompatible.          |
| [`load_or_build_eagle3_token_mapping`](#nemo_automodel-components-datasets-llm-eagle3-load_or_build_eagle3_token_mapping) | Build the draft-vocab mapping, reusing a cached copy at `cache_path`.           |
| [`save_eagle3_token_mapping`](#nemo_automodel-components-datasets-llm-eagle3-save_eagle3_token_mapping)                   | Persist the draft-vocab selection so future runs skip the frequency scan.       |

### Data

[`logger`](#nemo_automodel-components-datasets-llm-eagle3-logger)

### API

```python
nemo_automodel.components.datasets.llm.eagle3._broadcast_cached_ids(
    cache_path: str,
    target_vocab_size: int,
    draft_vocab_size: int | None
) -> torch.Tensor | None
```

Rank 0 loads (and validates) the cached ids; broadcast the result to all ranks.

Only rank 0 touches the filesystem, so the load-vs-build decision is identical
on every rank even when `cache_path` lives on a node-local (non-shared)
filesystem. This matters because `build_eagle3_token_mapping` issues a
collective `all_reduce`: if some ranks loaded a cache while others rebuilt,
that collective would mismatch and hang. Returns the ids (cpu, long) or
`None` (rebuild on every rank).
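The reason for centralizing the decision can be illustrated with a toy simulation in plain Python. This is only a sketch: it does not use `torch.distributed`, and the `node_has_cache` list and both helper names are hypothetical, introduced here for illustration.

```python
def per_rank_decisions(node_has_cache):
    """Each rank checks its own filesystem, so decisions can diverge."""
    return ["load" if has_cache else "build" for has_cache in node_has_cache]


def rank0_decisions(node_has_cache):
    """Rank 0 decides alone and the outcome is shared with every rank."""
    decision = "load" if node_has_cache[0] else "build"
    return [decision] * len(node_has_cache)


# Four ranks; only the node hosting ranks 0 and 1 has the cache file
# (hypothetical setup with a node-local, non-shared filesystem).
node_has_cache = [True, True, False, False]

mixed = per_rank_decisions(node_has_cache)
uniform = rank0_decisions(node_has_cache)

print(mixed)    # ['load', 'load', 'build', 'build']
print(uniform)  # ['load', 'load', 'load', 'load']
```

In the mixed case only two of the four ranks would enter the `all_reduce` inside the build path, which is the mismatch described above.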

```python
nemo_automodel.components.datasets.llm.eagle3._expected_draft_vocab_size(
    target_vocab_size: int,
    draft_vocab_size: int | None
) -> int
```

Return how many ids `build_eagle3_token_mapping` yields for this config.

Mirrors its selection branch: a `None` or too-large `draft_vocab_size`
falls back to the full target vocab.
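A minimal sketch of that fallback rule, assuming "too large" means at least `target_vocab_size` (the exact comparison and any handling of non-positive values are not stated here, so treat them as assumptions):

```python
from typing import Optional


def expected_draft_vocab_size(
    target_vocab_size: int, draft_vocab_size: Optional[int]
) -> int:
    # Assumption: None or a value >= the target vocab means "use the full vocab".
    if draft_vocab_size is None or draft_vocab_size >= target_vocab_size:
        return target_vocab_size
    return draft_vocab_size


print(expected_draft_vocab_size(32000, None))   # 32000
print(expected_draft_vocab_size(32000, 8000))   # 8000
print(expected_draft_vocab_size(32000, 50000))  # 32000
```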

```python
nemo_automodel.components.datasets.llm.eagle3._pack_collate(
    features: list[dict[str, typing.Any]]
) -> dict[str, torch.Tensor]
```

Collate packed rows; ragged `seq_lens` is 0-padded to `[B, max_docs]`.
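The 0-padding of `seq_lens` can be sketched with plain lists standing in for tensors. The helper name `pad_seq_lens` is hypothetical, and the real collate function also handles the other keys of each packed row.

```python
def pad_seq_lens(seq_lens_per_row):
    """Right-pad each row's document lengths with 0 up to the batch maximum."""
    max_docs = max(len(lens) for lens in seq_lens_per_row)
    return [list(lens) + [0] * (max_docs - len(lens)) for lens in seq_lens_per_row]


# Three packed rows holding 2, 1, and 3 documents respectively.
padded = pad_seq_lens([[3, 3], [6], [2, 2, 2]])
print(padded)  # [[3, 3, 0], [6, 0, 0], [2, 2, 2]]
```

The result has shape `[B, max_docs]` (here `[3, 3]`), so it can be converted to a rectangular tensor.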

```python
nemo_automodel.components.datasets.llm.eagle3._stack_batch(
    features: list[dict[str, typing.Any]]
) -> dict[str, torch.Tensor]
```

Stack a batch of pre-padded unshifted chat samples.

```python
nemo_automodel.components.datasets.llm.eagle3.build_eagle3_dataloader(
    data_path: str,
    tokenizer,
    seq_length: int,
    batch_size: int,
    shuffle: bool,
    num_workers: int = 0,
    split: str | None = None,
    distributed: bool = False,
    shuffle_seed: int | None = 42,
    mask_reasoning_content: bool = False,
    packed_sequence_size: int = 0,
    dp_mesh = None
) -> torch.utils.data.DataLoader
```

Build a dataloader backed by the repo's chat formatting utilities.

`packed_sequence_size > 0` (EAGLE-3 only) enables sequence packing (see
`build_packed_eagle3_dataset`), removing the padding waste of the
default `padding="max_length"` path; `packed_sequence_size == 0` keeps the original behavior.

`dp_mesh` (the "dp" device submesh) is required for context parallelism: the
sampler then distributes by data-parallel rank, so the `cp_size` ranks within
a dp group receive the same sample (CP shards its sequence across them).
When `dp_mesh` is `None`, the sampler falls back to the full-world default (pure DP).
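The difference between sharding by data-parallel rank and by world rank can be sketched as follows. This is an illustration only: it assumes CP ranks are the innermost mesh dimension (`world_rank = dp_rank * cp_size + cp_rank`) and uses a simple strided split. Neither detail is confirmed by this page, and `indices_for_rank` is a hypothetical helper.

```python
def indices_for_rank(world_rank, world_size, cp_size, num_samples):
    """Sample indices seen by one rank when sharding by data-parallel rank."""
    dp_size = world_size // cp_size
    dp_rank = world_rank // cp_size  # assumption: CP is the innermost dimension
    return list(range(num_samples))[dp_rank::dp_size]


# 4 ranks with cp_size=2: ranks 0 and 1 form one dp group, ranks 2 and 3 another.
with_cp = [indices_for_rank(r, 4, 2, 8) for r in range(4)]
print(with_cp)
# [[0, 2, 4, 6], [0, 2, 4, 6], [1, 3, 5, 7], [1, 3, 5, 7]]

# cp_size=1 reduces to the pure-DP split across the full world.
pure_dp = [indices_for_rank(r, 4, 1, 8) for r in range(4)]
print(pure_dp)
# [[0, 4], [1, 5], [2, 6], [3, 7]]
```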

```python
nemo_automodel.components.datasets.llm.eagle3.build_eagle3_token_mapping(
    dataloader: torch.utils.data.DataLoader,
    target_vocab_size: int,
    draft_vocab_size: int | None,
    special_token_ids: list[int] | None = None
) -> tuple[torch.Tensor, torch.Tensor]
```

Build draft-vocab mapping tensors from supervised token frequency.

Counts are accumulated as a dense `[target_vocab_size]` tensor and
`all_reduce` summed across ranks when `torch.distributed` is
initialized, so every rank ends up with the same draft vocabulary.

**Returns:** `tuple[torch.Tensor, torch.Tensor]`

A tuple `(selected_token_ids, selected_token_mask)`.
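A single-process sketch of frequency-based selection, using plain lists instead of tensors. Several details are assumptions made for illustration: that "supervised" means positions where `loss_mask` is nonzero, that ties are broken by lower token id, and that the selected ids are returned sorted. The `special_token_ids` argument and the cross-rank `all_reduce` are omitted.

```python
def build_token_mapping(batches, target_vocab_size, draft_vocab_size):
    # Dense count vector of length target_vocab_size.
    counts = [0] * target_vocab_size
    for batch in batches:
        for ids, mask in zip(batch["input_ids"], batch["loss_mask"]):
            for token, keep in zip(ids, mask):
                if keep:  # assumption: only supervised positions are counted
                    counts[token] += 1

    if draft_vocab_size is None or draft_vocab_size >= target_vocab_size:
        selected = list(range(target_vocab_size))
    else:
        # Most frequent first; ties broken by lower id (assumption).
        ranked = sorted(range(target_vocab_size), key=lambda t: (-counts[t], t))
        selected = sorted(ranked[:draft_vocab_size])

    chosen = set(selected)
    selected_mask = [t in chosen for t in range(target_vocab_size)]
    return selected, selected_mask


batches = [
    {
        "input_ids": [[5, 1, 5, 2], [5, 2, 7, 7]],
        "loss_mask": [[0, 1, 1, 1], [1, 1, 0, 1]],
    }
]
ids, mask = build_token_mapping(batches, target_vocab_size=8, draft_vocab_size=3)
print(ids)   # [1, 2, 5]
print(mask)  # [False, True, True, False, False, True, False, False]
```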

```python
nemo_automodel.components.datasets.llm.eagle3.build_packed_eagle3_dataset(
    source_dataset,
    packed_sequence_size: int,
    pad_token_id: int
) -> list[dict[str, list[int]]]
```

Greedily pack variable-length chat samples into rows of `packed_sequence_size`.

Each source sample is one *document*; documents are concatenated into a
fixed-width row with `position_ids` reset per document and trailing pad
folded into the final document (so `seq_lens` sums to the row width).

Cross-document leakage at TTT (training-time test) boundaries is handled by
`doc_remaining[t]` (real tokens after slot `t` within its document): the trainer
supervises slot `t` at step `k` to predict `k+1` ahead, valid iff
`k < doc_remaining[t]`. This masks every cross-document / into-padding
supervision -- packing creates many such boundaries per row.

Returns a list of packed-row dicts with keys `input_ids`, `loss_mask`,
`attention_mask`, `position_ids`, `doc_remaining` (length
`packed_sequence_size`) and `seq_lens` (per-document padded lengths).
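The packing layout can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it models only `input_ids`, `position_ids`, `doc_remaining`, and `seq_lens`, and it assumes that over-long documents are truncated to the row width and that `position_ids` keep counting through the trailing pad. Both are guesses, and `pack_documents` and `is_supervised` are hypothetical names.

```python
def pack_documents(docs, packed_sequence_size, pad_token_id):
    # Greedy grouping: start a new row when the next document does not fit.
    groups, current, used = [], [], 0
    for doc in docs:
        doc = list(doc[:packed_sequence_size])  # assumption: truncate long docs
        if not doc:
            continue
        if current and used + len(doc) > packed_sequence_size:
            groups.append(current)
            current, used = [], 0
        current.append(doc)
        used += len(doc)
    if current:
        groups.append(current)

    rows = []
    for group in groups:
        input_ids, position_ids, doc_remaining, seq_lens = [], [], [], []
        for doc in group:
            n = len(doc)
            input_ids.extend(doc)
            position_ids.extend(range(n))  # reset per document
            doc_remaining.extend(n - 1 - i for i in range(n))
            seq_lens.append(n)
        pad = packed_sequence_size - len(input_ids)
        input_ids.extend([pad_token_id] * pad)
        position_ids.extend(range(seq_lens[-1], seq_lens[-1] + pad))  # assumption
        doc_remaining.extend([0] * pad)  # padding is never supervised
        seq_lens[-1] += pad  # fold trailing pad into the final document
        rows.append({
            "input_ids": input_ids,
            "position_ids": position_ids,
            "doc_remaining": doc_remaining,
            "seq_lens": seq_lens,
        })
    return rows


def is_supervised(row, t, k):
    """Slot t at step k predicts k+1 ahead; valid iff k < doc_remaining[t]."""
    return k < row["doc_remaining"][t]


rows = pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 6, pad_token_id=0)
print(rows[0]["input_ids"])      # [1, 2, 3, 4, 5, 0]
print(rows[0]["doc_remaining"])  # [2, 1, 0, 1, 0, 0]
print(rows[0]["seq_lens"])       # [3, 3]
print(is_supervised(rows[0], t=2, k=0))  # False: slot 3 is another document
```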

```python
nemo_automodel.components.datasets.llm.eagle3.load_eagle3_token_mapping(
    path: str,
    target_vocab_size: int,
    draft_vocab_size: int | None
) -> tuple[torch.Tensor, torch.Tensor] | None
```

Load a cached draft-vocab mapping, or `None` if absent / incompatible.

The cache is keyed only on `target_vocab_size` and the resulting draft
vocab size -- it does NOT fingerprint the dataset or tokenizer. A cache built
from a different dataset still loads cleanly, so a caller that changes the
training data must point `selected_token_ids_path` at a fresh location (or
delete the file). Returns `None` -- so the caller rebuilds -- when the file
is missing, unreadable, or its stored vocab sizes do not match the config.
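The validation behavior can be sketched with a JSON file standing in for the real cache. The on-disk format is not described on this page, so the payload keys below and the helper `load_mapping` are assumptions for illustration.

```python
import json
import os
import tempfile


def load_mapping(path, target_vocab_size, draft_vocab_size):
    """Return (ids, mask), or None if the cache is missing or incompatible."""
    if draft_vocab_size is None or draft_vocab_size >= target_vocab_size:
        expected = target_vocab_size
    else:
        expected = draft_vocab_size
    try:
        with open(path) as f:
            payload = json.load(f)
        ids = payload["selected_token_ids"]
        stored_target = payload["target_vocab_size"]
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing or unreadable file
    if stored_target != target_vocab_size or len(ids) != expected:
        return None  # stored vocab sizes do not match the config
    chosen = set(ids)
    return ids, [t in chosen for t in range(target_vocab_size)]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "draft_vocab.json")
    with open(cache_file, "w") as f:
        json.dump({"selected_token_ids": [1, 2, 5], "target_vocab_size": 8}, f)

    hit = load_mapping(cache_file, target_vocab_size=8, draft_vocab_size=3)
    size_mismatch = load_mapping(cache_file, target_vocab_size=8, draft_vocab_size=4)
    missing = load_mapping(os.path.join(tmp_dir, "absent.json"), 8, 3)

print(hit[0])         # [1, 2, 5]
print(size_mismatch)  # None
print(missing)        # None
```

Nothing in the check looks at the data the ids came from, so a cache built from another dataset with the same sizes would load without complaint.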

```python
nemo_automodel.components.datasets.llm.eagle3.load_or_build_eagle3_token_mapping(
    dataloader: torch.utils.data.DataLoader,
    target_vocab_size: int,
    draft_vocab_size: int | None,
    special_token_ids: list[int] | None = None,
    cache_path: str | None = None
) -> tuple[torch.Tensor, torch.Tensor]
```

Build the draft-vocab mapping, reusing a cached copy at `cache_path`.

With `cache_path` set, present, and compatible, loads the mapping and skips
the full-dataset frequency scan `build_eagle3_token_mapping` performs.
Otherwise builds the mapping and -- on rank 0 -- writes it to `cache_path`
for next time. With `cache_path=None` this is exactly
`build_eagle3_token_mapping`.
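The control flow described above can be sketched in a single process with injected callables. The names `load_or_build`, `build_fn`, `load_fn`, and `save_fn` are hypothetical, and an in-memory dict stands in for the filesystem. In the real function, per the `_broadcast_cached_ids` description, the load decision is made on rank 0 and broadcast.

```python
def load_or_build(build_fn, load_fn, save_fn, cache_path=None, rank=0):
    if cache_path is not None:
        cached = load_fn(cache_path)
        if cached is not None:
            return cached  # compatible cache: skip the frequency scan
    mapping = build_fn()
    if cache_path is not None and rank == 0:
        save_fn(cache_path, mapping)  # only rank 0 writes the cache
    return mapping


store = {}        # in-memory stand-in for the cache file
build_calls = []  # records each (expensive) build


def fake_build():
    build_calls.append(1)
    return [1, 2, 5]


first = load_or_build(fake_build, store.get, store.__setitem__, cache_path="ids")
second = load_or_build(fake_build, store.get, store.__setitem__, cache_path="ids")

print(first, second)     # [1, 2, 5] [1, 2, 5]
print(len(build_calls))  # 1
```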

```python
nemo_automodel.components.datasets.llm.eagle3.save_eagle3_token_mapping(
    path: str,
    selected_token_ids: torch.Tensor,
    target_vocab_size: int
) -> None
```

Persist the draft-vocab selection so future runs skip the frequency scan.

Written atomically (`.tmp` + `os.replace`) so a crash mid-write never
leaves a half-written file a later run would load. Only `selected_token_ids`
is stored -- `selected_token_mask` is fully derivable from it plus
`target_vocab_size`.
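A sketch of the write-then-rename pattern and of deriving the mask from the stored ids. JSON replaces whatever serialization the library actually uses, and the payload keys and helper names are assumptions for illustration.

```python
import json
import os
import tempfile


def save_mapping(path, selected_token_ids, target_vocab_size):
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(
            {
                "selected_token_ids": list(selected_token_ids),
                "target_vocab_size": target_vocab_size,
            },
            f,
        )
    # os.replace swaps the finished file into place in one step, so readers
    # see either the old file or the complete new one.
    os.replace(tmp_path, path)


def mask_from_ids(selected_token_ids, target_vocab_size):
    """Rebuild the boolean mask from the ids and the target vocab size."""
    mask = [False] * target_vocab_size
    for token_id in selected_token_ids:
        mask[token_id] = True
    return mask


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "draft_vocab.json")
    save_mapping(cache_file, [1, 2, 5], target_vocab_size=8)
    tmp_left_behind = os.path.exists(cache_file + ".tmp")
    with open(cache_file) as f:
        stored = json.load(f)

rebuilt_mask = mask_from_ids(stored["selected_token_ids"], stored["target_vocab_size"])
print(tmp_left_behind)  # False
print(rebuilt_mask)     # [False, True, True, False, False, True, False, False]
```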

```python
nemo_automodel.components.datasets.llm.eagle3.logger = logging.getLogger(__name__)
```