# nemo_automodel.components.speculative.regenerate

Regenerate dataset answers with the EAGLE target model.

EAGLE drafters learn best when the supervised assistant turn is produced by
the *same* model that will serve as the inference target. Many public chat
datasets were generated by other models, so the assistant tokens they contain
are off-distribution for the drafter. This script takes such a dataset,
strips the trailing assistant turn from each sample, replays the remaining
`[system, user, ...]` context against a target model running behind an
OpenAI-compatible SGLang server, and writes a new dataset whose
`messages` column ends with a freshly generated assistant turn.

The output parquet files have the same `messages` column shape that
`ChatDataset` (used by `build_eagle3_dataloader`) consumes, so the
regenerated directory can be plugged directly into `train_data_path` in
the EAGLE-3 recipe.

Typical usage:

```bash
# 1. Spin up SGLang serving the target model in another shell:
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000

# 2. Regenerate answers:
python -m nemo_automodel.components.speculative.regenerate \
    --input-data Aeala/ShareGPT_Vicuna_unfiltered \
    --output-dir ./regenerated/sharegpt_llama31_8b \
    --target-server http://localhost:30000/v1 \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --concurrency 64 --shard-size 1000
```

The script is resumable: re-running with the same `--output-dir` plus `--resume`
skips any shards that are already on disk, and a manifest check verifies that
the input, model, and sharding configuration match the earlier run.

## Module Contents

### Classes

| Name                                                                                     | Description                                                           |
| ---------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`GenerationConfig`](#nemo_automodel-components-speculative-regenerate-GenerationConfig) | Sampling parameters forwarded to the SGLang chat completion endpoint. |

### Functions

| Name                                                                                                           | Description                                                                          |
| -------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| [`_build_manifest`](#nemo_automodel-components-speculative-regenerate-_build_manifest)                         | Return the regeneration settings that must stay stable across resume.                |
| [`_build_parser`](#nemo_automodel-components-speculative-regenerate-_build_parser)                             | -                                                                                    |
| [`_chat_completion`](#nemo_automodel-components-speculative-regenerate-_chat_completion)                       | POST `payload` to `url` and return the assistant message dict, with bounded retries. |
| [`_ensure_manifest_compatible`](#nemo_automodel-components-speculative-regenerate-_ensure_manifest_compatible) | Guard `--resume` against silently mixing shards from different runs.                 |
| [`_existing_shard_indices`](#nemo_automodel-components-speculative-regenerate-_existing_shard_indices)         | Return the set of shard indices already present in `output_dir`.                     |
| [`_extract_prompt_messages`](#nemo_automodel-components-speculative-regenerate-_extract_prompt_messages)       | Return `messages` truncated so its tail is *not* an assistant turn.                  |
| [`_import_aiohttp`](#nemo_automodel-components-speculative-regenerate-_import_aiohttp)                         | -                                                                                    |
| [`_import_pyarrow_table`](#nemo_automodel-components-speculative-regenerate-_import_pyarrow_table)             | -                                                                                    |
| [`_iter_samples`](#nemo_automodel-components-speculative-regenerate-_iter_samples)                             | Yield rows' `messages_column` from an HF dataset or a list of dicts.                 |
| [`_manifest_path`](#nemo_automodel-components-speculative-regenerate-_manifest_path)                           | Return the manifest path inside `output_dir`.                                        |
| [`_process_shard`](#nemo_automodel-components-speculative-regenerate-_process_shard)                           | Run a single shard's prompts through the target server with bounded concurrency.     |
| [`_regenerate_one`](#nemo_automodel-components-speculative-regenerate-_regenerate_one)                         | Call the target server once and return `prompt + [assistant]`.                       |
| [`_run`](#nemo_automodel-components-speculative-regenerate-_run)                                               | Async driver: load dataset, regenerate, write shards. Returns a process exit code.   |
| [`_validate_args`](#nemo_automodel-components-speculative-regenerate-_validate_args)                           | Reject invalid CLI values before any network or disk work starts.                    |
| [`_write_manifest`](#nemo_automodel-components-speculative-regenerate-_write_manifest)                         | Persist the current regeneration config for future `--resume` checks.                |
| [`_write_shard`](#nemo_automodel-components-speculative-regenerate-_write_shard)                               | Write a shard atomically (`.tmp` then `os.replace`) so partial writes never linger.  |
| [`main`](#nemo_automodel-components-speculative-regenerate-main)                                               | CLI entry point. Parses `argv` and returns the process exit code.                    |

### Data

[`_AIOHTTP_INSTALL_HINT`](#nemo_automodel-components-speculative-regenerate-_AIOHTTP_INSTALL_HINT)

[`_MANIFEST_NAME`](#nemo_automodel-components-speculative-regenerate-_MANIFEST_NAME)

[`_PYARROW_INSTALL_HINT`](#nemo_automodel-components-speculative-regenerate-_PYARROW_INSTALL_HINT)

[`_SHARD_NAME_RE`](#nemo_automodel-components-speculative-regenerate-_SHARD_NAME_RE)

[`logger`](#nemo_automodel-components-speculative-regenerate-logger)

### API

```python
class nemo_automodel.components.speculative.regenerate.GenerationConfig(
    model: str,
    max_new_tokens: int,
    temperature: float,
    top_p: float,
    reasoning: str = 'none'
)
```

Dataclass

Sampling parameters forwarded to the SGLang chat completion endpoint.
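To make the request shape concrete, here is a minimal sketch that mirrors `GenerationConfig` locally and turns it into an OpenAI-compatible chat completion body. The helper `build_chat_payload` is hypothetical, and mapping `max_new_tokens` onto the OpenAI `max_tokens` field is an assumption; how `reasoning` is forwarded is not documented here, so the sketch leaves it out.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GenerationConfig:
    # Local mirror of the documented fields; not imported from nemo_automodel.
    model: str
    max_new_tokens: int
    temperature: float
    top_p: float
    reasoning: str = "none"


def build_chat_payload(cfg: GenerationConfig, prompt: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: map the config onto an OpenAI-style chat request body."""
    # `max_tokens` is the OpenAI chat completions name for the length limit;
    # the real module's field mapping is an assumption.
    return {
        "model": cfg.model,
        "messages": prompt,
        "max_tokens": cfg.max_new_tokens,
        "temperature": cfg.temperature,
        "top_p": cfg.top_p,
        # `cfg.reasoning` is intentionally not forwarded in this sketch.
    }
```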

```python
nemo_automodel.components.speculative.regenerate._build_manifest(
    args: argparse.Namespace
) -> dict[str, typing.Any]
```

Return the regeneration settings that must stay stable across resume.

Fields that change the *content* of the output dataset are included. Fields
that only affect throughput or reliability (`concurrency`, `timeout_s`,
`max_retries`) are intentionally omitted, so a user can resume with
different operational settings. `output_dir` is also omitted: the manifest
lives inside `output_dir`, so recording it would only break resume after
the directory is renamed.
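The content-versus-operational split can be sketched as a simple allow-list over the parsed arguments. The field names in `CONTENT_FIELDS` are assumptions (only `concurrency`, `timeout_s`, `max_retries`, and `output_dir` are named above as excluded), and `build_manifest` is a hypothetical stand-in for `_build_manifest`.

```python
import argparse
from typing import Any

# Hypothetical allow-list of settings that change the generated content.
# Operational knobs (concurrency, timeout_s, max_retries) and output_dir are
# deliberately absent, so changing them does not invalidate a resume.
CONTENT_FIELDS = ("input_data", "model", "max_new_tokens", "temperature", "top_p", "shard_size")


def build_manifest(args: argparse.Namespace) -> dict[str, Any]:
    """Keep only the settings that change the content of the output shards."""
    return {name: getattr(args, name) for name in CONTENT_FIELDS if hasattr(args, name)}
```

Because the manifest is an allow-list rather than a deny-list, a newly added operational flag stays out of the resume check by default.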

```python
nemo_automodel.components.speculative.regenerate._build_parser() -> argparse.ArgumentParser
```
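`_build_parser` has no docstring. As a rough sketch, a parser covering only the flags that appear in the module's usage example could look like this; the defaults for `--concurrency` and `--shard-size` are placeholders, and the real parser very likely defines further options (timeouts, retries, sampling parameters) that are not listed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with only the flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Regenerate dataset answers with the EAGLE target model."
    )
    parser.add_argument("--input-data", required=True, help="HF dataset name or local path")
    parser.add_argument("--output-dir", required=True, help="Directory for parquet shards")
    parser.add_argument("--target-server", required=True, help="OpenAI-compatible base URL")
    parser.add_argument("--model", required=True, help="Model name sent in each request")
    # Placeholder defaults, not the module's real values.
    parser.add_argument("--concurrency", type=int, default=32)
    parser.add_argument("--shard-size", type=int, default=1000)
    parser.add_argument("--resume", action="store_true", help="Skip shards already on disk")
    return parser
```

`argparse` converts dashed flags to underscored attributes, so `--shard-size` is read back as `args.shard_size`.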

```python
nemo_automodel.components.speculative.regenerate._chat_completion(
    session,
    url: str,
    payload: dict[str, typing.Any],
    timeout_s: float,
    max_retries: int
) -> dict[str, typing.Any]
```

async

POST `payload` to `url` and return the assistant message dict, with bounded retries.
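The retry policy is only described as "bounded". A minimal sketch of one common shape, assuming exponential backoff with jitter, wraps any coroutine factory. In the real function the wrapped call would be an aiohttp POST whose JSON response carries the assistant dict at `choices[0]["message"]` (the OpenAI chat completions layout); that wiring is left out so the sketch runs without a server. `with_retries` and its parameters are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_retries: int,
    base_delay_s: float = 0.5,
    # With aiohttp you would also list aiohttp.ClientError here.
    retry_on: tuple[type[BaseException], ...] = (OSError, asyncio.TimeoutError),
) -> T:
    """Await `call()` up to `max_retries + 1` times, backing off between failures."""
    attempt = 0
    while True:
        try:
            return await call()
        except retry_on:
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the last error
            # Exponential backoff with jitter so concurrent workers do not retry in lockstep.
            await asyncio.sleep(base_delay_s * (2 ** attempt) * (1 + random.random()))
            attempt += 1
```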

```python
nemo_automodel.components.speculative.regenerate._ensure_manifest_compatible(
    output_dir: pathlib.Path,
    manifest: dict[str, typing.Any],
    resume: bool,
    existing_shards: set[int]
) -> None
```

Guard `--resume` against silently mixing shards from different runs.

Also refuses to start a fresh run that would silently clobber existing
shards: if the output directory already contains shard files and the user
did *not* pass `--resume`, raise so they make an explicit choice (either
delete the directory or pass `--resume`).
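The two guards described above can be sketched as follows, assuming the manifest is stored as JSON under the name held in `_MANIFEST_NAME`. The exception type (`RuntimeError`) is an assumption, and this sketch lets a resume proceed when no manifest exists yet; the real function may be stricter.

```python
import json
from pathlib import Path
from typing import Any

MANIFEST_NAME = "manifest.json"  # same value as the module's `_MANIFEST_NAME`


def ensure_manifest_compatible(
    output_dir: Path,
    manifest: dict[str, Any],
    resume: bool,
    existing_shards: set[int],
) -> None:
    """Refuse to clobber shards, or to mix shards produced under different settings."""
    if existing_shards and not resume:
        raise RuntimeError(
            f"{output_dir} already has {len(existing_shards)} shard(s); "
            "pass --resume or delete the directory."
        )
    manifest_file = output_dir / MANIFEST_NAME
    if resume and manifest_file.exists():
        previous = json.loads(manifest_file.read_text())
        if previous != manifest:
            raise RuntimeError(f"--resume settings differ from {manifest_file}: {previous} != {manifest}")
```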

```python
nemo_automodel.components.speculative.regenerate._existing_shard_indices(
    output_dir: pathlib.Path
) -> set[int]
```

Return the set of shard indices already present in `output_dir`.
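Using the same pattern as `_SHARD_NAME_RE`, the scan can be sketched as a directory listing filtered by the regex. Because the pattern is anchored with `$`, a name with a trailing `.tmp` suffix does not count as a finished shard. `existing_shard_indices` here is a local stand-in, not the module's function.

```python
import re
from pathlib import Path

# Same pattern as the module's `_SHARD_NAME_RE`.
SHARD_NAME_RE = re.compile(r"^shard-(\d{6})\.parquet$")


def existing_shard_indices(output_dir: Path) -> set[int]:
    """Collect the indices of finished shards; other files are ignored."""
    if not output_dir.is_dir():
        return set()
    indices = set()
    for entry in output_dir.iterdir():
        match = SHARD_NAME_RE.match(entry.name)
        if match:
            indices.add(int(match.group(1)))  # "000003" -> 3
    return indices
```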

```python
nemo_automodel.components.speculative.regenerate._extract_prompt_messages(
    messages: list[dict[str, typing.Any]]
) -> list[dict[str, typing.Any]] | None
```

Return `messages` truncated so its tail is *not* an assistant turn.

EAGLE-3 supervision needs an assistant turn produced by the target model.
The strategy here mirrors SpecForge's offline regeneration: keep every
leading system / user / tool turn (including any intermediate
user/assistant rounds), but drop the trailing assistant turn so the
target can produce a fresh one.

Returns `None` if the sample has no valid prompt context (e.g. it is
empty, or starts with an assistant turn that gets dropped, leaving
nothing). Callers should skip such samples.

```python
nemo_automodel.components.speculative.regenerate._import_aiohttp()
```

```python
nemo_automodel.components.speculative.regenerate._import_pyarrow_table()
```
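These two helpers are undocumented. Given the `_AIOHTTP_INSTALL_HINT` and `_PYARROW_INSTALL_HINT` constants, they presumably defer optional imports until first use and turn a bare `ImportError` into an actionable message. A generic sketch of that pattern follows; `import_optional` is hypothetical, and which pyarrow object `_import_pyarrow_table` actually returns is not stated here.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import `module_name` lazily, replacing a bare ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the traceback still shows the root cause.
        raise ImportError(install_hint) from exc
```

Deferring the import keeps `--help` and argument validation working on machines where the optional dependency is missing.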

```python
nemo_automodel.components.speculative.regenerate._iter_samples(
    dataset: typing.Any,
    messages_column: str
) -> typing.Any
```

Yield rows' `messages_column` from an HF dataset or a list of dicts.
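Both input kinds can share one loop because iterating a Hugging Face `datasets.Dataset` yields one dict per row, just like a list of dicts. A minimal sketch, assuming every row has the column; whether the real function skips or rejects rows without it is not stated.

```python
from typing import Any, Iterable, Iterator


def iter_samples(dataset: Iterable[dict[str, Any]], messages_column: str) -> Iterator[Any]:
    """Yield each row's conversation from any iterable of row dicts."""
    for row in dataset:
        yield row[messages_column]
```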

```python
nemo_automodel.components.speculative.regenerate._manifest_path(
    output_dir: pathlib.Path
) -> pathlib.Path
```

Return the manifest path inside `output_dir`.

```python
nemo_automodel.components.speculative.regenerate._process_shard(
    session,
    url: str,
    shard_samples: list[tuple[int, list[dict[str, typing.Any]], list[dict[str, typing.Any]]]],
    gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
    concurrency: int,
    timeout_s: float,
    max_retries: int
) -> list[dict[str, typing.Any]]
```

async

Run a single shard's prompts through the target server with bounded concurrency.

`shard_samples` items are `(global_index, original_messages, prompt_messages)`;
only the prompt is sent to the server, but both are kept around so the
written rows can preserve the original for traceability.
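Bounded concurrency of this kind is commonly built from an `asyncio.Semaphore` plus `asyncio.gather`. The sketch below assumes that approach and swaps the HTTP session for an injected `regenerate` coroutine (standing in for `_regenerate_one`) so it runs offline. The output row keys (`index`, `messages`, `original_messages`) are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

Messages = list[dict[str, Any]]


async def process_shard(
    shard_samples: list[tuple[int, Messages, Messages]],
    regenerate: Callable[[Messages], Awaitable[Messages]],
    concurrency: int,
) -> list[dict[str, Any]]:
    """Regenerate every prompt in the shard with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(index: int, original: Messages, prompt: Messages) -> dict[str, Any]:
        async with semaphore:  # waits while `concurrency` calls are already running
            messages = await regenerate(prompt)
        # Hypothetical row layout: new conversation plus the source for traceability.
        return {"index": index, "messages": messages, "original_messages": original}

    # gather returns results in input order, so rows stay in shard order.
    return await asyncio.gather(*(one(i, orig, prompt) for i, orig, prompt in shard_samples))
```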

```python
nemo_automodel.components.speculative.regenerate._regenerate_one(
    session,
    url: str,
    prompt: list[dict[str, typing.Any]],
    gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
    timeout_s: float,
    max_retries: int
) -> list[dict[str, typing.Any]]
```

async

Call the target server once and return `prompt + [assistant]`.

```python
nemo_automodel.components.speculative.regenerate._run(
    args: argparse.Namespace
) -> int
```

async

Async driver: load dataset, regenerate, write shards. Returns a process exit code.
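Resume only works if shard boundaries are deterministic, which is why the sharding configuration is part of the manifest. A minimal sketch of the planning step, assuming shards are contiguous `shard_size` slices in dataset order and that shards already on disk are skipped; `plan_shards` is hypothetical, and where the real driver filters out samples with no prompt is not specified.

```python
from typing import Any, Iterator


def plan_shards(
    samples: list[Any], shard_size: int, existing: set[int]
) -> Iterator[tuple[int, list[Any]]]:
    """Yield (shard_index, samples) for every shard that is not already on disk."""
    for shard_index, start in enumerate(range(0, len(samples), shard_size)):
        if shard_index in existing:
            continue  # finished on an earlier run
        yield shard_index, samples[start:start + shard_size]
```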

```python
nemo_automodel.components.speculative.regenerate._validate_args(
    args: argparse.Namespace
) -> None
```

Reject invalid CLI values before any network or disk work starts.
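The specific checks are not documented. As an illustration only, a validator of this kind might reject values that can never work, such as a non-positive concurrency (a zero-permit semaphore would never admit a request) or a target URL without a scheme. Every rule and the `ValueError` type below are assumptions.

```python
import argparse


def validate_args(args: argparse.Namespace) -> None:
    """Hypothetical fail-fast checks that run before any network or disk I/O."""
    if args.concurrency < 1:
        raise ValueError(f"--concurrency must be >= 1, got {args.concurrency}")
    if args.shard_size < 1:
        raise ValueError(f"--shard-size must be >= 1, got {args.shard_size}")
    if not args.target_server.startswith(("http://", "https://")):
        raise ValueError(f"--target-server must be an http(s) URL, got {args.target_server!r}")
```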

```python
nemo_automodel.components.speculative.regenerate._write_manifest(
    output_dir: pathlib.Path,
    manifest: dict[str, typing.Any]
) -> pathlib.Path
```

Persist the current regeneration config for future `--resume` checks.
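A minimal sketch of `_manifest_path` and `_write_manifest` together, assuming the manifest is plain JSON (suggested by the `manifest.json` name). Sorted keys keep the file stable across runs, so two manifests with the same settings serialize identically.

```python
import json
from pathlib import Path
from typing import Any

MANIFEST_NAME = "manifest.json"  # same value as the module's `_MANIFEST_NAME`


def manifest_path(output_dir: Path) -> Path:
    return output_dir / MANIFEST_NAME


def write_manifest(output_dir: Path, manifest: dict[str, Any]) -> Path:
    """Store the manifest as sorted, indented JSON and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = manifest_path(output_dir)
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
    return path
```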

```python
nemo_automodel.components.speculative.regenerate._write_shard(
    output_dir: pathlib.Path,
    shard_index: int,
    rows: list[dict[str, typing.Any]]
) -> pathlib.Path
```

Write a shard atomically (`.tmp` then `os.replace`) so partial writes never linger.
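The temp-then-rename pattern can be sketched independently of the file format by passing in a writer callback. The final name follows the pattern matched by `_SHARD_NAME_RE`; the exact `.tmp` naming and the `write_fn` parameter are assumptions.

```python
import os
from pathlib import Path
from typing import Callable


def write_shard_atomically(
    output_dir: Path, shard_index: int, write_fn: Callable[[Path], None]
) -> Path:
    """Write via a temp file, then rename, so readers never see a half-written shard."""
    final_path = output_dir / f"shard-{shard_index:06d}.parquet"  # matches `_SHARD_NAME_RE`
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, final_path)  # atomic when both paths are on the same filesystem
    finally:
        if tmp_path.exists():  # only left behind if write_fn or os.replace failed
            tmp_path.unlink()
    return final_path
```

In the module itself, `write_fn` would presumably be a parquet writer, for example `lambda p: pyarrow.parquet.write_table(pyarrow.Table.from_pylist(rows), p)`; that pairing is an assumption. Keeping the temp file in the same directory as the final file is what makes `os.replace` an atomic rename rather than a cross-filesystem copy.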

```python
nemo_automodel.components.speculative.regenerate.main(
    argv: list[str] | None = None
) -> int
```

CLI entry point. Parses `argv` and returns the process exit code.

```python
nemo_automodel.components.speculative.regenerate._AIOHTTP_INSTALL_HINT = "aiohttp is required for regenerate.py. It is normally pulled in via the project...
```

```python
nemo_automodel.components.speculative.regenerate._MANIFEST_NAME = 'manifest.json'
```

```python
nemo_automodel.components.speculative.regenerate._PYARROW_INSTALL_HINT = 'pyarrow is required to write regenerated shards as parquet. Install it with `uv...
```

```python
nemo_automodel.components.speculative.regenerate._SHARD_NAME_RE = re.compile('^shard-(\\d{6})\\.parquet$')
```

```python
nemo_automodel.components.speculative.regenerate.logger = logging.getLogger(__name__)
```