nemo_automodel.components.speculative.regenerate#

Regenerate dataset answers with the EAGLE target model.

EAGLE drafters learn best when the supervised assistant turn is produced by the same model that will serve as the inference target. Many public chat datasets were generated by other models, so the assistant tokens they contain are off-distribution for the drafter. This script takes such a dataset, strips the trailing assistant turn from each sample, replays the remaining [system, user, ...] context against a target model running behind an OpenAI-compatible SGLang server, and writes a new dataset whose messages column ends with a freshly-generated assistant turn.

The output parquet files have the same messages column shape that ChatDataset (used by build_eagle3_dataloader) consumes, so the regenerated directory can be plugged directly into train_data_path in the EAGLE-3 recipe.

Typical usage:

# 1. Spin up SGLang serving the target model in another shell:
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000

# 2. Regenerate answers:
python -m nemo_automodel.components.speculative.regenerate \
    --input-data Aeala/ShareGPT_Vicuna_unfiltered \
    --output-dir ./regenerated/sharegpt_llama31_8b \
    --target-server http://localhost:30000/v1 \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --concurrency 64 --shard-size 1000

The script is resumable: re-running with the same --output-dir --resume skips any shards that are already on disk, and verifies via a manifest that the input/model/sharding configuration matches the earlier run.

Module Contents#

Classes#

GenerationConfig

Sampling parameters forwarded to the SGLang chat completion endpoint.

Functions#

_build_manifest

Return the regeneration settings that must stay stable across resume.

_manifest_path

Return the manifest path inside output_dir.

_write_manifest

Persist the current regeneration config for future --resume checks.

_ensure_manifest_compatible

Guard --resume against silently mixing shards from different runs.

_validate_args

Reject invalid CLI values before any network or disk work starts.

_import_aiohttp

_import_pyarrow_table

_extract_prompt_messages

Return messages truncated so its tail is not an assistant turn.

_existing_shard_indices

Return the set of shard indices already present in output_dir.

_write_shard

Write a shard atomically (.tmp then os.replace) so partial writes never linger.

_chat_completion

POST payload to url and return the assistant text, with bounded retries.

_regenerate_one

Call the target server once and return prompt + [assistant].

_process_shard

Run a single shard’s prompts through the target server with bounded concurrency.

_iter_samples

Yield rows’ messages_column from an HF dataset or a list of dicts.

_run

Async driver: load dataset, regenerate, write shards. Returns a process exit code.

_build_parser

main

CLI entry point. Parses argv and returns the process exit code.

Data#

API#

nemo_automodel.components.speculative.regenerate.logger#

‘getLogger(…)’

nemo_automodel.components.speculative.regenerate._AIOHTTP_INSTALL_HINT#

“aiohttp is required for regenerate.py. It is normally pulled in via the project’s dependencies; if y…”

nemo_automodel.components.speculative.regenerate._PYARROW_INSTALL_HINT#

‘pyarrow is required to write regenerated shards as parquet. Install it with uv pip install pyarrow…’

nemo_automodel.components.speculative.regenerate._SHARD_NAME_RE#

‘compile(…)’

nemo_automodel.components.speculative.regenerate._MANIFEST_NAME#

‘manifest.json’

class nemo_automodel.components.speculative.regenerate.GenerationConfig#

Sampling parameters forwarded to the SGLang chat completion endpoint.

model: str#

None

max_new_tokens: int#

None

temperature: float#

None

top_p: float#

None

nemo_automodel.components.speculative.regenerate._build_manifest(args: argparse.Namespace) dict[str, Any]#

Return the regeneration settings that must stay stable across resume.

Fields that change the content of the output dataset are included. Fields that only affect throughput / reliability (concurrency, timeout_s, max_retries) are intentionally omitted so a user can re-resume with different operational knobs. output_dir is also omitted: the manifest lives inside output_dir, so encoding it here would only break resume after a directory rename.

nemo_automodel.components.speculative.regenerate._manifest_path(output_dir: pathlib.Path) pathlib.Path#

Return the manifest path inside output_dir.

nemo_automodel.components.speculative.regenerate._write_manifest(
output_dir: pathlib.Path,
manifest: dict[str, Any],
) pathlib.Path#

Persist the current regeneration config for future --resume checks.

nemo_automodel.components.speculative.regenerate._ensure_manifest_compatible(
output_dir: pathlib.Path,
manifest: dict[str, Any],
*,
resume: bool,
existing_shards: set[int],
) None#

Guard --resume against silently mixing shards from different runs.

Also refuses to start a fresh run that would silently clobber existing shards: if the output directory already contains shard files and the user did not pass --resume, raise so they make an explicit choice (either delete the directory or pass --resume).

nemo_automodel.components.speculative.regenerate._validate_args(args: argparse.Namespace) None#

Reject invalid CLI values before any network or disk work starts.

nemo_automodel.components.speculative.regenerate._import_aiohttp()#
nemo_automodel.components.speculative.regenerate._import_pyarrow_table()#
nemo_automodel.components.speculative.regenerate._extract_prompt_messages(
messages: list[dict[str, Any]],
) list[dict[str, Any]] | None#

Return messages truncated so its tail is not an assistant turn.

EAGLE-3 supervision needs an assistant turn produced by the target model. The strategy here mirrors SpecForge’s offline regeneration: keep every leading system / user / tool turn (including any intermediate user<->assistant rounds), but drop the trailing assistant turn so the target can produce a fresh one.

Returns None if the sample has no valid prompt context (e.g. it is empty, or starts with an assistant turn that gets dropped, leaving nothing). Callers should skip such samples.

nemo_automodel.components.speculative.regenerate._existing_shard_indices(output_dir: pathlib.Path) set[int]#

Return the set of shard indices already present in output_dir.

nemo_automodel.components.speculative.regenerate._write_shard(
output_dir: pathlib.Path,
shard_index: int,
rows: list[dict[str, Any]],
) pathlib.Path#

Write a shard atomically (.tmp then os.replace) so partial writes never linger.

async nemo_automodel.components.speculative.regenerate._chat_completion(
session,
url: str,
payload: dict[str, Any],
*,
timeout_s: float,
max_retries: int,
) str#

POST payload to url and return the assistant text, with bounded retries.

async nemo_automodel.components.speculative.regenerate._regenerate_one(
session,
url: str,
prompt: list[dict[str, Any]],
gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
*,
timeout_s: float,
max_retries: int,
) list[dict[str, Any]]#

Call the target server once and return prompt + [assistant].

async nemo_automodel.components.speculative.regenerate._process_shard(
session,
url: str,
shard_samples: list[tuple[int, list[dict[str, Any]], list[dict[str, Any]]]],
gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
*,
concurrency: int,
timeout_s: float,
max_retries: int,
) list[dict[str, Any]]#

Run a single shard’s prompts through the target server with bounded concurrency.

shard_samples items are (global_index, original_messages, prompt_messages); only the prompt is sent to the server, but both are kept around so the written rows can preserve the original for traceability.

nemo_automodel.components.speculative.regenerate._iter_samples(dataset: Any, messages_column: str) Any#

Yield rows’ messages_column from an HF dataset or a list of dicts.

async nemo_automodel.components.speculative.regenerate._run(args: argparse.Namespace) int#

Async driver: load dataset, regenerate, write shards. Returns a process exit code.

nemo_automodel.components.speculative.regenerate._build_parser() argparse.ArgumentParser#
nemo_automodel.components.speculative.regenerate.main(argv: list[str] | None = None) int#

CLI entry point. Parses argv and returns the process exit code.