nemo_automodel.components.speculative.regenerate#
Regenerate dataset answers with the EAGLE target model.
EAGLE drafters learn best when the supervised assistant turn is produced by
the same model that will serve as the inference target. Many public chat
datasets were generated by other models, so the assistant tokens they contain
are off-distribution for the drafter. This script takes such a dataset,
strips the trailing assistant turn from each sample, replays the remaining
[system, user, ...] context against a target model running behind an
OpenAI-compatible SGLang server, and writes a new dataset whose
messages column ends with a freshly-generated assistant turn.
The output parquet files have the same messages column shape that
ChatDataset (used by build_eagle3_dataloader) consumes, so the
regenerated directory can be plugged directly into train_data_path in
the EAGLE-3 recipe.
Typical usage:
# 1. Spin up SGLang serving the target model in another shell:
python -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
# 2. Regenerate answers:
python -m nemo_automodel.components.speculative.regenerate \
--input-data Aeala/ShareGPT_Vicuna_unfiltered \
--output-dir ./regenerated/sharegpt_llama31_8b \
--target-server http://localhost:30000/v1 \
--model meta-llama/Llama-3.1-8B-Instruct \
--concurrency 64 --shard-size 1000
The script is resumable: re-running with the same --output-dir --resume
skips any shards that are already on disk, and verifies via a manifest that
the input/model/sharding configuration matches the earlier run.
Module Contents#
Classes#
Sampling parameters forwarded to the SGLang chat completion endpoint. |
Functions#
Return the regeneration settings that must stay stable across resume. |
|
Return the manifest path inside |
|
Persist the current regeneration config for future |
|
Guard |
|
Reject invalid CLI values before any network or disk work starts. |
|
Return |
|
Return the set of shard indices already present in |
|
Write a shard atomically ( |
|
POST |
|
Call the target server once and return |
|
Run a single shard’s prompts through the target server with bounded concurrency. |
|
Yield rows’ |
|
Async driver: load dataset, regenerate, write shards. Returns a process exit code. |
|
CLI entry point. Parses |
Data#
API#
- nemo_automodel.components.speculative.regenerate.logger#
‘getLogger(…)’
- nemo_automodel.components.speculative.regenerate._AIOHTTP_INSTALL_HINT#
“aiohttp is required for regenerate.py. It is normally pulled in via the project’s dependencies; if y…”
- nemo_automodel.components.speculative.regenerate._PYARROW_INSTALL_HINT#
‘pyarrow is required to write regenerated shards as parquet. Install it with
uv pip install pyarrow…’
- nemo_automodel.components.speculative.regenerate._SHARD_NAME_RE#
‘compile(…)’
- nemo_automodel.components.speculative.regenerate._MANIFEST_NAME#
‘manifest.json’
- class nemo_automodel.components.speculative.regenerate.GenerationConfig#
Sampling parameters forwarded to the SGLang chat completion endpoint.
- model: str#
None
- max_new_tokens: int#
None
- temperature: float#
None
- top_p: float#
None
- nemo_automodel.components.speculative.regenerate._build_manifest(args: argparse.Namespace) dict[str, Any]#
Return the regeneration settings that must stay stable across resume.
Fields that change the content of the output dataset are included. Fields that only affect throughput / reliability (
concurrency,timeout_s,max_retries) are intentionally omitted so a user can re-resume with different operational knobs.output_diris also omitted: the manifest lives insideoutput_dir, so encoding it here would only break resume after a directory rename.
- nemo_automodel.components.speculative.regenerate._manifest_path(output_dir: pathlib.Path) pathlib.Path#
Return the manifest path inside
output_dir.
- nemo_automodel.components.speculative.regenerate._write_manifest(
- output_dir: pathlib.Path,
- manifest: dict[str, Any],
Persist the current regeneration config for future
--resumechecks.
- nemo_automodel.components.speculative.regenerate._ensure_manifest_compatible(
- output_dir: pathlib.Path,
- manifest: dict[str, Any],
- *,
- resume: bool,
- existing_shards: set[int],
Guard
--resumeagainst silently mixing shards from different runs.Also refuses to start a fresh run that would silently clobber existing shards: if the output directory already contains shard files and the user did not pass
--resume, raise so they make an explicit choice (either delete the directory or pass--resume).
- nemo_automodel.components.speculative.regenerate._validate_args(args: argparse.Namespace) None#
Reject invalid CLI values before any network or disk work starts.
- nemo_automodel.components.speculative.regenerate._import_aiohttp()#
- nemo_automodel.components.speculative.regenerate._import_pyarrow_table()#
- nemo_automodel.components.speculative.regenerate._extract_prompt_messages(
- messages: list[dict[str, Any]],
Return
messagestruncated so its tail is not an assistant turn.EAGLE-3 supervision needs an assistant turn produced by the target model. The strategy here mirrors SpecForge’s offline regeneration: keep every leading system / user / tool turn (including any intermediate user<->assistant rounds), but drop the trailing assistant turn so the target can produce a fresh one.
Returns
Noneif the sample has no valid prompt context (e.g. it is empty, or starts with an assistant turn that gets dropped, leaving nothing). Callers should skip such samples.
- nemo_automodel.components.speculative.regenerate._existing_shard_indices(output_dir: pathlib.Path) set[int]#
Return the set of shard indices already present in
output_dir.
- nemo_automodel.components.speculative.regenerate._write_shard(
- output_dir: pathlib.Path,
- shard_index: int,
- rows: list[dict[str, Any]],
Write a shard atomically (
.tmpthenos.replace) so partial writes never linger.
- async nemo_automodel.components.speculative.regenerate._chat_completion(
- session,
- url: str,
- payload: dict[str, Any],
- *,
- timeout_s: float,
- max_retries: int,
POST
payloadtourland return the assistant text, with bounded retries.
- async nemo_automodel.components.speculative.regenerate._regenerate_one(
- session,
- url: str,
- prompt: list[dict[str, Any]],
- gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
- *,
- timeout_s: float,
- max_retries: int,
Call the target server once and return
prompt + [assistant].
- async nemo_automodel.components.speculative.regenerate._process_shard(
- session,
- url: str,
- shard_samples: list[tuple[int, list[dict[str, Any]], list[dict[str, Any]]]],
- gen_cfg: nemo_automodel.components.speculative.regenerate.GenerationConfig,
- *,
- concurrency: int,
- timeout_s: float,
- max_retries: int,
Run a single shard’s prompts through the target server with bounded concurrency.
shard_samplesitems are(global_index, original_messages, prompt_messages); only the prompt is sent to the server, but both are kept around so the written rows can preserve the original for traceability.
- nemo_automodel.components.speculative.regenerate._iter_samples(dataset: Any, messages_column: str) Any#
Yield rows’
messages_columnfrom an HF dataset or a list of dicts.
- async nemo_automodel.components.speculative.regenerate._run(args: argparse.Namespace) int#
Async driver: load dataset, regenerate, write shards. Returns a process exit code.
- nemo_automodel.components.speculative.regenerate._build_parser() argparse.ArgumentParser#
- nemo_automodel.components.speculative.regenerate.main(argv: list[str] | None = None) int#
CLI entry point. Parses
argvand returns the process exit code.