nemo_automodel.components.speculative.serve_sglang
Serve an Automodel-trained EAGLE / EAGLE-3 drafter with SGLang.
The EAGLE drafter checkpoints produced by the EAGLE recipes
(recipes/llm/train_eagle{1,2,3}.py) are saved as draft_model.pt plus
recipe metadata. This script converts that layout into an HF/SGLang-readable
model/ directory when needed, then shells out to
python -m sglang.launch_server with the right speculative-decoding flags.
NOTE: SGLang is NOT bundled with the NeMo-AutoModel container image and
is intentionally NOT declared in pyproject.toml. To use this entry
point, install it yourself into the same environment:
uv pip install "sglang>=0.5.9"
Refer to https://github.com/sgl-project/sglang for the version that matches your CUDA / PyTorch stack. If SGLang is missing, this script exits with a clear install hint instead of crashing on import.
Typical usage (after training produces a checkpoint at
./checkpoints/epoch_0_step_1000):
python -m nemo_automodel.components.speculative.serve_sglang \
--target meta-llama/Llama-3.1-8B-Instruct \
--draft ./checkpoints/epoch_0_step_1000 \
--algorithm EAGLE3 \
--num-steps 3 --topk 1 --num-draft-tokens 4
Pass --print-only to inspect the command without launching it; in that
mode no checkpoint export is performed and the printed paths reflect what
would be produced on a real launch.
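Module Contents
Functions
- _has_hf_weight_file: Return True if path already contains an HF-style weight artifact.
- _check_sglang_available: Verify that the sglang package can actually be imported; otherwise exit with code 2.
- _load_safetensors_save_file: Return safetensors.torch.save_file or exit with an install hint.
- _torch_load: Load a torch pickle, preferring weights_only=True when supported.
- _rewrite_config_for_sglang: Copy src_config_path to dst_config_path and normalize architectures.
- _config_needs_rewrite: Return True when config_path does not match the SGLang architecture for algorithm.
- Infer num_hidden_layers from a state dict by counting unique layer indices.
- _regenerate_token_map: Extract selected_token_ids from a recipe meta file into an SGLang token map.
- _maybe_export_training_checkpoint: Convert recipe-native EAGLE checkpoints into an HF/SGLang-readable directory.
- resolve_draft_artifacts: Resolve a user-supplied drafter path to the model and token-map paths SGLang expects.
- build_sglang_argv: Build the python -m sglang.launch_server argv for a given config.
- _parse_args: Parse command-line arguments for the serve helper.
- main: Validate the environment, resolve the drafter checkpoint, then launch SGLang.
Data
API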
- nemo_automodel.components.speculative.serve_sglang.logger
‘getLogger(…)’
- nemo_automodel.components.speculative.serve_sglang._SGLANG_INSTALL_HINT
‘sglang is not installed in this environment. Install it manually with `uv pip install “sglang>=0.5.9…’
- nemo_automodel.components.speculative.serve_sglang._SAFETENSORS_INSTALL_HINT
‘safetensors is required to export Automodel EAGLE checkpoints for SGLang. Install it with `uv pip in…’
- nemo_automodel.components.speculative.serve_sglang._SGLANG_ARCHITECTURE_FOR_ALGORITHM
None
- nemo_automodel.components.speculative.serve_sglang._has_hf_weight_file(path: pathlib.Path) → bool
Return True if path already contains an HF-style weight artifact.
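The following is a minimal sketch of this check. It assumes that an "HF-style weight artifact" means one of these files at the top level of the directory: a *.safetensors file, a pytorch_model.bin file, or the sharded-index JSON of either. This page does not document the exact filename set the real helper looks for, and has_hf_weight_file is a hypothetical stand-in name.

```python
from pathlib import Path

# Assumed filename patterns for HF-style weights; the real helper may check a different set.
_ASSUMED_WEIGHT_PATTERNS = (
    "*.safetensors",
    "pytorch_model.bin",
    "*.safetensors.index.json",
    "pytorch_model.bin.index.json",
)


def has_hf_weight_file(path: Path) -> bool:
    """Return True if `path` is a directory holding at least one HF-style weight file."""
    if not path.is_dir():
        return False
    # glob() with a literal name (no wildcard) yields that exact file only if it exists.
    return any(any(path.glob(pattern)) for pattern in _ASSUMED_WEIGHT_PATTERNS)
```

With this check, a directory that holds only the recipe-native draft_model.pt is treated as still needing export.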
- nemo_automodel.components.speculative.serve_sglang._check_sglang_available() → None
Verify that the sglang package can actually be imported; otherwise exit with code 2.
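The exit-with-hint behaviour described in the module NOTE can be sketched as follows. check_package_available and INSTALL_HINT are hypothetical names. The sketch attempts a real import rather than only probing with importlib.util.find_spec, in line with the "can actually be imported" wording. That way, a package that is installed but fails to import is also reported.

```python
import importlib
import sys

# Hypothetical hint text; the module's real _SGLANG_INSTALL_HINT is longer.
INSTALL_HINT = 'sglang is not installed. Install it with: uv pip install "sglang>=0.5.9"'


def check_package_available(module_name: str = "sglang", hint: str = INSTALL_HINT) -> None:
    """Exit with status 2 and an install hint if `module_name` cannot be imported."""
    try:
        # A real import also catches packages that are present but broken,
        # e.g. because of a CUDA / PyTorch mismatch.
        importlib.import_module(module_name)
    except ImportError as exc:
        print(f"{hint}\n(import error: {exc})", file=sys.stderr)
        sys.exit(2)
```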
- nemo_automodel.components.speculative.serve_sglang._load_safetensors_save_file() → Callable[..., None]
Return safetensors.torch.save_file or exit with an install hint.
- nemo_automodel.components.speculative.serve_sglang._torch_load(path: pathlib.Path) → Any
Load a torch pickle, preferring weights_only=True when supported.
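One way to implement "when supported" is to inspect the loader's signature before passing the keyword. The real helper may do this differently, for example by catching TypeError. In practice the loader would be torch.load, which accepts weights_only in recent PyTorch releases. Passing the loader in as an argument is an assumption made here so the sketch runs without PyTorch installed, and load_preferring_weights_only is a hypothetical name.

```python
import inspect
from pathlib import Path
from typing import Any, Callable


def load_preferring_weights_only(loader: Callable[..., Any], path: Path) -> Any:
    """Call `loader(path, weights_only=True)` if the loader accepts that keyword."""
    try:
        params = inspect.signature(loader).parameters
    except (TypeError, ValueError):
        # Some callables expose no signature; fall back to a plain call.
        params = {}
    if "weights_only" in params:
        return loader(path, weights_only=True)
    return loader(path)
```

Usage would look like load_preferring_weights_only(torch.load, ckpt_dir / "draft_model.pt"). The restricted unpickler is only requested when the installed torch.load understands the keyword.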
- nemo_automodel.components.speculative.serve_sglang._rewrite_config_for_sglang(src_config_path: pathlib.Path, dst_config_path: pathlib.Path, algorithm: str, *, num_hidden_layers: int | None = None)
Copy src_config_path to dst_config_path and normalize architectures.
For algorithms in _SGLANG_ARCHITECTURE_FOR_ALGORITHM, the architectures field is rewritten to the SGLang-canonical class name (e.g. LlamaForCausalLMEagle3). For other algorithms, the original field is preserved. When num_hidden_layers is provided, it is written into the config so the exported drafter reflects its actual depth rather than the target model’s depth. The write is staged through a sibling .tmp file and finalized with os.replace, so an interrupted write cannot leave the destination half-truncated when rewriting in place.
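The copy-and-normalize step, including the staged .tmp write and the os.replace swap, might look like the sketch below. ARCHITECTURE_FOR_ALGORITHM and rewrite_config are hypothetical names. This page does not render the contents of the real _SGLANG_ARCHITECTURE_FOR_ALGORITHM, so the mapping holds only the pairing of EAGLE3 with LlamaForCausalLMEagle3 implied by the example above.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Assumed mapping; the real _SGLANG_ARCHITECTURE_FOR_ALGORITHM may contain more entries.
ARCHITECTURE_FOR_ALGORITHM = {"EAGLE3": "LlamaForCausalLMEagle3"}


def rewrite_config(src: Path, dst: Path, algorithm: str, *, num_hidden_layers: Optional[int] = None) -> None:
    """Copy a config.json, normalizing `architectures` and optionally `num_hidden_layers`."""
    config = json.loads(src.read_text())
    arch = ARCHITECTURE_FOR_ALGORITHM.get(algorithm)
    if arch is not None:
        config["architectures"] = [arch]  # other algorithms keep their original field
    if num_hidden_layers is not None:
        config["num_hidden_layers"] = num_hidden_layers  # drafter depth, not target depth
    # Stage the write in a sibling temp file, then swap it into place in one step,
    # so an interrupted write never leaves `dst` half-written (also when src == dst).
    tmp = dst.with_name(dst.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2))
    os.replace(tmp, dst)
```

Because the temporary file is a sibling of dst, both live on the same filesystem. On POSIX systems this is what lets os.replace perform the swap as a single atomic rename.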
- nemo_automodel.components.speculative.serve_sglang._config_needs_rewrite(config_path: pathlib.Path, algorithm: str)
Return True when config_path does not match the SGLang architecture for algorithm.
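A sketch of the matching check follows. It assumes that algorithms absent from the architecture mapping never need a rewrite, which is consistent with the original field being preserved for them. The mapping and the name config_needs_rewrite are hypothetical.

```python
import json
from pathlib import Path

# Assumed mapping; the real _SGLANG_ARCHITECTURE_FOR_ALGORITHM is not shown on this page.
ARCHITECTURE_FOR_ALGORITHM = {"EAGLE3": "LlamaForCausalLMEagle3"}


def config_needs_rewrite(config_path: Path, algorithm: str) -> bool:
    """Return True if config.json's `architectures` differs from the SGLang class for `algorithm`."""
    expected = ARCHITECTURE_FOR_ALGORITHM.get(algorithm)
    if expected is None:
        # No canonical SGLang class for this algorithm: nothing to normalize.
        return False
    config = json.loads(config_path.read_text())
    return config.get("architectures") != [expected]
```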
- state_dict: dict[str, Any],
Infer num_hidden_layers from a state dict by counting unique layer indices.
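The counting approach can be sketched as follows. It assumes that decoder-layer parameter names contain a layers.<index>. segment, as in layers.0.self_attn.q_proj.weight. This page does not show the real helper's name, so infer_num_hidden_layers is hypothetical.

```python
import re
from typing import Any, Dict, Optional, Set

# Assumed key layout: a "layers.<index>." segment at the start of the key or after a dot.
_LAYER_INDEX = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def infer_num_hidden_layers(state_dict: Dict[str, Any]) -> Optional[int]:
    """Count distinct decoder-layer indices in the state dict keys; None if there are none."""
    indices: Set[int] = set()
    for key in state_dict:
        match = _LAYER_INDEX.search(key)
        if match:
            indices.add(int(match.group(1)))
    return len(indices) if indices else None
```

Counting distinct indices rather than taking the maximum index plus one gives the same answer for contiguous layers. It also avoids overstating depth if indices are sparse.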
- nemo_automodel.components.speculative.serve_sglang._regenerate_token_map(meta_path: pathlib.Path, token_map_path: pathlib.Path)
Extract selected_token_ids from a recipe meta file into an SGLang token map.
- nemo_automodel.components.speculative.serve_sglang._maybe_export_training_checkpoint(checkpoint_dir: pathlib.Path, algorithm: str, *, dry_run: bool = False)
Convert recipe-native EAGLE checkpoints into an HF/SGLang-readable directory.
- Parameters:
checkpoint_dir – Recipe checkpoint dir, expected to contain draft_model.pt and config.json (and eagle3_meta.pt for EAGLE-3).
algorithm – Speculative algorithm name, used to pick the right SGLang architecture and to decide whether a token map is needed.
dry_run – When True, return the paths that would be produced without writing anything.
- Returns:
(export_dir, token_map_path_or_None).
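Before converting anything, an export step like this one would need to confirm that the expected recipe files are present. The sketch below checks for the files named in the checkpoint_dir description. It assumes the algorithm string for EAGLE-3 is "EAGLE3", as in the usage example. missing_export_inputs is a hypothetical helper, not part of the module.

```python
from pathlib import Path
from typing import List


def missing_export_inputs(checkpoint_dir: Path, algorithm: str) -> List[str]:
    """List the recipe files the export step needs but cannot find in `checkpoint_dir`."""
    required = ["draft_model.pt", "config.json"]
    if algorithm == "EAGLE3":
        # EAGLE-3 additionally needs the recipe meta file that carries selected_token_ids.
        required.append("eagle3_meta.pt")
    return [name for name in required if not (checkpoint_dir / name).is_file()]
```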
- nemo_automodel.components.speculative.serve_sglang.resolve_draft_artifacts(draft: str, algorithm: str, *, dry_run: bool = False)
Resolve a user-supplied drafter path to the model and token-map paths SGLang expects.
Accepts either the outer epoch_<E>_step_<S> directory or the inner model/ directory; HF Hub repo ids are passed through untouched.
- Parameters:
draft – A local path or HF Hub repo id.
algorithm – Speculative algorithm name.
dry_run – When True, no on-disk export is performed and the returned paths reflect what would be produced on a real launch.
- Returns:
(draft_path, token_map_path_or_None) suitable for SGLang flags.
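The local-versus-Hub decision and the acceptance of both directory levels can be sketched as below. It assumes that any argument which does not exist on disk is a Hub repo id. It also assumes that a local path whose last component is model is the inner directory of an epoch_<E>_step_<S> checkpoint. local_checkpoint_root is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def local_checkpoint_root(draft: str) -> Optional[Path]:
    """Map a drafter argument to its epoch_<E>_step_<S> directory, or None for a Hub repo id."""
    path = Path(draft).expanduser()
    if not path.exists():
        # Not on disk: treat it as a Hugging Face Hub repo id and pass it through untouched.
        return None
    # Accept both the outer checkpoint directory and its inner model/ directory.
    return path.parent if path.name == "model" else path
```

One consequence of this design is that a mistyped local path is silently treated as a Hub id. SGLang would then fail when it tries to download it, rather than this helper failing early.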
- nemo_automodel.components.speculative.serve_sglang.build_sglang_argv(args: argparse.Namespace) → list[str]
Build the python -m sglang.launch_server argv for a given config.
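The following sketch shows how the CLI options from the usage example could map onto SGLang's speculative-decoding server flags. The flag names follow SGLang's documented server arguments. The exact mapping that build_sglang_argv uses is an assumption. Both the token_map attribute and the choice of sys.executable as the interpreter are hypothetical.

```python
import sys
from argparse import Namespace
from typing import List


def build_argv(args: Namespace) -> List[str]:
    """Assemble a `python -m sglang.launch_server` command from parsed options."""
    argv = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", str(args.target),
        "--speculative-algorithm", args.algorithm,
        "--speculative-draft-model-path", str(args.draft),
        "--speculative-num-steps", str(args.num_steps),
        "--speculative-eagle-topk", str(args.topk),
        "--speculative-num-draft-tokens", str(args.num_draft_tokens),
    ]
    # Hypothetical attribute holding the resolved token map (EAGLE-3 only).
    token_map = getattr(args, "token_map", None)
    if token_map:
        argv += ["--speculative-token-map", str(token_map)]
    return argv
```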
- nemo_automodel.components.speculative.serve_sglang._parse_args(argv: list[str] | None = None) → argparse.Namespace
Parse command-line arguments for the serve helper.
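The following is a sketch of a parser covering the flags shown in the typical-usage example at the top of this page. The defaults simply mirror that example, and the default algorithm is an assumption. The real parser may use different defaults or accept additional options. parse_args is a hypothetical stand-in for _parse_args.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the serve-helper flags; passing argv=None reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Serve an EAGLE drafter with SGLang (sketch).")
    parser.add_argument("--target", required=True, help="Target model path or HF Hub repo id.")
    parser.add_argument("--draft", required=True, help="Checkpoint dir, its model/ dir, or a Hub repo id.")
    parser.add_argument("--algorithm", default="EAGLE3", help="Speculative algorithm, e.g. EAGLE3.")
    # Assumed defaults, copied from the usage example; the real defaults may differ.
    parser.add_argument("--num-steps", type=int, default=3)
    parser.add_argument("--topk", type=int, default=1)
    parser.add_argument("--num-draft-tokens", type=int, default=4)
    parser.add_argument("--print-only", action="store_true", help="Print the command instead of launching it.")
    return parser.parse_args(argv)
```

argparse converts dashed flags such as --num-draft-tokens into attributes like num_draft_tokens on the returned Namespace.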
- nemo_automodel.components.speculative.serve_sglang.main(argv: list[str] | None = None) → int
Validate the environment, resolve the drafter checkpoint, then launch SGLang.
Returns the SGLang server’s exit code, or 2 if SGLang or safetensors is missing.
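The final step of main can be sketched as below. It assumes the server is started with subprocess.run and not with an os.exec* call, since a process replaced by exec could never return the server's exit code to the caller. It further assumes that --print-only returns 0 and that the command is printed in both modes. run_or_print is a hypothetical name.

```python
import shlex
import subprocess
from typing import List


def run_or_print(argv: List[str], print_only: bool) -> int:
    """Print the launch command, then either stop (print-only) or run it and return its exit code."""
    print(shlex.join(argv))  # shell-quoted, so the printed command can be copy-pasted
    if print_only:
        return 0
    # subprocess.run blocks until the server exits; its return code becomes main()'s return value.
    return subprocess.run(argv, check=False).returncode
```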