`nemo_automodel.components.speculative.serve_sglang`#

Serve an Automodel-trained EAGLE / EAGLE-3 drafter with SGLang.

The EAGLE drafter checkpoints produced by the EAGLE recipes (recipes/llm/train_eagle{1,2,3}.py) are saved as draft_model.pt plus recipe metadata. This script converts that layout into an HF/SGLang-readable model/ directory when needed, then shells out to python -m sglang.launch_server with the right speculative-decoding flags.

NOTE: SGLang is NOT bundled with the NeMo-AutoModel container image and is intentionally NOT declared in pyproject.toml. To use this entry point, install it yourself into the same environment:

uv pip install "sglang>=0.5.9"

Refer to https://github.com/sgl-project/sglang for the version matching your CUDA / PyTorch stack. If SGLang is missing, this script exits with code 2 and a clear install hint rather than crashing on import.

Typical usage (after training produces a checkpoint at ./checkpoints/epoch_0_step_1000):

python -m nemo_automodel.components.speculative.serve_sglang \
    --target meta-llama/Llama-3.1-8B-Instruct \
    --draft ./checkpoints/epoch_0_step_1000 \
    --algorithm EAGLE3 \
    --num-steps 3 --topk 1 --num-draft-tokens 4

Pass --print-only to inspect the command without launching it; in that mode no checkpoint export is performed and the printed paths reflect what would be produced on a real launch.

Module Contents#

Functions#

`_has_hf_weight_file`	Return True if `path` already contains a HF-style weight artifact.
`_check_sglang_available`	Verify the `sglang` package can actually be imported, else exit (code 2).
`_load_safetensors_save_file`	Return `safetensors.torch.save_file` or exit with an install hint.
`_torch_load`	Load a torch pickle, preferring `weights_only=True` when supported.
`_rewrite_config_for_sglang`	Copy `src_config_path` to `dst_config_path` and normalize `architectures`.
`_config_needs_rewrite`	Return True when `config_path` does not match the SGLang architecture for `algorithm`.
`_infer_num_hidden_layers`	Infer num_hidden_layers from a state dict by counting unique layer indices.
`_regenerate_token_map`	Extract `selected_token_ids` from a recipe meta file into a SGLang token map.
`_maybe_export_training_checkpoint`	Convert recipe-native EAGLE checkpoints into an HF/SGLang-readable directory.
`resolve_draft_artifacts`	Resolve a user-supplied drafter path to the model and token-map paths SGLang expects.
`build_sglang_argv`	Build the `python -m sglang.launch_server` argv for a given config.
`_parse_args`	Parse command-line arguments for the serve helper.
`main`	Validate the environment, resolve the drafter checkpoint, then launch SGLang.

Data#

`logger`
`_SGLANG_INSTALL_HINT`
`_SAFETENSORS_INSTALL_HINT`
`_SGLANG_ARCHITECTURE_FOR_ALGORITHM`

API#

nemo_automodel.components.speculative.serve_sglang.logger#: ‘getLogger(…)’

nemo_automodel.components.speculative.serve_sglang._SGLANG_INSTALL_HINT#: ‘sglang is not installed in this environment. Install it manually with `uv pip install “sglang>=0.5.9…’

nemo_automodel.components.speculative.serve_sglang._SAFETENSORS_INSTALL_HINT#: ‘safetensors is required to export Automodel EAGLE checkpoints for SGLang. Install it with `uv pip in…’

nemo_automodel.components.speculative.serve_sglang._SGLANG_ARCHITECTURE_FOR_ALGORITHM#: None

nemo_automodel.components.speculative.serve_sglang._has_hf_weight_file(path: pathlib.Path) → bool#: Return True if path already contains a HF-style weight artifact.
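
A minimal sketch of what such a check could look like. The file patterns below are assumptions based on common Hugging Face naming conventions, not the patterns this module actually tests for, and `has_hf_weight_file` is a hypothetical stand-in for the private helper.

```python
from pathlib import Path

# Assumed patterns: common HF weight file names (single-file and sharded).
# The real helper may accept a different set.
_WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin")


def has_hf_weight_file(path: Path) -> bool:
    """Return True if `path` is a directory holding at least one HF-style weight file."""
    if not path.is_dir():
        return False
    # any(path.glob(...)) is True as soon as the pattern yields one match.
    return any(any(path.glob(pattern)) for pattern in _WEIGHT_PATTERNS)
```

A check like this lets the export step be skipped when the directory is already in a loadable layout.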

nemo_automodel.components.speculative.serve_sglang._check_sglang_available() → None#: Verify the sglang package can actually be imported, else exit (code 2).
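
One way to implement this kind of guard is to attempt the import and convert a failure into a clean exit. The sketch below is illustrative only: `check_package_available` and `INSTALL_HINT` are hypothetical names, and the hint text is abbreviated rather than copied from `_SGLANG_INSTALL_HINT`.

```python
import importlib
import sys

# Hypothetical hint text; the module keeps its own wording in _SGLANG_INSTALL_HINT.
INSTALL_HINT = 'sglang is not installed. Install it with: uv pip install "sglang>=0.5.9"'


def check_package_available(name: str, hint: str = INSTALL_HINT) -> None:
    """Exit with code 2 and a readable hint if `name` cannot be imported."""
    try:
        # Importing (rather than only locating) the package also catches
        # installs that are present on disk but broken at import time.
        importlib.import_module(name)
    except ImportError:
        print(hint, file=sys.stderr)
        raise SystemExit(2)
```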

nemo_automodel.components.speculative.serve_sglang._load_safetensors_save_file() → Callable[..., None]#: Return safetensors.torch.save_file or exit with an install hint.

nemo_automodel.components.speculative.serve_sglang._torch_load(path: pathlib.Path) → Any#: Load a torch pickle, preferring weights_only=True when supported.

nemo_automodel.components.speculative.serve_sglang._rewrite_config_for_sglang( src_config_path: pathlib.Path, dst_config_path: pathlib.Path, algorithm: str, *, num_hidden_layers: int | None = None, ) → None#

Copy src_config_path to dst_config_path and normalize architectures.

For algorithms in _SGLANG_ARCHITECTURE_FOR_ALGORITHM the architectures field is rewritten to the SGLang-canonical class name (e.g. LlamaForCausalLMEagle3). For other algorithms the original field is preserved. When num_hidden_layers is provided it is written into the config so the exported drafter reflects its actual depth rather than the target model’s depth. The write is staged through a sibling .tmp file and finalized with os.replace so an interrupted write cannot leave the destination half-truncated when rewriting in place.
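
The staged-write behavior described above can be sketched as follows. This is not the module's implementation: `rewrite_config` and `ARCHITECTURE_FOR_ALGORITHM` are hypothetical names, and the mapping holds only the one entry mentioned in the text.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Hypothetical mapping; only the EAGLE3 entry is taken from the example in the text.
ARCHITECTURE_FOR_ALGORITHM = {"EAGLE3": "LlamaForCausalLMEagle3"}


def rewrite_config(
    src: Path,
    dst: Path,
    algorithm: str,
    *,
    num_hidden_layers: int | None = None,
) -> None:
    """Copy a JSON config, normalizing `architectures` for known algorithms."""
    config = json.loads(src.read_text())

    arch = ARCHITECTURE_FOR_ALGORITHM.get(algorithm)
    if arch is not None:
        config["architectures"] = [arch]  # unknown algorithms keep the original field
    if num_hidden_layers is not None:
        config["num_hidden_layers"] = num_hidden_layers

    # Stage the write next to the destination, then swap it in with os.replace,
    # so an interrupted write never leaves a truncated destination file.
    tmp = dst.with_name(dst.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2))
    os.replace(tmp, dst)
```

Because the source is read in full before the temporary file is written, the same function works when `src` and `dst` are the same path.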

nemo_automodel.components.speculative.serve_sglang._config_needs_rewrite( config_path: pathlib.Path, algorithm: str, ) → bool#: Return True when config_path does not match the SGLang architecture for algorithm.

nemo_automodel.components.speculative.serve_sglang._infer_num_hidden_layers( state_dict: dict[str, Any], ) → int | None#: Infer num_hidden_layers from a state dict by counting unique layer indices.
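
As an illustration of counting unique layer indices, the sketch below assumes parameter names contain a `layers.<index>.` segment, which is the usual convention for Llama-style models but is not confirmed for this module. `infer_num_hidden_layers` is a hypothetical stand-in for the private helper.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed key layout: "...layers.<index>.<param>", e.g. "model.layers.0.mlp.up_proj.weight".
_LAYER_KEY = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def infer_num_hidden_layers(state_dict: dict[str, Any]) -> int | None:
    """Count distinct layer indices in the keys; None if no key matches."""
    indices = set()
    for key in state_dict:
        match = _LAYER_KEY.search(key)
        if match:
            indices.add(int(match.group(1)))
    return len(indices) or None
```

Only the keys are inspected, so the values can be tensors or any other object.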

nemo_automodel.components.speculative.serve_sglang._regenerate_token_map( meta_path: pathlib.Path, token_map_path: pathlib.Path, ) → None#: Extract selected_token_ids from a recipe meta file into a SGLang token map.

nemo_automodel.components.speculative.serve_sglang._maybe_export_training_checkpoint( checkpoint_dir: pathlib.Path, algorithm: str, *, dry_run: bool = False, ) → tuple[pathlib.Path, pathlib.Path | None]#

Convert recipe-native EAGLE checkpoints into an HF/SGLang-readable directory.

Parameters:

checkpoint_dir – Recipe checkpoint dir, expected to contain draft_model.pt and config.json (and eagle3_meta.pt for EAGLE-3).
algorithm – Speculative algorithm name, used to pick the right SGLang architecture and to decide whether a token map is needed.
dry_run – When True, return the paths that would be produced without writing anything.

Returns:

(export_dir, token_map_path_or_None).

nemo_automodel.components.speculative.serve_sglang.resolve_draft_artifacts( draft: str, algorithm: str, *, dry_run: bool = False, ) → tuple[str, str | None]#

Resolve a user-supplied drafter path to the model and token-map paths SGLang expects.

Accepts either the outer epoch_<E>_step_<S> directory or the inner model/ directory; HF Hub repo ids are passed through untouched.

Parameters:

draft – A local path or HF Hub repo id.
algorithm – Speculative algorithm name.
dry_run – When True, no on-disk export is performed and the returned paths reflect what would be produced on a real launch.

Returns:

(draft_path, token_map_path_or_None) suitable for SGLang flags.
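
The path handling described above could be approached as in the sketch below. It covers only the first step (deciding which directory is the outer checkpoint directory) and assumes that any string that does not exist on disk is a Hub repo id; the real function may distinguish these cases differently. `locate_checkpoint_dir` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def locate_checkpoint_dir(draft: str) -> Path | None:
    """Return the outer checkpoint dir for a local path, or None for a Hub repo id."""
    path = Path(draft).expanduser()
    if not path.exists():
        # Assumption: a non-existent local path is treated as a Hub repo id
        # and passed through to SGLang untouched.
        return None
    if path.is_dir() and path.name == "model":
        # The inner model/ directory was given; step up to the checkpoint dir.
        return path.parent
    return path
```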

nemo_automodel.components.speculative.serve_sglang.build_sglang_argv(args: argparse.Namespace) → list[str]#: Build the python -m sglang.launch_server argv for a given config.
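
For orientation, a launch command of this shape might be assembled as below. The `--speculative-*` flag names follow SGLang's documented server arguments as best understood here and should be checked against the installed SGLang version; the function signature is hypothetical and does not mirror the `argparse.Namespace` the real helper takes.

```python
from __future__ import annotations

import sys


def build_argv(
    target: str,
    draft_path: str,
    algorithm: str,
    num_steps: int,
    topk: int,
    num_draft_tokens: int,
    token_map: str | None = None,
) -> list[str]:
    """Assemble a `python -m sglang.launch_server` command line (flag names assumed)."""
    argv = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", target,
        "--speculative-algorithm", algorithm,
        "--speculative-draft-model-path", draft_path,
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
    ]
    if token_map is not None:
        # Only passed when a token map was produced for the drafter.
        argv += ["--speculative-token-map", token_map]
    return argv
```

Using `sys.executable` instead of a bare `python` keeps the server in the same environment as the helper, which matters because SGLang has to be installed there manually.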

nemo_automodel.components.speculative.serve_sglang._parse_args(argv: list[str] | None = None) → argparse.Namespace#: Parse command-line arguments for the serve helper.

nemo_automodel.components.speculative.serve_sglang.main(argv: list[str] | None = None) → int#

Validate the environment, resolve the drafter checkpoint, then launch the SGLang server.

Returns the SGLang server’s exit code, or 2 if SGLang or safetensors is missing.
