nemo_automodel.components.speculative.eagle.sglang_target

View as Markdown

SGLang adapter for the EAGLE-3 target backend.

A thin engine adapter on top of the engine-agnostic contract in :mod:nemo_automodel.components.speculative.eagle.target_runner: it builds a SGLang runner and wraps it in :class:RunnerEagle3TargetModel. SGLang is the fastest serving path for mainstream architectures, so a remote target server (:mod:nemo_automodel.components.speculative.serve_target) can hold the target on dedicated GPUs while the draft trains elsewhere.

All supervision-contract logic (shift / aux-concatenation semantics, aux-layer defaulting, embedding access) lives in target_runner and is shared with any future engine adapter (e.g. vLLM). This module owns only SGLang construction, and the SGLang-internal forward is isolated further in :mod:nemo_automodel.components.speculative.eagle.sglang_runner, imported lazily by :meth:SGLangEagle3TargetModel.from_pretrained so importing this module never pulls in SGLang.

Module Contents

Classes

NameDescription
SGLangEagle3TargetModelEAGLE-3 target backend whose runner is SGLang.

API

class nemo_automodel.components.speculative.eagle.sglang_target.SGLangEagle3TargetModel()

Bases: RunnerEagle3TargetModel

EAGLE-3 target backend whose runner is SGLang.

Adds only SGLang construction; the supervision contract is inherited from :class:RunnerEagle3TargetModel.

nemo_automodel.components.speculative.eagle.sglang_target.SGLangEagle3TargetModel.from_pretrained(
model_path: str,
aux_layer_ids: typing.Optional[typing.Sequence[int]] = None,
dtype: typing.Optional[torch.dtype] = None,
tp_size: int = 1,
trust_remote_code: bool = False,
sglang_kwargs = {}
) -> 'SGLangEagle3TargetModel'
classmethod

Build a SGLang runner for model_path and wrap it as a target backend.

SGLang is imported here (not at module load) so this module stays importable in environments without SGLang; sglang_kwargs are passed through to SGLang’s ServerArgs for endpoint / parallelism tuning.