nemo_automodel.components.speculative.eagle.sglang_target
SGLang adapter for the EAGLE-3 target backend.
This module is a thin engine adapter on top of the engine-agnostic contract in
:mod:nemo_automodel.components.speculative.eagle.target_runner: it builds an
SGLang runner and wraps it in :class:RunnerEagle3TargetModel. SGLang is the
fastest serving path for mainstream architectures, so a remote target server
(:mod:nemo_automodel.components.speculative.serve_target) can hold the target
on dedicated GPUs while the draft trains elsewhere.
All supervision-contract logic (shift / aux-concatenation semantics, aux-layer
defaulting, embedding access) lives in target_runner and is shared with any
future engine adapter (e.g. vLLM). This module owns only SGLang construction,
and the SGLang-internal forward is isolated further in
:mod:nemo_automodel.components.speculative.eagle.sglang_runner, imported lazily
by :meth:SGLangEagle3TargetModel.from_pretrained so importing this module never
pulls in SGLang.
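The lazy-import arrangement described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `RunnerTargetSketch`, `EngineTargetSketch`, the module name `hypothetical_engine_runner`, and its `build_runner` function are all assumptions made up for the example. The point is only that the engine import happens inside the constructor method, so defining or importing the adapter class never requires the engine to be installed.

```python
import importlib


class RunnerTargetSketch:
    """Hypothetical stand-in for the shared, engine-agnostic base class."""

    def __init__(self, runner):
        self.runner = runner


class EngineTargetSketch(RunnerTargetSketch):
    """Hypothetical adapter that adds only engine construction."""

    @classmethod
    def from_pretrained(cls, model_path, engine_module="hypothetical_engine_runner",
                        **engine_kwargs):
        # The engine is imported here, at call time, not at module load.
        try:
            engine = importlib.import_module(engine_module)
        except ImportError as exc:
            raise ImportError(
                f"{engine_module} is required to build a runner for {model_path!r}"
            ) from exc
        # build_runner is an assumed entry point of the hypothetical module.
        return cls(engine.build_runner(model_path, **engine_kwargs))
```

With this shape, an environment without the engine can still import the adapter module; the `ImportError` surfaces only when `from_pretrained` is actually called.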
Module Contents
Classes
API
Bases: RunnerEagle3TargetModel
EAGLE-3 target backend whose runner is SGLang.
Adds only SGLang construction; the supervision contract is inherited from
:class:RunnerEagle3TargetModel.
Build an SGLang runner for model_path and wrap it as a target backend.
SGLang is imported here (not at module load) so this module stays
importable in environments without SGLang; sglang_kwargs are passed
through to SGLang’s ServerArgs for endpoint / parallelism tuning.
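The keyword pass-through can be pictured with a small stand-in. This is a sketch under assumptions: `ServerArgsSketch` is a hypothetical dataclass, not SGLang's ServerArgs, and its field names are illustrative only. It shows one consequence of forwarding keyword arguments unchanged to a dataclass-style config: defaults apply for anything not supplied, and an unrecognized key fails at construction time.

```python
from dataclasses import dataclass


@dataclass
class ServerArgsSketch:
    """Hypothetical engine config; the fields are illustrative, not SGLang's."""

    model_path: str
    tp_size: int = 1
    mem_fraction_static: float = 0.9


def build_server_args(model_path, **sglang_kwargs):
    # Keyword arguments are forwarded unchanged. An unknown key raises
    # TypeError from the dataclass constructor, before any engine starts.
    return ServerArgsSketch(model_path=model_path, **sglang_kwargs)


args = build_server_args("some/model", tp_size=2)
```

Forwarding the keywords rather than re-declaring them keeps the adapter thin: new engine options become usable without any change to the adapter's signature.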