> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.speculative.eagle.target_runner

Engine-agnostic EAGLE-3 target backend built on a pluggable runner.

The supervision contract (how a target's raw logits / aux hidden states become
an :class:`Eagle3TargetBatch`) is identical no matter which inference engine
runs the frozen target. This module owns that contract:

* :class:`TargetRunner` is the narrow surface a concrete engine must implement
  (`forward_eagle3` / `set_aux_layers` / `input_embedding_weight` and a
  `model` exposing `.config` + `.parameters()`). It is the single seam
  between the engine-coupled code and the rest of the stack: SGLang implements
  it today (`sglang_runner.SGLangTargetRunner`), and a vLLM runner can
  implement the same protocol and drop in without touching this file, the
  trainer, or the remote server.
* :class:`RunnerEagle3TargetModel` is the :class:`Eagle3TargetBackend` that
  assembles the supervision batch from any :class:`TargetRunner`. Its shift /
  aux-concatenation semantics are identical to :class:`HFEagle3TargetModel`, so
  a run through any runner is numerically equivalent to the co-located one.

Keeping the contract here (and importing engines lazily in their own adapter
modules) means importing this module never pulls in SGLang or vLLM, so the
contract stays unit-testable on CPU against a fake runner.

## Module Contents

### Classes

| Name                                                                                                            | Description                                                          |
| --------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| [`RunnerEagle3TargetModel`](#nemo_automodel-components-speculative-eagle-target_runner-RunnerEagle3TargetModel) | EAGLE-3 target backend that runs the frozen target through a runner. |
| [`TargetRunner`](#nemo_automodel-components-speculative-eagle-target_runner-TargetRunner)                       | Minimal engine surface :class:`RunnerEagle3TargetModel` depends on.  |

### API

```python
class nemo_automodel.components.speculative.eagle.target_runner.RunnerEagle3TargetModel(
    runner: nemo_automodel.components.speculative.eagle.target_runner.TargetRunner,
    aux_layer_ids: typing.Optional[typing.Sequence[int]] = None
)
```

**Bases:** [Eagle3TargetBackend](/nemo-automodel/nemo_automodel/components/speculative/eagle/backend#nemo_automodel-components-speculative-eagle-backend-Eagle3TargetBackend)

EAGLE-3 target backend that runs the frozen target through a runner.

Engine-agnostic: it owns only the supervision *contract* and delegates the
actual forward to a :class:`TargetRunner`. Concrete engines subclass this to
add their own `from_pretrained` (which builds the runner); everything else
is inherited.

## Parameters

runner:
A loaded runner implementing :class:`TargetRunner`.
aux\_layer\_ids:
The three decoder layers to capture (low / mid / high). When `None`
the shared EAGLE-3 default recipe is used, matching every other backend.

```python
nemo_automodel.components.speculative.eagle.target_runner.RunnerEagle3TargetModel.close() -> None
```

Release the runner (frees GPU memory / engine handles).

```python
nemo_automodel.components.speculative.eagle.target_runner.RunnerEagle3TargetModel.generate_batch(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor,
    loss_mask: torch.Tensor
) -> nemo_automodel.components.speculative.eagle.target.Eagle3TargetBatch
```

Run the runner's target and capture aux hidden states plus logits.

Produces an :class:`Eagle3TargetBatch` byte-for-byte compatible with
:meth:`HFEagle3TargetModel.generate_batch`: the logits, `input_ids`
and `loss_mask` are shifted left by one (next-token alignment) while
the aux hidden states are kept position-aligned. The draft-vocab
projection happens trainer-side / server-side from `logits`, exactly
as in the co-located path.

Note: sequences are assumed right-padded (loss\_mask zeros the pad). With
causal attention, trailing pad tokens do not affect earlier positions,
so the captured supervision matches a masked HuggingFace forward.

```python
nemo_automodel.components.speculative.eagle.target_runner.RunnerEagle3TargetModel.get_input_embeddings() -> types.SimpleNamespace
```

Return an object exposing `.weight` (the target input embeddings).

Matches the offline-cache / remote path: the draft's
`copy_embeddings_from_target` only reads `.weight`.

```python
class nemo_automodel.components.speculative.eagle.target_runner.TargetRunner()
```

Protocol

Minimal engine surface :class:`RunnerEagle3TargetModel` depends on.

Implemented for real by an inference-engine runner (`SGLangTargetRunner`
today, a vLLM runner later) and faked in unit tests, which is why the
backend depends on this protocol rather than on any engine directly.

```python
nemo_automodel.components.speculative.eagle.target_runner.TargetRunner.forward_eagle3(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]
```

Run the target once and return `(logits, aux_hidden_states)`.

`logits` is `[batch, seq, vocab]` (full vocab, unshifted) and
`aux_hidden_states` is `[batch, seq, 3 * hidden]` (the three capture
layers concatenated on the last dim, unshifted).

```python
nemo_automodel.components.speculative.eagle.target_runner.TargetRunner.input_embedding_weight() -> torch.Tensor
```

Return the target input-embedding weight `[vocab, hidden]`.

```python
nemo_automodel.components.speculative.eagle.target_runner.TargetRunner.set_aux_layers(
    aux_layer_ids: typing.Sequence[int]
) -> None
```

Tell the underlying model which 3 decoder layers to capture.