> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.speculative.eagle.backend

Backend abstraction for the EAGLE-3 target model.

The frozen target model is a *supervision provider*: for every training batch
it produces the auxiliary hidden states and the per-token target distribution
the draft model is trained against. EAGLE-3 training never updates the target,
so it does not have to share the training GPU. This interface lets the recipe
consume the target uniformly whether it runs co-located in-process
(`HFEagle3TargetModel`) or, in a later change, as a remote inference service
on separate GPUs.

## Module Contents

### Classes

| Name                                                                                              | Description                                                      |
| ------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| [`Eagle3TargetBackend`](#nemo_automodel-components-speculative-eagle-backend-Eagle3TargetBackend) | Abstract contract every EAGLE-3 target-model backend implements. |

### API

```python
class nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend()
```

Abstract

Abstract contract every EAGLE-3 target-model backend implements.

Two supervision encodings are allowed, both consumed directly by
:meth:`Eagle3TrainerModule.forward`:

* **full logits** -- :attr:`Eagle3TargetBatch.logits` carries the target's
  full-vocab logits and the draft-vocab projection happens trainer-side.
  Cheap when co-located (the tensor never leaves the GPU); impractical to
  ship over a wire because it is full-vocab sized.
* **precomputed** -- :attr:`Eagle3TargetBatch.target_probs` and
  :attr:`Eagle3TargetBatch.position_mask` carry the already-projected
  draft-vocab distribution, so a backend that computes them itself (e.g. a
  remote server) only has to transfer draft-vocab-sized tensors.

A backend returns exactly one of the two encodings from
:meth:`generate_batch`; the recipe forwards whichever is present.

Whether :meth:`generate_batch_async` is implemented (prefetch-capable).

```python
nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.close() -> None
```

Release backend resources (remote connections, server handles).

```python
nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.generate_batch(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor,
    loss_mask: torch.Tensor
) -> 'Eagle3TargetBatch'
```

abstract

Run the target and return the supervision for one training batch.

```python
nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.generate_batch_async(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor,
    loss_mask: torch.Tensor
)
```

Submit an asynchronous :meth:`generate_batch` for prefetch pipelining.

Only backends that overlap target inference with draft training
implement this; the default signals a synchronous backend so callers
fall back to :meth:`generate_batch`.

```python
nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.get_input_embeddings() -> torch.nn.Module
```

abstract

Return the target input-embedding module (used to seed the draft).

```python
nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.set_vocab_mapping(
    selected_token_ids: torch.Tensor,
    selected_token_mask: torch.Tensor
) -> None
```

Provide the draft-vocab mapping needed to precompute supervision.

Co-located backends keep the mapping on the trainer module and derive
the distribution there, so the default is a no-op. A backend that
computes `target_probs` itself overrides this to receive the mapping.