nemo_automodel.components.speculative.eagle.backend

View as Markdown

Backend abstraction for the EAGLE-3 target model.

The frozen target model is a supervision provider: for every training batch it produces the auxiliary hidden states and the per-token target distribution the draft model is trained against. EAGLE-3 training never updates the target, so it does not have to share the training GPU. This interface lets the recipe consume the target uniformly whether it runs co-located in-process (HFEagle3TargetModel) or, in a later change, as a remote inference service on separate GPUs.

Module Contents

Classes

NameDescription
Eagle3TargetBackendAbstract contract every EAGLE-3 target-model backend implements.

API

class nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend()
Abstract

Abstract contract every EAGLE-3 target-model backend implements.

Two supervision encodings are allowed, both consumed directly by :meth:Eagle3TrainerModule.forward:

  • full logits — :attr:Eagle3TargetBatch.logits carries the target’s full-vocab logits and the draft-vocab projection happens trainer-side. Cheap when co-located (the tensor never leaves the GPU); impractical to ship over a wire because it is full-vocab sized.
  • precomputed — :attr:Eagle3TargetBatch.target_probs and :attr:Eagle3TargetBatch.position_mask carry the already-projected draft-vocab distribution, so a backend that computes them itself (e.g. a remote server) only has to transfer draft-vocab-sized tensors.

A backend returns exactly one of the two encodings from :meth:generate_batch; the recipe forwards whichever is present.

supports_async
bool

Whether :meth:generate_batch_async is implemented (prefetch-capable).

nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.close() -> None

Release backend resources (remote connections, server handles).

nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.generate_batch(
input_ids: torch.Tensor,
attention_mask: torch.Tensor,
loss_mask: torch.Tensor
) -> 'Eagle3TargetBatch'
abstract

Run the target and return the supervision for one training batch.

nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.generate_batch_async(
input_ids: torch.Tensor,
attention_mask: torch.Tensor,
loss_mask: torch.Tensor
)

Submit an asynchronous :meth:generate_batch for prefetch pipelining.

Only backends that overlap target inference with draft training implement this; the default signals a synchronous backend so callers fall back to :meth:generate_batch.

nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.get_input_embeddings() -> torch.nn.Module
abstract

Return the target input-embedding module (used to seed the draft).

nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend.set_vocab_mapping(
selected_token_ids: torch.Tensor,
selected_token_mask: torch.Tensor
) -> None

Provide the draft-vocab mapping needed to precompute supervision.

Co-located backends keep the mapping on the trainer module and derive the distribution there, so the default is a no-op. A backend that computes target_probs itself overrides this to receive the mapping.