nemo_automodel.components.speculative.eagle.remote

View as Markdown

Train-inference disaggregation for EAGLE-3 target serving.

Runs the frozen target model as a standalone inference server on separate GPU(s) while the draft model trains elsewhere. The training side talks to the server through :class:RemoteEagle3TargetModel, which implements the :class:~nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend contract, so the EAGLE-3 recipe consumes a remote target exactly like the co-located HFEagle3TargetModel.

  • HTTP is the control plane (input_ids up, tensor metadata down).
  • NCCL is the data plane for the large supervision tensors (GPU-to-GPU, NVLink intra-node / RDMA inter-node), with a compact binary wire format fallback when NCCL is unavailable.

Submodules

Package Contents

Data

__all__

API

nemo_automodel.components.speculative.eagle.remote.__all__ = ['RemoteEagle3TargetModel']