nemo_automodel.components.speculative.eagle.remote
Train-inference disaggregation for EAGLE-3 target serving.
Runs the frozen target model as a standalone inference server on separate
GPU(s) while the draft model trains elsewhere. The training side talks to the
server through :class:`RemoteEagle3TargetModel`, which implements the
:class:`~nemo_automodel.components.speculative.eagle.backend.Eagle3TargetBackend`
contract, so the EAGLE-3 recipe consumes a remote target exactly like the
co-located ``HFEagle3TargetModel``.
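The practical consequence of sharing one backend contract is that the recipe never branches on where the target runs. The sketch below illustrates that idea with a structural ``typing.Protocol``. Everything in it is hypothetical: ``TargetBackend``, ``supervise``, ``LocalTarget``, ``RemoteTarget`` and ``train_step`` are illustrative stand-ins, not the real ``Eagle3TargetBackend`` interface, whose method names and return types are not given here.

```python
from typing import Callable, List, Protocol, Sequence


class TargetBackend(Protocol):
    """Hypothetical, simplified stand-in for the Eagle3TargetBackend contract."""

    def supervise(self, input_ids: Sequence[int]) -> List[float]: ...


class LocalTarget:
    """Plays the role of a co-located target (cf. HFEagle3TargetModel)."""

    def supervise(self, input_ids: Sequence[int]) -> List[float]:
        # Toy "supervision": one value per token, computed in-process.
        return [float(t) * 0.5 for t in input_ids]


class RemoteTarget:
    """Plays the role of a remote target; the transport is an injected callable."""

    def __init__(self, send: Callable[[List[int]], List[float]]) -> None:
        # In the real system this would be an HTTP + NCCL round trip.
        self._send = send

    def supervise(self, input_ids: Sequence[int]) -> List[float]:
        return self._send(list(input_ids))


def train_step(target: TargetBackend, input_ids: Sequence[int]) -> float:
    # The recipe depends only on the contract, not on where the target lives.
    return sum(target.supervise(input_ids))


# A "server" that just runs a local target stands in for the remote process.
local = LocalTarget()
remote = RemoteTarget(send=local.supervise)
print(train_step(local, [1, 2, 3]), train_step(remote, [1, 2, 3]))  # 3.0 3.0
```

Because both classes satisfy the same structural protocol, swapping co-located for remote serving is a construction-time choice rather than a change to the training loop.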
- HTTP is the control plane: the client sends ``input_ids`` to the server, and the server replies with tensor metadata.
- NCCL is the data plane for the large supervision tensors (GPU-to-GPU, over NVLink intra-node or RDMA inter-node). When NCCL is unavailable, the tensors are sent in a compact binary wire format instead.
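To make the control-plane/data-plane split concrete, here is a minimal, self-contained sketch using only the standard library. The control message is a small JSON body carrying ``input_ids``; the fallback data plane prefixes raw little-endian tensor bytes with just enough metadata (dtype code, rank, shape) for the receiver to rebuild the tensor. The JSON field name, the dtype codes, and the header layout are assumptions for illustration only and are not the layout used by the ``wire`` submodule.

```python
import json
import struct
import sys
from array import array
from math import prod

# Hypothetical dtype codes -> array typecodes ("f" = 4-byte float, "q" = 8-byte int).
_DTYPES = {"float32": (0, "f"), "int64": (1, "q")}
_BY_CODE = {code: (name, tc) for name, (code, tc) in _DTYPES.items()}


def encode_request(input_ids):
    """Control plane (client -> server): a tiny JSON body, cheap to send over HTTP."""
    return json.dumps({"input_ids": list(input_ids)}).encode("utf-8")


def decode_request(body):
    return json.loads(body.decode("utf-8"))["input_ids"]


def pack_tensor(dtype, shape, values):
    """Fallback data plane: header (dtype code, ndim, dims) + raw payload bytes."""
    code, typecode = _DTYPES[dtype]
    if prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    header = struct.pack("<BB", code, len(shape)) + struct.pack(f"<{len(shape)}I", *shape)
    payload = array(typecode, values)
    if sys.byteorder == "big":
        payload.byteswap()  # always little-endian on the wire
    return header + payload.tobytes()


def unpack_tensor(buf):
    """Inverse of pack_tensor: returns (dtype, shape, flat values)."""
    code, ndim = struct.unpack_from("<BB", buf, 0)
    shape = list(struct.unpack_from(f"<{ndim}I", buf, 2))
    dtype, typecode = _BY_CODE[code]
    payload = array(typecode)
    payload.frombytes(buf[2 + 4 * ndim:])
    if sys.byteorder == "big":
        payload.byteswap()
    return dtype, shape, payload.tolist()


buf = pack_tensor("float32", [2, 2], [0.5, 1.0, 1.5, 2.0])
print(len(buf), unpack_tensor(buf))  # 26 ('float32', [2, 2], [0.5, 1.0, 1.5, 2.0])
```

The header costs only a few bytes per tensor, so the fallback path spends nearly all of its bandwidth on the payload. On the NCCL path, only the metadata would need to travel over HTTP so the receiver can pre-allocate a GPU buffer of the right shape and dtype.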
Submodules
- nemo_automodel.components.speculative.eagle.remote.client
- nemo_automodel.components.speculative.eagle.remote.protocol
- nemo_automodel.components.speculative.eagle.remote.server
- nemo_automodel.components.speculative.eagle.remote.transport
- nemo_automodel.components.speculative.eagle.remote.wire