nemo_automodel.components.speculative.eagle.remote.server
Remote EAGLE-3 target server.
Runs the frozen target model and, for each training request, produces the
draft-vocab supervision (aux hidden states, target_probs, position_mask)
and ships it back to the training client. The supervision computation reuses the
co-located building blocks verbatim — HFEagle3TargetModel.generate_batch
for the forward + aux capture and _compute_target_distribution for the
draft-vocab projection — so a remote run is numerically identical to a
co-located one.
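"Numerically identical" depends on two things: both paths run the same computation, and the transport does not perturb the values it carries. The sketch below demonstrates the second property in miniature. It uses JSON only as a stand-in wire format, and `compute_supervision`, `to_wire` and `from_wire` are hypothetical names, not this module's API. Python's `json` module writes floats with `repr()`, which round-trips float64 exactly, so the remote result compares equal to the co-located one bit for bit, not merely within a tolerance.

```python
import json
import math


def compute_supervision(logits):
    """Hypothetical stand-in for the shared (co-located) supervision computation."""
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {"target_probs": [e / total for e in exps]}


def to_wire(supervision):
    # Hypothetical wire format: JSON. Floats are written via repr(),
    # which round-trips float64 values exactly.
    return json.dumps(supervision).encode("utf-8")


def from_wire(body):
    return json.loads(body.decode("utf-8"))


logits = [0.3, -1.7, 2.25, 0.0]
colocated = compute_supervision(logits)                     # training process computes it
remote = from_wire(to_wire(compute_supervision(logits)))    # server computes, client decodes
assert remote == colocated  # exact equality, not approximate
```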
The HTTP request handling is split from the http.server plumbing
(TargetModelServer holds the pure logic), so it can be unit-tested on
CPU with the NCCL data plane disabled (the wire-format path).
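The split can be illustrated with a small sketch. A transport-agnostic object takes request bytes and returns response bytes, and a thin `http.server` adapter does nothing but parse HTTP and delegate to it. `ToyTargetLogic`, `make_handler` and the JSON request shape are hypothetical and are not TargetModelServer's real interface. The unit test exercises the logic directly, with no socket, server thread or GPU.

```python
import json
from http.server import BaseHTTPRequestHandler


class ToyTargetLogic:
    """Hypothetical transport-agnostic logic, in the spirit of TargetModelServer."""

    def __init__(self, target_fn):
        self.target_fn = target_fn  # stand-in for the frozen target model

    def handle(self, body: bytes) -> bytes:
        request = json.loads(body)
        result = self.target_fn(request["input_ids"])
        return json.dumps({"result": result}).encode("utf-8")


def make_handler(logic):
    """Thin http.server adapter: read the body, delegate, write the reply."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            out = logic.handle(self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(out)))
            self.end_headers()
            self.wfile.write(out)

    return Handler


# CPU-only unit test: call the logic directly, bypassing HTTP entirely.
logic = ToyTargetLogic(target_fn=lambda ids: [2 * i for i in ids])
reply = json.loads(logic.handle(json.dumps({"input_ids": [1, 2, 3]}).encode("utf-8")))
assert reply == {"result": [2, 4, 6]}
```

In a real process, the adapter would be mounted with something like `HTTPServer((host, port), make_handler(logic))`. Because the adapter contains no model code, the logic can be tested without it.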
API
TargetModelServer: request-handling logic for the remote target server (HTTP-transport agnostic).
Parameters
target_wrapper:
A loaded HFEagle3TargetModel (or any object exposing the same
generate_batch / get_input_embeddings surface).
nccl_port:
TCP rendezvous port for the NCCL data plane.
host:
Address to bind and advertise; it also serves as the rendezvous master address for NCCL.
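Because `target_wrapper` only needs to expose the `generate_batch` / `get_input_embeddings` surface, the requirement can be written as a structural type. The sketch below is an assumption: `TargetWrapperLike` and `StubTarget` are hypothetical names, and the real method signatures are not documented here, so the protocol accepts arbitrary arguments. A stub like this is the kind of object that could stand in for HFEagle3TargetModel in CPU tests. Note that `runtime_checkable` `isinstance` checks only that the methods exist, not their signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TargetWrapperLike(Protocol):
    """Hypothetical structural type for the target_wrapper argument."""

    def generate_batch(self, *args: Any, **kwargs: Any) -> Any: ...

    def get_input_embeddings(self) -> Any: ...


class StubTarget:
    """CPU-only stand-in for HFEagle3TargetModel (illustrative only)."""

    def generate_batch(self, *args, **kwargs):
        return {"aux_hidden_states": [], "logits": []}

    def get_input_embeddings(self):
        return [[0.0, 0.0]]


assert isinstance(StubTarget(), TargetWrapperLike)    # has both methods
assert not isinstance(object(), TargetWrapperLike)    # missing both
```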
flush_nccl_send: send the pending supervision tensors over NCCL (called after the HTTP response has been flushed).
Run the target and serialize the supervision.
Returns a (body, used_nccl) tuple. When NCCL is used, the body holds only the JSON
metadata and the tensors are queued for flush_nccl_send, which sends them only
after the HTTP response has been flushed, to avoid a deadlock on the client's recv.
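The ordering constraint is "reply first, send later". The handler returns the metadata body and queues the tensors, and the HTTP layer writes and flushes that body before any send is issued. One plausible reason, which is our assumption, is that the client needs the metadata before it can post matching receives. A blocking send issued earlier would then wait on a recv that never comes. `DeferredSender` and `send_fn` are hypothetical; `send_fn` stands in for an NCCL point-to-point send, and the event log records the order in which things happen.

```python
from collections import deque


class DeferredSender:
    """Hypothetical sketch: return metadata now, send tensors after the HTTP flush."""

    def __init__(self, send_fn):
        self.send_fn = send_fn  # stand-in for an NCCL send
        self.pending = deque()

    def handle(self, supervision, use_nccl):
        # Stand-in for shape/dtype metadata the client would need to post recvs.
        meta = {key: len(value) for key, value in supervision.items()}
        if not use_nccl:
            # Wire-format path: everything travels in the HTTP body.
            return {"meta": meta, "tensors": supervision}, False
        self.pending.extend(supervision.items())  # queue only; do not send yet
        return {"meta": meta}, True

    def flush(self):
        while self.pending:
            self.send_fn(*self.pending.popleft())


events = []
sender = DeferredSender(send_fn=lambda key, value: events.append(("send", key)))
body, used_nccl = sender.handle(
    {"target_probs": [0.1, 0.9], "position_mask": [1, 0]}, use_nccl=True
)
events.append(("http_response", body["meta"]))  # HTTP layer writes and flushes the body
sender.flush()                                  # only now are the tensors sent
assert [kind for kind, _ in events] == ["http_response", "send", "send"]
```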
Return the target input-embedding weight (used once to seed the draft).
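Because the embedding weight is needed only once, a client can fetch it a single time and copy it into the draft's embedding table. The sketch below shows that fetch-once, copy-in-place pattern using plain nested lists in place of tensors. `DraftEmbeddingSeeder`, `seed_draft` and `fetch_fn` are hypothetical names, and the shape check assumes the draft table has the same shape as the target weight.

```python
class DraftEmbeddingSeeder:
    """Hypothetical client-side helper: fetch the target embedding once, then reuse it."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn  # e.g. one request to the remote server
        self.calls = 0
        self._weight = None

    def weight(self):
        if self._weight is None:  # only the first call reaches the server
            self.calls += 1
            self._weight = self.fetch_fn()
        return self._weight


def seed_draft(draft_table, seeder):
    """Copy the target embedding rows into the draft table in place."""
    weight = seeder.weight()
    if len(weight) != len(draft_table) or any(
        len(src) != len(dst) for src, dst in zip(weight, draft_table)
    ):
        raise ValueError("embedding shape mismatch")
    for dst, src in zip(draft_table, weight):
        dst[:] = src  # copy values, do not alias the fetched rows


fetched = [[1.0, 2.0], [3.0, 4.0]]
seeder = DraftEmbeddingSeeder(fetch_fn=lambda: fetched)
draft = [[0.0, 0.0], [0.0, 0.0]]
seed_draft(draft, seeder)
seed_draft(draft, seeder)  # second call reuses the cached weight
```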
Produce the precomputed draft-vocab supervision for one batch.
Mirrors the co-located path exactly: generate_batch runs the target and
returns the shifted logits, input_ids and loss_mask plus the aux hidden states;
_compute_target_distribution then projects the shifted logits onto the
draft vocab. Returns tensors keyed by protocol.SUPERVISION_KEYS.
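The text does not spell out what projecting "onto the draft vocab" involves. The sketch below follows a common EAGLE-3-style recipe and is an assumption, not the body of `_compute_target_distribution`. It applies a softmax over the full target vocab at each shifted position and keeps only the columns for tokens in the draft vocab, producing `target_probs`. A position is supervised (`position_mask = 1`) only when the loss mask allows it and the target's argmax token exists in the draft vocab. `project_to_draft_vocab` and `draft_token_ids` are hypothetical names. Whether the kept probabilities are renormalized is not stated here, and this sketch does not renormalize them.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_draft_vocab(shifted_logits, loss_mask, draft_token_ids):
    """Hypothetical draft-vocab projection for one sequence.

    shifted_logits:  per position, a list of target-vocab logits
    loss_mask:       per position, 0 or 1
    draft_token_ids: target-vocab ids kept in the draft vocab, in draft order
    """
    in_draft = set(draft_token_ids)
    target_probs, position_mask = [], []
    for row, keep in zip(shifted_logits, loss_mask):
        probs = softmax(row)  # full target-vocab distribution
        target_probs.append([probs[t] for t in draft_token_ids])  # draft columns only
        top = max(range(len(probs)), key=probs.__getitem__)       # target argmax token
        position_mask.append(int(bool(keep) and top in in_draft))  # reachable by the draft?
    return target_probs, position_mask


probs, mask = project_to_draft_vocab(
    shifted_logits=[[0.0, 5.0, 1.0, 0.0], [4.0, 0.0, 0.0, 0.0]],
    loss_mask=[1, 1],
    draft_token_ids=[1, 2],
)
```

In this example, the first position's argmax (token 1) is in the draft vocab and is supervised. The second position's argmax (token 0) is not, so its mask is 0.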
Run the blocking HTTP server until the client disconnects or Ctrl-C.
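The page does not say how a client disconnect is detected. The sketch below assumes a stop flag, for example one set when a final request arrives, checked between requests. It uses the real `socketserver` methods `handle_request()` (serve one request) and `server_close()` (release the listening socket). Ctrl-C surfaces as `KeyboardInterrupt`, and `finally` guarantees cleanup on either exit path. `run_blocking` and `should_stop` are hypothetical names.

```python
def run_blocking(server, should_stop):
    """Hypothetical sketch: serve requests until should_stop() or Ctrl-C.

    `server` is expected to behave like a socketserver/http.server server.
    """
    try:
        while not should_stop():
            server.handle_request()  # blocks until one request has been handled
    except KeyboardInterrupt:
        pass  # Ctrl-C: fall through to cleanup
    finally:
        server.server_close()  # always release the listening socket
```

A loop over `handle_request()` is used instead of `serve_forever()`. This lets the stop check run after each request without calling `shutdown()` from inside a handler, which can deadlock a single-threaded server.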