nemo_automodel.components.speculative.eagle.remote.client
nemo_automodel.components.speculative.eagle.remote.client
Training-side client for the remote EAGLE-3 target server.
:class:RemoteEagle3TargetModel implements the Eagle3TargetBackend contract
by delegating generate_batch to one or more remote target servers. It POSTs
input_ids over HTTP and receives the supervision tensors either over NCCL
(GPU-direct, body carries only metadata) or as a binary wire blob (fallback).
Multiple server URLs are dispatched round-robin so the prefetch pipeline in the training loop can keep several requests in flight (one per server) and overlap target inference with draft training.
Module Contents
Classes
Data
API
Bases: Eagle3TargetBackend
EAGLE-3 target backend that delegates forward passes to remote servers.
Fetch the target input-embedding weight once and cache it.
Returns an object exposing .weight (the only attribute the draft’s
copy_embeddings_from_target reads), matching the offline-cache path.
Future-like wrapper that converts a worker-thread result into a batch.
HTTP + NCCL connection to a single remote target server.
POST /generate and return the supervision tensors (NCCL or wire).
/generate is the per-step hot path, so a transient timeout / connection
reset here would otherwise abort a long remote-training run. The wire path
is an idempotent HTTP round-trip, so it reuses :meth:request’s
exponential-backoff retry. The NCCL path is deliberately a single attempt:
the POST triggers a server-side NCCL send paired with the recv_tensors
below, so a blind retry would issue a second send and desync the 2-process
data-plane group (the client’s one recv vs the server’s two sends would
hang). Recovering the NCCL path needs a transport resync (tear down +
re-init, or fall back to wire) and is tracked separately.
Serialized on _generate_lock so this process never has two
/generate requests in flight against the same server: the NCCL recv
posted here must pair with this request’s send, and the server’s
hook-based aux capture is not reentrant.
POST payload to endpoint with exponential-backoff retry.