nemo_automodel.components.speculative.eagle.remote.transport
Dedicated NCCL transport for GPU-to-GPU supervision-tensor transfer.
A 2-process NCCL group connects the target server (rank 0) to the training client (rank 1). HTTP stays the control plane (input_ids up, tensor metadata down); this group is the data plane for the large supervision tensors, working over NVLink intra-node and RDMA/RoCE inter-node.
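As a rough illustration of the control-plane/data-plane split, the sketch below shows the kind of per-tensor metadata an HTTP response might carry so the client knows what will arrive over the NCCL group. The `TensorMeta` record, its fields, the `DTYPE_BYTES` table, and the `"hidden_states"` key are all hypothetical; this module's actual metadata schema is not documented here.

```python
import json
from dataclasses import asdict, dataclass
from math import prod
from typing import List

# Hypothetical element sizes, in bytes, for a few common dtypes.
DTYPE_BYTES = {"bfloat16": 2, "float16": 2, "float32": 4, "int64": 8}


@dataclass
class TensorMeta:
    """Hypothetical metadata for one supervision tensor (control plane)."""

    key: str
    shape: List[int]
    dtype: str

    @property
    def nbytes(self) -> int:
        # Size of the payload that would travel over the NCCL data plane.
        return prod(self.shape) * DTYPE_BYTES[self.dtype]


def encode_metadata(metas: List[TensorMeta]) -> str:
    """Serialize metadata into a small JSON body for the HTTP response."""
    return json.dumps([asdict(m) for m in metas])


def decode_metadata(payload: str) -> List[TensorMeta]:
    """Rebuild metadata on the client from the HTTP response body."""
    return [TensorMeta(**item) for item in json.loads(payload)]


meta = TensorMeta("hidden_states", [1, 2048, 12288], "bfloat16")
body = encode_metadata([meta])
```

Here the JSON body is under 100 bytes while `meta.nbytes` is 50,331,648 bytes (48 MiB), which is the asymmetry that motivates keeping HTTP for metadata and moving the bulk bytes onto NCCL.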
The group is created from an explicit TCPStore, so it is independent of the training job’s default process group. We delegate the actual group creation to SGLang’s init_custom_process_group (the proven path; it builds a non-default group from a provided store). SGLang is an optional, non-bundled dependency: when it is absent, NCCLTransport.initialize returns False and the caller falls back to the binary wire format.
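A minimal sketch of the optional-dependency check, assuming it is enough to know whether the top-level `sglang` package is installed. The real code needs `init_custom_process_group` itself, whose import path is not given here, so the helper names `optional_dependency_available` and `choose_transport` are hypothetical.

```python
import importlib.util


def optional_dependency_available(module_name: str = "sglang") -> bool:
    """Return True if a top-level module can be found, without importing it."""
    # find_spec returns None when the module is not installed.
    return importlib.util.find_spec(module_name) is not None


def choose_transport(nccl_ready: bool) -> str:
    """Pick the data plane: NCCL when ready, otherwise the binary wire format."""
    return "nccl" if nccl_ready else "wire"


print(choose_transport(optional_dependency_available()))
```

Using `find_spec` avoids paying the import cost just to probe for the package; the actual import can happen later, inside the code path that creates the group.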
Environment variables:
NEMO_EAGLE_ENABLE_NCCL: "1" (default) to attempt NCCL, "0" to force the wire-format fallback.
NEMO_EAGLE_NCCL_PORT: TCP rendezvous port (default: HTTP port + 100).
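The two variables could be resolved roughly as follows. This is a sketch: the function names are hypothetical, and treating every value other than `"0"` as "enabled" is an assumption, since only `"1"` and `"0"` are documented.

```python
import os
from typing import Mapping


def nccl_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only the literal "0" disables NCCL; unset means the default "1".
    return env.get("NEMO_EAGLE_ENABLE_NCCL", "1") != "0"


def nccl_rendezvous_port(http_port: int, env: Mapping[str, str] = os.environ) -> int:
    # An explicit override wins; otherwise use the documented default of HTTP port + 100.
    raw = env.get("NEMO_EAGLE_NCCL_PORT")
    return int(raw) if raw else http_port + 100
```

For example, with no overrides set and an HTTP server on port 30000, `nccl_rendezvous_port(30000, {})` returns 30100. Passing the mapping explicitly keeps the helpers easy to test without touching the real environment.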
Module Contents
API
class NCCLTransport
A dedicated 2-process NCCL group between server (rank 0) and client (rank 1).
Parameters
nccl_port: TCP port for the rendezvous store.
host: Hostname or IP of the server (the rendezvous master).
is_server: True on the server side (rank 0), False on the client side (rank 1).
Abort and unregister the group.
The group is asymmetric: the client can finish before the long-lived
server, so a blocking destroy_process_group (which expects both
peers) would hang. Abort the local communicator and scrub it from
PyTorch’s global registry so the later default-group teardown does not
try to shut it down again.
Establish the NCCL group via TCP rendezvous; blocks until both peers connect.
Returns True on success and False on any failure, in which case the caller falls back to the binary wire format.
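The "True on success, False on any failure" contract, combined with the rank assignment from the parameters, can be sketched as below. `create_group` is a hypothetical stand-in for whatever actually builds the group; it is not SGLang’s real signature, and `rank_for` and `try_initialize` are illustrative names.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical factory signature: (host, port, world_size, rank) -> group handle.
GroupFactory = Callable[[str, int, int, int], Any]


def rank_for(is_server: bool) -> int:
    # The server is the rendezvous master (rank 0); the client is rank 1.
    return 0 if is_server else 1


def try_initialize(
    create_group: GroupFactory, host: str, port: int, is_server: bool
) -> Tuple[bool, Optional[Any]]:
    """Return (True, group) on success, or (False, None) so the caller can fall back."""
    try:
        # A real factory would block here until both peers join the 2-process group.
        group = create_group(host, port, 2, rank_for(is_server))
    except Exception:
        # Any failure (timeout, missing dependency, NCCL error) means: use the wire format.
        return False, None
    return True, group
```

Catching broadly is deliberate in this sketch: the NCCL path is an optimization, so any error should degrade to the wire format rather than abort training.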
Receive tensors (client side) in keys_order, following the tensor metadata for each key.
Send tensors (server side) in keys_order; skips None entries.
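NCCL point-to-point calls carry no message tags, so sender and receiver stay in sync only if they issue matching calls in the same order and agree on which entries were skipped. The sketch below models the NCCL group as an in-process `deque` to show that ordering contract; the function names, the `"aux"` key, and the convention that a skipped key has `None` metadata are assumptions for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, Mapping, Optional, Sequence


def send_in_order(
    tensors: Mapping[str, Optional[Any]], keys_order: Sequence[str], channel: Deque[Any]
) -> None:
    """Server side: push payloads in keys_order, skipping None entries."""
    for key in keys_order:
        payload = tensors.get(key)
        if payload is None:
            continue  # nothing is sent for this key
        channel.append(payload)


def recv_in_order(
    metadata: Mapping[str, Optional[Any]], keys_order: Sequence[str], channel: Deque[Any]
) -> Dict[str, Any]:
    """Client side: pop payloads in the same order, skipping keys marked absent."""
    received: Dict[str, Any] = {}
    for key in keys_order:
        if metadata.get(key) is None:
            continue  # the server skipped this key, so the client must too
        received[key] = channel.popleft()
    return received


keys = ["hidden_states", "aux", "logits"]
channel: Deque[Any] = deque()
send_in_order({"hidden_states": [1, 2], "aux": None, "logits": [3]}, keys, channel)
meta = {"hidden_states": {"shape": [2]}, "aux": None, "logits": {"shape": [1]}}
result = recv_in_order(meta, keys, channel)
print(result)  # {'hidden_states': [1, 2], 'logits': [3]}
```

If the client did not skip `"aux"`, it would pop the `logits` payload into the wrong slot; with real NCCL the mismatch would instead surface as a size mismatch or a hang.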