# nemo_automodel.components.speculative.eagle.remote.transport

Dedicated NCCL transport for GPU-to-GPU supervision-tensor transfer.

A 2-process NCCL group connects the target server (rank 0) to the training
client (rank 1). HTTP stays the control plane (input\_ids up, tensor metadata
down); this group is the data plane for the large supervision tensors, working
over NVLink intra-node and RDMA/RoCE inter-node.

The group is created from an explicit `TCPStore`, so it is independent of the
training job's default process group. Group creation is delegated to SGLang's
`init_custom_process_group`, a proven path that builds a *non-default* group
from a provided store. SGLang is an optional, non-bundled dependency: when it
is absent, `NCCLTransport.initialize` returns `False` and the caller falls
back to the binary wire format.

Environment variables:

* `NEMO_EAGLE_ENABLE_NCCL` -- `"1"` (default) to attempt NCCL, `"0"` to
  force the wire-format fallback.
* `NEMO_EAGLE_NCCL_PORT` -- TCP rendezvous port (default: HTTP port + 100).
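
As an illustration of the defaults listed above, the following minimal sketch resolves both variables. The helper names `nccl_enabled` and `resolve_nccl_port` are hypothetical and not part of this module; only the variable names and default values come from the description above. How values other than `"0"` and `"1"` are treated is an assumption.

```python
import os


def nccl_enabled(env=None) -> bool:
    """Hypothetical helper: True when NCCL should be attempted."""
    env = os.environ if env is None else env
    # Unset means "1" (attempt NCCL). Treating any value other than "1"
    # as disabled is an assumption of this sketch.
    return env.get("NEMO_EAGLE_ENABLE_NCCL", "1") == "1"


def resolve_nccl_port(http_port: int, env=None) -> int:
    """Hypothetical helper: rendezvous port, defaulting to HTTP port + 100."""
    env = os.environ if env is None else env
    raw = env.get("NEMO_EAGLE_NCCL_PORT")
    return int(raw) if raw else http_port + 100
```

Passing a plain `dict` as `env` keeps the helpers easy to exercise without mutating the process environment.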

## Module Contents

### Classes

| Name                                                                                           | Description                                                                   |
| ---------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`NCCLTransport`](#nemo_automodel-components-speculative-eagle-remote-transport-NCCLTransport) | A dedicated 2-process NCCL group between server (rank 0) and client (rank 1). |

### Data

[`_ELEMENT_SIZE`](#nemo_automodel-components-speculative-eagle-remote-transport-_ELEMENT_SIZE)

[`_NCCL_UNSUPPORTED_DTYPES`](#nemo_automodel-components-speculative-eagle-remote-transport-_NCCL_UNSUPPORTED_DTYPES)

[`logger`](#nemo_automodel-components-speculative-eagle-remote-transport-logger)

### API

```python
class nemo_automodel.components.speculative.eagle.remote.transport.NCCLTransport(
    nccl_port: int,
    host: str,
    is_server: bool
)
```

A dedicated 2-process NCCL group between server (rank 0) and client (rank 1).

#### Parameters

* `nccl_port` -- TCP port for the rendezvous store.
* `host` -- Hostname or IP address of the server (rendezvous master).
* `is_server` -- `True` on the server side (rank 0), `False` on the client
  side (rank 1).

```python
nemo_automodel.components.speculative.eagle.remote.transport.NCCLTransport.destroy() -> None
```

Abort and unregister the group.

The group is asymmetric: the client can finish before the long-lived
server, so a blocking `destroy_process_group` (which expects both
peers) would hang. Abort the local communicator and scrub it from
PyTorch's global registry so the later default-group teardown does not
try to shut it down again.

```python
nemo_automodel.components.speculative.eagle.remote.transport.NCCLTransport.initialize(
    timeout_seconds: int = 120
) -> bool
```

Establish the NCCL group via TCP rendezvous; blocks until both peers connect.

Returns `True` on success and `False` on any failure, in which case the caller
falls back to the binary wire format.
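
Because failure is reported through the return value, the fallback can be a plain branch at the call site. The sketch below assumes only the `initialize(timeout_seconds=...) -> bool` signature shown above; the function `select_data_plane` and the labels `"nccl"` and `"wire"` are hypothetical names chosen for illustration.

```python
def select_data_plane(transport, timeout_seconds: int = 120) -> str:
    """Hypothetical helper: pick the data plane for supervision tensors.

    `transport` is any object exposing initialize(timeout_seconds=...) -> bool,
    such as an NCCLTransport instance.
    """
    # initialize() blocks until both peers connect or the attempt fails.
    if transport.initialize(timeout_seconds=timeout_seconds):
        return "nccl"
    # Any failure (for example, SGLang not installed) selects the wire format.
    return "wire"
```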

```python
nemo_automodel.components.speculative.eagle.remote.transport.NCCLTransport.recv_tensors(
    metadata: dict[str, typing.Optional[dict]],
    keys_order: list[str]
) -> dict[str, typing.Optional[torch.Tensor]]
```

Receive tensors on the client side, in `keys_order`, as described by `metadata`.

```python
nemo_automodel.components.speculative.eagle.remote.transport.NCCLTransport.send_tensors(
    tensor_dict: dict[str, typing.Optional[torch.Tensor]],
    keys_order: list[str]
) -> None
```

Send tensors (server side) in `keys_order`; skips `None` entries.
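
Point-to-point NCCL transfers must be posted in the same order on both peers, so the sender and receiver need to agree on which keys actually produce a transfer. The sketch below shows that ordering rule in isolation. `keys_to_transfer` is a hypothetical helper, not part of this module, and applying the same `None`-skipping rule to the client's `metadata` is an assumption.

```python
from typing import Any, Optional


def keys_to_transfer(entries: dict[str, Optional[Any]], keys_order: list[str]) -> list[str]:
    """Hypothetical helper: keys that result in a transfer, in wire order.

    `entries` can be the server's tensor_dict or (by assumption) the client's
    metadata; keys whose value is None, or that are missing, are skipped.
    """
    return [key for key in keys_order if entries.get(key) is not None]
```

If both sides derive the list from the same `keys_order` and the same set of `None` entries, each send has exactly one matching receive.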

```python
nemo_automodel.components.speculative.eagle.remote.transport._ELEMENT_SIZE = {torch.int16: 2, torch.int8: 1, torch.bool: 1}
```

```python
nemo_automodel.components.speculative.eagle.remote.transport._NCCL_UNSUPPORTED_DTYPES = {torch.int16, torch.int8, torch.bool}
```

```python
nemo_automodel.components.speculative.eagle.remote.transport.logger = logging.getLogger(__name__)
```