# nemo_automodel.components.speculative.eagle.remote.client

Training-side client for the remote EAGLE-3 target server.

`RemoteEagle3TargetModel` implements the `Eagle3TargetBackend` contract
by delegating `generate_batch` to one or more remote target servers. It POSTs
`input_ids` over HTTP and receives the supervision tensors either over NCCL
(GPU-direct, body carries only metadata) or as a binary wire blob (fallback).

Multiple server URLs are dispatched round-robin so the prefetch pipeline in the
training loop can keep several requests in flight (one per server) and overlap
target inference with draft training.
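
The round-robin dispatch described above can be sketched as follows. This is a minimal illustration, not the module's code: `RoundRobinDispatcher`, `fake_send`, and the example URLs are hypothetical names, and the sketch assumes a thread pool with one worker per server.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class RoundRobinDispatcher:
    """Hypothetical helper: hand each request to the next server in turn."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one server URL is required")
        self._urls = list(urls)
        self._cycle = itertools.cycle(self._urls)
        # One worker per server, so up to len(urls) requests can be in flight.
        self._pool = ThreadPoolExecutor(max_workers=len(self._urls))

    def submit(self, send, payload):
        """Schedule send(url, payload) on the next server and return a Future."""
        url = next(self._cycle)
        return self._pool.submit(send, url, payload)

    def close(self):
        self._pool.shutdown(wait=True)


def fake_send(url, payload):
    # Stand-in for the real HTTP POST; it only reports which server was chosen.
    return url, payload


dispatcher = RoundRobinDispatcher(["http://a:8000", "http://b:8000"])
futures = [dispatcher.submit(fake_send, i) for i in range(4)]
chosen = [f.result()[0] for f in futures]
dispatcher.close()
```

With two servers, consecutive requests alternate between them, so a prefetching training loop could hold one outstanding request per server while it trains on an earlier batch.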

## Module Contents

### Classes

| Name                                                                                                            | Description                                                             |
| --------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| [`RemoteEagle3TargetModel`](#nemo_automodel-components-speculative-eagle-remote-client-RemoteEagle3TargetModel) | EAGLE-3 target backend that delegates forward passes to remote servers. |
| [`_AsyncHandle`](#nemo_automodel-components-speculative-eagle-remote-client-_AsyncHandle)                       | Future-like wrapper that converts a worker-thread result into a batch.  |
| [`_ServerClient`](#nemo_automodel-components-speculative-eagle-remote-client-_ServerClient)                     | HTTP + NCCL connection to a single remote target server.                |

### Data

[`logger`](#nemo_automodel-components-speculative-eagle-remote-client-logger)

### API

```python
class nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel(
    urls: list[str],
    device: torch.device,
    timeout: int = 120,
    max_retries: int = 3
)
```

**Bases:** [Eagle3TargetBackend](/nemo-automodel/nemo_automodel/components/speculative/eagle/backend#nemo_automodel-components-speculative-eagle-backend-Eagle3TargetBackend)

EAGLE-3 target backend that delegates forward passes to remote servers.

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel._build_payload(
    input_ids,
    attention_mask,
    loss_mask
) -> bytes
```

staticmethod

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel._to_batch(
    result: dict,
    attention_mask: torch.Tensor
) -> nemo_automodel.components.speculative.eagle.target.Eagle3TargetBatch
```

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.close() -> None
```

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.from_urls(
    urls: list[str],
    device,
    kwargs = {}
) -> 'RemoteEagle3TargetModel'
```

classmethod

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.generate_batch(
    input_ids,
    attention_mask,
    loss_mask
) -> nemo_automodel.components.speculative.eagle.target.Eagle3TargetBatch
```

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.generate_batch_async(
    input_ids,
    attention_mask,
    loss_mask
) -> nemo_automodel.components.speculative.eagle.remote.client._AsyncHandle
```

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.get_input_embeddings()
```

Fetch the target input-embedding weight once and cache it.

Returns an object exposing `.weight` (the only attribute the draft's
`copy_embeddings_from_target` reads), matching the offline-cache path.
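
A fetch-once-and-cache accessor of this kind might look like the sketch below. `CachedEmbeddingFetcher` and `fetch_weight` are hypothetical names, and a nested list stands in for the weight tensor; how the real client retrieves the weight from the server is not shown here.

```python
from types import SimpleNamespace


class CachedEmbeddingFetcher:
    """Hypothetical sketch: fetch a weight once, then serve it from a cache."""

    def __init__(self, fetch_weight):
        # fetch_weight is an assumed callable that performs the remote fetch.
        self._fetch_weight = fetch_weight
        self._cached = None

    def get_input_embeddings(self):
        if self._cached is None:
            # Wrap the weight so callers can read `.weight`, like a module.
            self._cached = SimpleNamespace(weight=self._fetch_weight())
        return self._cached


fetch_count = 0


def fake_fetch():
    # Stand-in for the remote call; counts how often it is invoked.
    global fetch_count
    fetch_count += 1
    return [[0.1, 0.2], [0.3, 0.4]]


fetcher = CachedEmbeddingFetcher(fake_fetch)
first = fetcher.get_input_embeddings()
second = fetcher.get_input_embeddings()
```

The second call returns the same wrapper object without triggering another fetch.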

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.model_info() -> dict
```

```python
nemo_automodel.components.speculative.eagle.remote.client.RemoteEagle3TargetModel.set_vocab_mapping(
    selected_token_ids: torch.Tensor,
    selected_token_mask: torch.Tensor
) -> None
```

```python
class nemo_automodel.components.speculative.eagle.remote.client._AsyncHandle(
    future,
    convert
)
```

Future-like wrapper that converts a worker-thread result into a batch.

```python
nemo_automodel.components.speculative.eagle.remote.client._AsyncHandle.cancel() -> bool
```

```python
nemo_automodel.components.speculative.eagle.remote.client._AsyncHandle.result(
    timeout: typing.Optional[float] = None
) -> nemo_automodel.components.speculative.eagle.target.Eagle3TargetBatch
```
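
As a rough illustration of the future-like wrapper described above, the sketch below wraps a `concurrent.futures.Future` and applies a conversion function when the result is read. `AsyncHandleSketch` and the toy `convert` lambda are hypothetical; the real class's internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncHandleSketch:
    """Hypothetical sketch of a handle that converts a future's raw result."""

    def __init__(self, future, convert):
        self._future = future
        self._convert = convert

    def cancel(self):
        # Delegates to the future; returns False once the work has started.
        return self._future.cancel()

    def result(self, timeout=None):
        # Block until the worker finishes, then convert on the caller's thread.
        raw = self._future.result(timeout=timeout)
        return self._convert(raw)


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(lambda: {"logits": [1, 2, 3]})
    handle = AsyncHandleSketch(future, convert=lambda raw: tuple(raw["logits"]))
    batch = handle.result(timeout=5.0)
```

Running the conversion inside `result()` keeps the worker thread limited to I/O, which is one plausible reason for a design like this.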

```python
class nemo_automodel.components.speculative.eagle.remote.client._ServerClient(
    url: str,
    timeout: int,
    max_retries: int,
    nccl_rank_offset: int = 0
)
```

HTTP + NCCL connection to a single remote target server.

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient._host() -> str
```

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient._init_nccl() -> bool
```

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient._nccl_port() -> int
```

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient.close() -> None
```

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient.generate(
    payload: bytes
) -> dict[str, typing.Optional[torch.Tensor]]
```

POST /generate and return the supervision tensors (NCCL or wire).

`/generate` is the per-step hot path, so a transient timeout / connection
reset here would otherwise abort a long remote-training run. The wire path
is an idempotent HTTP round-trip, so it reuses `request`'s
exponential-backoff retry. The NCCL path is deliberately a single attempt:
the POST triggers a server-side NCCL send paired with the `recv_tensors`
below, so a blind retry would issue a second send and desync the 2-process
data-plane group (the client's one recv vs the server's two sends would
hang). Recovering the NCCL path needs a transport resync (tear down +
re-init, or fall back to wire) and is tracked separately.

Serialized on `_generate_lock` so this process never has two
`/generate` requests in flight against the same server: the NCCL recv
posted here must pair with this request's send, and the server's
hook-based aux capture is not reentrant.
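
The control flow described above (one lock, a retried wire path, a single-attempt NCCL path) can be sketched as follows. All names except `_generate_lock` are hypothetical stand-ins: `post_with_retry`, `post_once`, and `recv_tensors` are injected callables, not the module's actual functions.

```python
import threading


class GenerateSketch:
    """Hypothetical sketch of the lock and retry policy, not the real client."""

    def __init__(self, post_with_retry, post_once, recv_tensors, use_nccl):
        self._post_with_retry = post_with_retry
        self._post_once = post_once
        self._recv_tensors = recv_tensors
        self._use_nccl = use_nccl
        # Ensures at most one generate call is in flight per server.
        self._generate_lock = threading.Lock()

    def generate(self, payload):
        with self._generate_lock:
            if self._use_nccl:
                # Single attempt: a retried POST would trigger a second
                # server-side send with only one matching recv here.
                metadata = self._post_once(payload)
                return self._recv_tensors(metadata)
            # Wire path: an idempotent round-trip, so retrying is safe.
            return self._post_with_retry(payload)


calls = {"once": 0, "retry": 0, "recv": 0}


def fake_post_once(payload):
    calls["once"] += 1
    return {"num_tensors": 1}


def fake_post_with_retry(payload):
    calls["retry"] += 1
    return {"hidden_states": "wire"}


def fake_recv(metadata):
    calls["recv"] += 1
    return {"hidden_states": "nccl"}


nccl_client = GenerateSketch(fake_post_with_retry, fake_post_once, fake_recv, True)
wire_client = GenerateSketch(fake_post_with_retry, fake_post_once, fake_recv, False)
nccl_result = nccl_client.generate(b"ids")
wire_result = wire_client.generate(b"ids")
```

In this sketch each POST on the NCCL path is paired with exactly one receive, which is the invariant the docstring says a blind retry would break.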

```python
nemo_automodel.components.speculative.eagle.remote.client._ServerClient.request(
    endpoint: str,
    payload: bytes,
    content_type: str = 'application/octet-stream'
) -> bytes
```

POST `payload` to `endpoint` with exponential-backoff retry.
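
A minimal sketch of exponential-backoff retry is shown below. The function name `post_with_backoff`, the `base_delay` value, the set of retried exceptions, and the reading of `max_retries` as "retries after the first attempt" are all assumptions made for illustration; the actual `request` method may choose differently.

```python
import time


def post_with_backoff(send, payload, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send(payload), retrying transient errors with doubling delays."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return send(payload)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt == max_retries:
                break
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise last_error


attempts = []
delays = []


def flaky_send(payload):
    # Stand-in transport that fails twice, then succeeds.
    attempts.append(payload)
    if len(attempts) < 3:
        raise ConnectionError("connection reset")
    return b"ok:" + payload


# Record the delays instead of sleeping so the example runs instantly.
reply = post_with_backoff(flaky_send, b"x", sleep=delays.append)
```

Injecting `sleep` is only a convenience for testing the sketch; production code would normally let it default to `time.sleep`.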

```python
nemo_automodel.components.speculative.eagle.remote.client.logger = logging.getLogger(__name__)
```