# nemo_automodel.components.speculative.serve_target

Launch a remote EAGLE-3 target server (train-inference disaggregation).

Loads the frozen target model on this process's GPU and serves draft-vocab
supervision (aux hidden states, `target_probs`, `position_mask`) to a
training client over HTTP (control plane) + NCCL (data plane).

Typical usage (single-GPU server):

    CUDA_VISIBLE_DEVICES=0 python -m nemo_automodel.components.speculative.serve_target \
        --target meta-llama/Llama-3.1-8B-Instruct \
        --host 0.0.0.0 --port 8001

Then point training at it:

    recipe_args.target_model_backend: remote
    recipe_args.remote_urls: ["http://<server-host>:8001"]
    recipe_args.target_prefetch_depth: 1

Verify readiness with `curl http://<host>:8001/health`. NCCL GPU-direct
transfer requires sglang to be installed in the server's environment; without
it, the server transparently falls back to the binary wire format.
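The readiness check above can also be scripted, for example to hold a training job until the target server is up. The following is a minimal sketch using only the Python standard library. It assumes that `/health` answers with HTTP 200 once the server is ready, which this page does not spell out; the helper names `is_healthy` and `wait_until_healthy` are hypothetical and not part of `nemo_automodel`.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: a 200 response means the server is ready.
    """
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or non-2xx status: not ready yet.
        return False


def wait_until_healthy(base_url, deadline_s=600.0, interval_s=5.0, probe=is_healthy):
    """Poll `probe(base_url)` until it succeeds or `deadline_s` elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

A caller might run `wait_until_healthy("http://<server-host>:8001")` before starting training and abort if it returns `False`. The `probe` parameter exists only so the polling loop can be exercised without a live server.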

## Module Contents

### Functions

| Name                                                                             | Description                                                    |
| -------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`_parse_args`](#nemo_automodel-components-speculative-serve_target-_parse_args) | Parse command-line arguments into an `argparse.Namespace`.     |
| [`main`](#nemo_automodel-components-speculative-serve_target-main)               | Load the target model and run the blocking HTTP + NCCL server. |

### Data

[`logger`](#nemo_automodel-components-speculative-serve_target-logger)

### API

```python
nemo_automodel.components.speculative.serve_target._parse_args(
    argv = None
) -> argparse.Namespace
```

```python
nemo_automodel.components.speculative.serve_target.main(
    argv = None
) -> None
```

Load the target model and run the blocking HTTP + NCCL server.
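Because `main` accepts an `argv` parameter, the server can presumably be started from Python as well as from the command line. The sketch below builds an argument list from the three flags shown in the usage example (`--target`, `--host`, `--port`). It assumes that `main` forwards `argv` to the argument parser in the usual `argparse` fashion, which this page does not confirm; `build_serve_target_argv` and `launch` are hypothetical helpers.

```python
def build_serve_target_argv(target: str, host: str = "0.0.0.0", port: int = 8001) -> list[str]:
    """Assemble the CLI flags shown in the usage example as an argv list."""
    return ["--target", target, "--host", host, "--port", str(port)]


def launch(argv: list[str]) -> None:
    """Start the server in this process; blocks until the server exits."""
    # Imported lazily so that building argv does not require the package
    # (or a GPU) to be available.
    from nemo_automodel.components.speculative.serve_target import main

    main(argv)


argv = build_serve_target_argv("meta-llama/Llama-3.1-8B-Instruct")
```

Calling `launch(argv)` would then be the in-process counterpart of the `python -m` invocation. Since `main` is documented as blocking, it is best run in a dedicated process.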

```python
nemo_automodel.components.speculative.serve_target.logger = logging.getLogger(__name__)
```
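Since the logger is created with `logging.getLogger(__name__)`, its name is the module path whenever the module is imported, so its verbosity can be adjusted with the standard `logging` API. The sketch below is one way to do that; `enable_debug_logging` is a hypothetical helper, and the log format is an arbitrary choice.

```python
import logging

# Logger name when the module is imported (not when run via `python -m`).
SERVE_TARGET_LOGGER = "nemo_automodel.components.speculative.serve_target"


def enable_debug_logging(name: str = SERVE_TARGET_LOGGER) -> logging.Logger:
    """Attach a root handler if none exists and set the named logger to DEBUG."""
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    return log


log = enable_debug_logging()
```

When the module is executed with `python -m`, Python sets `__name__` to `"__main__"`, so in that case the logger would be named `__main__` rather than the module path.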