nemo_automodel.components.speculative.eagle.remote.wire


Compact binary tensor serialization for the remote target data plane.

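This is the fallback path used when NCCL GPU-to-GPU transfer is unavailable: each tensor is encoded as its dtype, its shape and its raw contiguous bytes, and the result is shipped inside the HTTP body. The format is little-endian and self-delimiting: every variable-length field (key, shape, data) is preceded by its length, so a reader can walk the buffer without any out-of-band size information.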

Format:

    [4B] magic 0x4E4D4554 ("NMET")
    per entry:
      [4B]          key_len     (uint32)
      [key_len B]   key         UTF-8
      [1B]          flags       bit0 = is_none
      if not none:
        [1B]        dtype_code  (see _DTYPE_TABLE)
        [1B]        ndim
        [ndim x 8B] shape       (int64)
        [8B]        nbytes      (uint64)
        [nbytes B]  data        raw contiguous tensor bytes
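
A minimal, stdlib-only sketch of writing this layout, assuming a plain-Python stand-in for tensors: each value is either None or a hypothetical `(dtype_code, shape, data)` tuple whose `data` already holds the raw contiguous little-endian bytes. The real `encode` takes torch tensors and looks up `dtype_code` in `_DTYPE_TABLE`; the name `encode_sketch` and the use of `0` as the flags byte for a present tensor are assumptions.

```python
import struct
from typing import Optional

# Hypothetical tensor stand-in: (dtype_code, shape, raw little-endian bytes), or None.
# The real encode() works on torch tensors; this sketch only mirrors the byte layout.
Entry = Optional[tuple[int, tuple[int, ...], bytes]]

MAGIC = 0x4E4D4554
FLAG_NONE = 1  # bit0 of the flags byte marks a None value


def encode_sketch(entries: dict[str, Entry]) -> bytearray:
    buf = bytearray(struct.pack("<I", MAGIC))  # 4-byte magic header
    for key, value in entries.items():
        key_bytes = key.encode("utf-8")
        buf += struct.pack("<I", len(key_bytes))  # key_len (uint32)
        buf += key_bytes                          # key, UTF-8
        if value is None:
            buf += struct.pack("<B", FLAG_NONE)   # is_none set: no payload follows
            continue
        dtype_code, shape, data = value
        buf += struct.pack("<B", 0)               # flags: tensor present (assumed 0)
        buf += struct.pack("<B", dtype_code)      # dtype_code
        buf += struct.pack("<B", len(shape))      # ndim
        for dim in shape:
            buf += struct.pack("<q", dim)         # shape, one int64 per dimension
        buf += struct.pack("<Q", len(data))       # nbytes (uint64)
        buf += data                               # raw contiguous tensor bytes
    return buf
```

For example, `encode_sketch({"x": (0, (2,), struct.pack("<2f", 1.0, 2.0)), "mask": None})` (code 0 is `torch.float32` in `_DTYPE_TABLE`) yields 45 bytes: the 4-byte magic, 32 bytes for the float32 entry and 9 bytes for the None entry. Because this sketch packs the magic with `<I`, its first four bytes read `TEMN` rather than `NMET`.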

Module Contents

Functions

| Name | Description |
| --- | --- |
| decode | Decode a wire-format blob back into a dict of tensors on map_location. |
| encode | Encode a dict of CPU tensors into the wire format. |
| encode_to_bytes | Encode and return immutable bytes (HTTP body). |

Data

MAGIC

_DTYPE_FMT

_DTYPE_TABLE

_DTYPE_TO_CODE

_FLAG_FMT

_FLAG_NONE

_HEADER_FMT

_KEYLEN_FMT

_NBYTES_FMT

_NDIM_FMT

_SHAPE_FMT

API

nemo_automodel.components.speculative.eagle.remote.wire.decode(
raw: bytes,
map_location: str = 'cpu'
) -> dict[str, typing.Optional[torch.Tensor]]

Decode a wire-format blob back into a dict of tensors on map_location.
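
The reverse direction can be sketched the same way. This is an illustrative, stdlib-only parser, not the module's `decode`: the name `decode_sketch` is hypothetical, it returns `(dtype_code, shape, data)` tuples instead of tensors on `map_location`, and it assumes entries simply continue until the end of the buffer, since the documented layout carries no entry count.

```python
import struct
from typing import Optional

MAGIC = 0x4E4D4554
FLAG_NONE = 1

# Hypothetical result type: key -> (dtype_code, shape, raw bytes), or None.
Entry = Optional[tuple[int, tuple[int, ...], bytes]]


def decode_sketch(raw: bytes) -> dict[str, Entry]:
    view = memoryview(raw)
    (magic,) = struct.unpack_from("<I", view, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic:#x}")
    offset = 4
    out: dict[str, Entry] = {}
    while offset < len(view):  # assumption: entries run to the end of the buffer
        (key_len,) = struct.unpack_from("<I", view, offset)
        offset += 4
        key = bytes(view[offset:offset + key_len]).decode("utf-8")
        offset += key_len
        (flags,) = struct.unpack_from("<B", view, offset)
        offset += 1
        if flags & FLAG_NONE:  # is_none: no dtype/shape/data follows
            out[key] = None
            continue
        dtype_code, ndim = struct.unpack_from("<BB", view, offset)
        offset += 2
        shape = struct.unpack_from(f"<{ndim}q", view, offset)  # ndim int64 values
        offset += 8 * ndim
        (nbytes,) = struct.unpack_from("<Q", view, offset)
        offset += 8
        out[key] = (dtype_code, shape, bytes(view[offset:offset + nbytes]))
        offset += nbytes
    return out
```

Reading through a `memoryview` with `struct.unpack_from` avoids slicing the buffer for every fixed-size field; only the key and the tensor payload are copied out. A truncated buffer surfaces as a `struct.error`.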

nemo_automodel.components.speculative.eagle.remote.wire.encode(
tensor_dict: dict[str, typing.Optional[torch.Tensor]]
) -> bytearray

Encode a dict of CPU tensors into the wire format.

None values are preserved: they are written with the is_none flag set and carry no tensor payload. The caller is responsible for moving tensors to CPU first; CUDA tensors are rejected to keep the data path explicit.

nemo_automodel.components.speculative.eagle.remote.wire.encode_to_bytes(
tensor_dict: dict[str, typing.Optional[torch.Tensor]]
) -> bytes

Encode tensor_dict and return the result as immutable bytes, suitable for use as an HTTP body.
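
`encode` returns a mutable `bytearray`, while `encode_to_bytes` returns `bytes`. The sketch below shows one way such a body could be attached to a POST request with the standard library. `to_http_body`, `build_request`, the URL and the `application/octet-stream` content type are all hypothetical, not the remote target server's actual API.

```python
import urllib.request


def to_http_body(buf: bytearray) -> bytes:
    # bytes(...) makes an immutable copy; later writes to buf do not affect it.
    return bytes(buf)


def build_request(url: str, body: bytes) -> urllib.request.Request:
    # Hypothetical endpoint and content type; nothing is sent until urlopen() is called.
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
```

Sending would then be `urllib.request.urlopen(build_request(url, body))`; building the `Request` alone performs no network I/O.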

nemo_automodel.components.speculative.eagle.remote.wire.MAGIC = 1313686868
nemo_automodel.components.speculative.eagle.remote.wire._DTYPE_FMT = struct.Struct('<B')
nemo_automodel.components.speculative.eagle.remote.wire._DTYPE_TABLE: dict[int, dtype] = {0: torch.float32, 1: torch.float64, 2: torch.float16, 3: torch.bfloat16, 4: tor...
nemo_automodel.components.speculative.eagle.remote.wire._DTYPE_TO_CODE: dict[dtype, int] = {dt: c for c, dt in (_DTYPE_TABLE.items())}
nemo_automodel.components.speculative.eagle.remote.wire._FLAG_FMT = struct.Struct('<B')
nemo_automodel.components.speculative.eagle.remote.wire._FLAG_NONE = 1
nemo_automodel.components.speculative.eagle.remote.wire._HEADER_FMT = struct.Struct('<I')
nemo_automodel.components.speculative.eagle.remote.wire._KEYLEN_FMT = struct.Struct('<I')
nemo_automodel.components.speculative.eagle.remote.wire._NBYTES_FMT = struct.Struct('<Q')
nemo_automodel.components.speculative.eagle.remote.wire._NDIM_FMT = struct.Struct('<B')
nemo_automodel.components.speculative.eagle.remote.wire._SHAPE_FMT = struct.Struct('<q')
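
The struct constants fix the size of every framing field, so the wire size of an entry can be computed up front. A sketch of that arithmetic, using local copies of the documented formats and a hypothetical `entry_size` helper:

```python
import struct
from typing import Optional

# Local copies of the documented formats (same format strings as the constants above).
HEADER = struct.Struct("<I")   # magic
KEYLEN = struct.Struct("<I")
FLAG = struct.Struct("<B")
DTYPE = struct.Struct("<B")
NDIM = struct.Struct("<B")
SHAPE = struct.Struct("<q")    # repeated once per dimension
NBYTES = struct.Struct("<Q")


def entry_size(key: str, shape: Optional[tuple[int, ...]], itemsize: int = 0) -> int:
    """Hypothetical helper: bytes one entry occupies; shape=None models a None value."""
    size = KEYLEN.size + len(key.encode("utf-8")) + FLAG.size
    if shape is None:
        return size
    numel = 1
    for dim in shape:
        numel *= dim
    framing = DTYPE.size + NDIM.size + SHAPE.size * len(shape) + NBYTES.size
    return size + framing + numel * itemsize
```

Every tensor entry therefore costs 15 + len(key) + 8·ndim bytes of framing on top of its payload (with len(key) counted in UTF-8 bytes), and a None entry costs 5 + len(key) bytes. For a hypothetical bfloat16 tensor of shape (1, 4, 4096) under the key `hidden_states`, that is 52 bytes of framing around 32,768 bytes of data.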