`core.inference.symmetric_memory`#

Lazy-initialized symmetric memory manager for inference.

Provides a registry of SymmetricMemoryBuffer instances keyed by a user-supplied identifier (e.g. “tp”, “ep”). Buffers are created on first access so that callers never need to worry about initialization ordering relative to the inference context.

Module Contents#

Classes#

`SymmetricMemoryBuffer`	Symmetric memory buffer used in inference. It backs mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives.
`SymmetricMemoryManager`	Registry of lazily-initialized symmetric memory buffers.

API#

class core.inference.symmetric_memory.SymmetricMemoryBuffer(size_in_mb, process_group)#

Symmetric memory buffer used in inference. It backs mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives.

Initialization

_can_allocate(numel, dtype) → bool#: Returns whether enough symmetric memory is available for a tensor with the given element count (numel) and dtype.

_allocate(numel, dtype) → torch.Tensor#: Allocates a sub-tensor from self.symm_buffer for the given numel and dtype.

maybe_get_tensors(tensor_specs, alignment=16)#

Pack multiple tensors contiguously in the symmetric buffer with alignment.

Each tensor’s starting offset is aligned to alignment bytes (default 16 for 128-bit multimem access).

Parameters:

tensor_specs – list of (numel, dtype) tuples.
alignment – byte alignment for each tensor’s start offset (default 16).

Returns:

{“handle”: None, “tensors”: None} if the buffer is unavailable or has insufficient space. {“handle”: symm_mem_hdl, “tensors”: [(raw_byte_view, byte_offset), …]} on success, where raw_byte_view is a uint8 slice of the buffer.

Return type:

dict
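
The packing rule described for maybe_get_tensors can be illustrated with a small offset calculation. This is a minimal sketch, not the library's implementation: `plan_offsets`, `align_up`, and the `ITEMSIZE` table are hypothetical names, dtypes are plain strings rather than torch dtypes, and the real method may lay out the buffer differently.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical element sizes in bytes; the documented API takes torch dtypes.
ITEMSIZE = {"uint8": 1, "bfloat16": 2, "float16": 2, "float32": 4}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def plan_offsets(
    tensor_specs: Sequence[Tuple[int, str]],
    capacity_bytes: int,
    alignment: int = 16,
) -> Optional[List[Tuple[int, int]]]:
    """Return (byte_offset, nbytes) for each spec, or None if they do not fit."""
    plan: List[Tuple[int, int]] = []
    cursor = 0
    for numel, dtype in tensor_specs:
        # Each tensor starts on an aligned boundary after the previous one.
        start = align_up(cursor, alignment)
        nbytes = numel * ITEMSIZE[dtype]
        if start + nbytes > capacity_bytes:
            return None
        plan.append((start, nbytes))
        cursor = start + nbytes
    return plan


specs = [(5, "bfloat16"), (3, "float32"), (1, "uint8")]
plan = plan_offsets(specs, capacity_bytes=64)
print(plan)
```

In this sketch, padding between tensors counts against capacity, so a set of specs can fail to fit even when the sum of their raw sizes is below the buffer size.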

maybe_get_tensor(tensor_shape, dtype)#: Returns a sub-tensor of self.symm_buffer with the given shape and dtype, or None if not enough symmetric memory is available.
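
Because maybe_get_tensor can return None, callers need a fallback path. The sketch below shows only that calling pattern. `FakeSymmetricBuffer` and `choose_path` are hypothetical stand-ins: the fake takes an element size in bytes instead of a torch dtype, and hands out a memoryview over a bytearray rather than real symmetric memory.

```python
from math import prod
from typing import Optional, Sequence


class FakeSymmetricBuffer:
    """Hypothetical stand-in that mimics only the None-on-failure contract."""

    def __init__(self, size_in_mb: int) -> None:
        self._storage = bytearray(size_in_mb * 1024 * 1024)

    def maybe_get_tensor(
        self, tensor_shape: Sequence[int], itemsize: int
    ) -> Optional[memoryview]:
        nbytes = prod(tensor_shape) * itemsize
        if nbytes > len(self._storage):
            return None  # not enough room: caller must fall back
        return memoryview(self._storage)[:nbytes]


def choose_path(buf: FakeSymmetricBuffer, shape: Sequence[int], itemsize: int) -> str:
    """Pick the symmetric-memory path when a view is available, else fall back."""
    view = buf.maybe_get_tensor(shape, itemsize)
    if view is None:
        return "fallback"
    return "symmetric"


small = FakeSymmetricBuffer(size_in_mb=1)
print(choose_path(small, (256, 1024), 2))   # 512 KiB request
print(choose_path(small, (1024, 1024), 2))  # 2 MiB request
```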

class core.inference.symmetric_memory.SymmetricMemoryManager#

Registry of lazily-initialized symmetric memory buffers.

Usage:

buf = SymmetricMemoryManager.get_buffer("tp", process_group=tp_group)
result = buf.maybe_get_tensor(shape, dtype)

_buffers: dict[str, core.inference.symmetric_memory.SymmetricMemoryBuffer]#: None

_default_size_mb: int#: 512

classmethod get_buffer( key: str, process_group: Optional[torch.distributed.ProcessGroup] = None, size_mb: Optional[int] = None, ) → core.inference.symmetric_memory.SymmetricMemoryBuffer#

Return the buffer for key, creating it on first call.

Parameters:

key – Unique identifier (e.g. “tp”, “ep”).
process_group – Required on the first call for a given key. Subsequent calls may omit it.
size_mb – Buffer size in MiB. The signature default is None; the docstring states a default of 256, whereas _default_size_mb is listed as 512.

classmethod destroy(key: Optional[str] = None) → None#

Destroy one or all buffers.

Parameters:

key – If provided, destroy only that buffer. Otherwise, destroy all buffers.

classmethod is_initialized(key: str) → bool#: Check whether a buffer has been created for key.
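
The documented lifecycle (create on first get_buffer, reuse afterwards, destroy one key or all) can be sketched with a plain-Python mock. `SketchManager` and `FakeBuffer` are hypothetical and do not touch torch or real symmetric memory. Raising ValueError when process_group is missing on the first call is an assumption, and the 512 default simply mirrors the _default_size_mb value listed above.

```python
from typing import Dict, Optional


class FakeBuffer:
    """Hypothetical placeholder for SymmetricMemoryBuffer."""

    def __init__(self, size_in_mb: int, process_group: object) -> None:
        self.size_in_mb = size_in_mb
        self.process_group = process_group


class SketchManager:
    """Mock registry mirroring the documented classmethod signatures."""

    _buffers: Dict[str, FakeBuffer] = {}
    _default_size_mb: int = 512  # mirrors the listed attribute value

    @classmethod
    def get_buffer(
        cls,
        key: str,
        process_group: Optional[object] = None,
        size_mb: Optional[int] = None,
    ) -> FakeBuffer:
        if key not in cls._buffers:
            if process_group is None:
                # Assumption: the docs only say the group is required here.
                raise ValueError(f"process_group is required on first call for {key!r}")
            size = cls._default_size_mb if size_mb is None else size_mb
            cls._buffers[key] = FakeBuffer(size, process_group)
        return cls._buffers[key]

    @classmethod
    def destroy(cls, key: Optional[str] = None) -> None:
        if key is None:
            cls._buffers.clear()
        else:
            cls._buffers.pop(key, None)

    @classmethod
    def is_initialized(cls, key: str) -> bool:
        return key in cls._buffers


tp_group = object()  # placeholder for a torch.distributed.ProcessGroup
first = SketchManager.get_buffer("tp", process_group=tp_group)
again = SketchManager.get_buffer("tp")  # later calls may omit the group
```

Keeping the registry on the class rather than on an instance is what lets any call site reach the same buffer without passing a manager object around.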
