core.inference.symmetric_memory#

Lazy-initialized symmetric memory manager for inference.

Provides a registry of SymmetricMemoryBuffer instances keyed by a user-supplied identifier (e.g. “tp”, “ep”). Buffers are created on first access so that callers never need to worry about initialization ordering relative to the inference context.

Module Contents#

Classes#

SymmetricMemoryBuffer

Symmetric memory buffer used in inference. This buffer is used by mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives.

SymmetricMemoryManager

Registry of lazily-initialized symmetric memory buffers.

API#

class core.inference.symmetric_memory.SymmetricMemoryBuffer(size_in_mb, process_group)#

Symmetric memory buffer used in inference. This buffer is used by mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives.

_can_allocate(numel, dtype) bool#

Returns whether enough symmetric memory is available for a tensor of numel elements with the given dtype.

_allocate(numel, dtype) torch.Tensor#

Allocates a sub-tensor from self.symm_buffer for the given numel and dtype.

maybe_get_tensors(tensor_specs, alignment=16)#

Pack multiple tensors contiguously in the symmetric buffer with alignment.

Each tensor’s starting offset is aligned to alignment bytes (default 16 for 128-bit multimem access).

Parameters:
  • tensor_specs – list of (numel, dtype) tuples.

  • alignment – byte alignment for each tensor’s start offset (default 16).

Returns:

{"handle": None, "tensors": None} if unavailable or insufficient space. {"handle": symm_mem_hdl, "tensors": [(raw_byte_view, byte_offset), …]} on success, where raw_byte_view is a uint8 slice of the buffer.

Return type:

dict
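The aligned packing described above can be illustrated with a small offset calculation. This is a minimal sketch, not the library's implementation: `align_up` and `pack_offsets` are hypothetical helper names, and each spec carries an element size in bytes in place of a dtype so that the example runs without torch.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def pack_offsets(tensor_specs, capacity_bytes, alignment=16):
    """Compute aligned placements for tensors packed back to back.

    tensor_specs is a list of (numel, itemsize_bytes) tuples; itemsize_bytes
    stands in for the dtype (an assumption made for this sketch).
    Returns a list of (byte_offset, nbytes), or None if the specs do not fit.
    """
    offset = 0
    placements = []
    for numel, itemsize in tensor_specs:
        # Each tensor starts on an alignment boundary.
        offset = align_up(offset, alignment)
        nbytes = numel * itemsize
        if offset + nbytes > capacity_bytes:
            return None  # insufficient space for the whole pack
        placements.append((offset, nbytes))
        offset += nbytes
    return placements
```

With 16-byte alignment, a 6-byte tensor followed by a 20-byte tensor places the second one at byte 16 rather than byte 6, so the padding between tensors counts against the buffer capacity.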

maybe_get_tensor(tensor_shape, dtype)#

Returns a sub-tensor of self.symm_buffer for the given shape and dtype. If not enough symmetric memory is available, returns None.
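The check-then-carve pattern suggested by `_can_allocate`, `_allocate`, and `maybe_get_tensor` can be sketched with the standard library alone. Everything here is an assumption for illustration: `ToyByteBuffer` is a hypothetical stand-in backed by a plain `bytearray`, `struct` format characters replace torch dtypes, and the view is taken from the start of the buffer. The real class operates on symmetric memory and may differ.

```python
import struct


class ToyByteBuffer:
    """Hypothetical stand-in for a symmetric buffer: a flat byte array."""

    def __init__(self, size_in_mb):
        self.buf = bytearray(size_in_mb * 1024 * 1024)

    def can_allocate(self, numel, fmt):
        # Required bytes = element count * element size of the type.
        return numel * struct.calcsize(fmt) <= len(self.buf)

    def allocate(self, numel, fmt):
        # Reinterpret the leading bytes as numel elements of type fmt.
        nbytes = numel * struct.calcsize(fmt)
        return memoryview(self.buf)[:nbytes].cast(fmt)

    def maybe_get(self, numel, fmt):
        # Return a typed view if it fits, otherwise None.
        if not self.can_allocate(numel, fmt):
            return None
        return self.allocate(numel, fmt)


buf = ToyByteBuffer(size_in_mb=1)
view = buf.maybe_get(1024, "f")        # 1024 float32 values fit in 1 MiB
too_big = buf.maybe_get(10**9, "d")    # does not fit, so None
```

A caller would typically test the result for None and fall back to a regular allocation or a different collective path when the buffer is too small.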

class core.inference.symmetric_memory.SymmetricMemoryManager#

Registry of lazily-initialized symmetric memory buffers.

Usage::

    buf = SymmetricMemoryManager.get_buffer("tp", process_group=tp_group)
    result = buf.maybe_get_tensor(shape, dtype)

_buffers: dict[str, core.inference.symmetric_memory.SymmetricMemoryBuffer]#

None

_default_size_mb: int#

256

classmethod get_buffer(
key: str,
process_group: Optional[torch.distributed.ProcessGroup] = None,
size_mb: Optional[int] = None,
) core.inference.symmetric_memory.SymmetricMemoryBuffer#

Return the buffer for key, creating it on first call.

Parameters:
  • key – Unique identifier (e.g. “tp”, “ep”).

  • process_group – Required on the first call for a given key. Subsequent calls may omit it.

  • size_mb – Buffer size in MiB (default 256).

classmethod destroy(key: Optional[str] = None) None#

Destroy one or all buffers.

Parameters:

key – If provided, destroy only that buffer. Otherwise destroy all.

classmethod is_initialized(key: str) bool#

Check whether a buffer has been created for key.
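The lazy-registry behavior of `get_buffer`, `destroy`, and `is_initialized` can be sketched as follows. This is an illustrative sketch under stated assumptions, not the module's code: `ToyBuffer` and `ToyManager` are hypothetical names, the process group is an opaque object, and raising `ValueError` when `process_group` is missing on first access is an assumed behavior that the documentation above does not specify.

```python
from typing import Dict, Optional


class ToyBuffer:
    """Hypothetical stand-in for SymmetricMemoryBuffer."""

    def __init__(self, size_in_mb, process_group):
        self.size_in_mb = size_in_mb
        self.process_group = process_group


class ToyManager:
    """Registry that creates buffers on first access, keyed by name."""

    _buffers: Dict[str, ToyBuffer] = {}
    _default_size_mb: int = 256

    @classmethod
    def get_buffer(cls, key: str, process_group=None,
                   size_mb: Optional[int] = None) -> ToyBuffer:
        if key not in cls._buffers:
            if process_group is None:
                # Assumed behavior: the first call must supply a group.
                raise ValueError(f"process_group required to create {key!r}")
            size = cls._default_size_mb if size_mb is None else size_mb
            cls._buffers[key] = ToyBuffer(size, process_group)
        return cls._buffers[key]

    @classmethod
    def destroy(cls, key: Optional[str] = None) -> None:
        if key is None:
            cls._buffers.clear()          # destroy all buffers
        else:
            cls._buffers.pop(key, None)   # destroy only this buffer

    @classmethod
    def is_initialized(cls, key: str) -> bool:
        return key in cls._buffers


tp_group = object()  # placeholder for a real process group
first = ToyManager.get_buffer("tp", process_group=tp_group)
second = ToyManager.get_buffer("tp")  # later calls may omit the group
```

Because the buffer is created inside `get_buffer`, callers in different modules can request the same key in any order and receive the same instance.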