core.inference.symmetric_memory
Lazy-initialized symmetric memory manager for inference.
Provides a registry of SymmetricMemoryBuffer instances keyed by a user-supplied identifier (e.g. “tp”, “ep”). Buffers are created on first access so that callers never need to worry about initialization ordering relative to the inference context.
Module Contents
Classes
| Class | Description |
|---|---|
| SymmetricMemoryBuffer | Symmetric memory buffer used in inference. This buffer is used by mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives. |
| SymmetricMemoryManager | Registry of lazily-initialized symmetric memory buffers. |
API
- class core.inference.symmetric_memory.SymmetricMemoryBuffer(size_in_mb, process_group)
Symmetric memory buffer used in inference. It backs mcore-inference’s low-latency NVLS all-gather and reduce-scatter collectives.
- _can_allocate(numel, dtype) → bool
Returns whether enough symmetric memory is available for a tensor with numel elements of the given dtype.
- _allocate(numel, dtype) → torch.Tensor
Allocates a sub-tensor of numel elements of the given dtype from self.symm_buffer.
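The size check in these two private helpers comes down to one comparison: the requested bytes (numel × element size) against the buffer's capacity. The minimal stdlib-only sketch below illustrates this. It makes several assumptions. A plain bytearray stands in for the real symmetric-memory tensor. A hypothetical FORMATS table of struct codes stands in for torch dtypes. Every request is carved from offset 0 of the buffer. The actual class may lay out allocations differently.

```python
import struct

# Hypothetical dtype-name -> struct format mapping; struct.calcsize(fmt) plays
# the role of a torch dtype's element size in this stdlib-only sketch.
FORMATS = {"float32": "f", "float64": "d", "int32": "i", "int8": "b"}


class ToySymmetricBuffer:
    """Toy stand-in for SymmetricMemoryBuffer, backed by a plain bytearray."""

    def __init__(self, size_in_mb: int) -> None:
        # 1 MiB = 1024 * 1024 bytes.
        self.symm_buffer = bytearray(size_in_mb * 1024 * 1024)

    def _required_bytes(self, numel: int, dtype: str) -> int:
        return numel * struct.calcsize(FORMATS[dtype])

    def _can_allocate(self, numel: int, dtype: str) -> bool:
        # Assumption: a request fits if its byte size does not exceed the buffer.
        return self._required_bytes(numel, dtype) <= len(self.symm_buffer)

    def _allocate(self, numel: int, dtype: str) -> memoryview:
        # Assumption: requests are served from offset 0 (scratch-space reuse),
        # and callers check _can_allocate first. The leading bytes are
        # reinterpreted as the requested element type.
        nbytes = self._required_bytes(numel, dtype)
        return memoryview(self.symm_buffer)[:nbytes].cast(FORMATS[dtype])


buf = ToySymmetricBuffer(size_in_mb=1)
print(buf._can_allocate(262_144, "float32"))  # → True (exactly 1 MiB)
print(buf._can_allocate(262_145, "float32"))  # → False (4 bytes too many)
view = buf._allocate(8, "float32")
print(len(view), view.itemsize)               # → 8 4
```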
- maybe_get_tensors(tensor_specs, alignment=16)
Pack multiple tensors contiguously in the symmetric buffer with alignment. Each tensor’s starting offset is aligned to alignment bytes (default 16, for 128-bit multimem access).
- Parameters:
tensor_specs – list of (numel, dtype) tuples.
alignment – byte alignment for each tensor’s start offset (default 16).
- Returns:
{"handle": None, "tensors": None} if symmetric memory is unavailable or there is insufficient space; otherwise {"handle": symm_mem_hdl, "tensors": [(raw_byte_view, byte_offset), …]}, where raw_byte_view is a uint8 slice of the buffer.
- Return type:
dict
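The alignment rule rounds each start offset up to the next multiple of alignment. The sketch below uses a hypothetical pack_tensors helper. It is not the library's code. It assumes each spec carries an item size in bytes instead of a torch dtype, and it uses a placeholder string for the symmetric memory handle. It returns the same {"handle", "tensors"} shape described for maybe_get_tensors. It also assumes each raw_byte_view is the slice covering one tensor's bytes.

```python
from typing import Any, List, Tuple


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def pack_tensors(buffer: bytearray, handle: Any,
                 tensor_specs: List[Tuple[int, int]], alignment: int = 16) -> dict:
    """Sketch of the maybe_get_tensors layout rule.

    Assumption: each spec is (numel, itemsize_in_bytes) rather than (numel, dtype).
    """
    if handle is None:  # symmetric memory unavailable
        return {"handle": None, "tensors": None}
    spans = []
    cursor = 0
    for numel, itemsize in tensor_specs:
        start = align_up(cursor, alignment)  # aligned start offset
        end = start + numel * itemsize
        spans.append((start, end))
        cursor = end
    if cursor > len(buffer):  # insufficient space
        return {"handle": None, "tensors": None}
    raw = memoryview(buffer)  # byte ('B', i.e. uint8) view of the whole buffer
    return {"handle": handle, "tensors": [(raw[s:e], s) for s, e in spans]}


specs = [(3, 4), (5, 2), (1, 8)]  # 12 B, 10 B and 8 B requests
result = pack_tensors(bytearray(64), handle="toy-handle", tensor_specs=specs)
print([off for _, off in result["tensors"]])   # → [0, 16, 32]
print([len(v) for v, _ in result["tensors"]])  # → [12, 10, 8]
print(pack_tensors(bytearray(32), "toy-handle", specs))
# → {'handle': None, 'tensors': None}
```

With alignment=16, the 12-byte and 10-byte tensors leave 4 and 6 bytes of padding. The three requests therefore occupy 40 bytes rather than 30, which is why the 32-byte buffer is rejected.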
- maybe_get_tensor(tensor_shape, dtype)
Returns a sub-tensor of self.symm_buffer with the given shape and dtype if enough symmetric memory is available; otherwise returns None.
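Before checking capacity, a shape must first be converted to an element count and then to a byte count. The minimal sketch below shows this. It assumes a bytearray stands in for self.symm_buffer and a struct format code (here "f" for 4-byte floats) stands in for the dtype. The function is a hypothetical illustration, not the method itself.

```python
import math
import struct
from typing import Optional, Sequence


def maybe_get_tensor(buffer: bytearray, tensor_shape: Sequence[int],
                     fmt: str) -> Optional[memoryview]:
    """Sketch: a shaped view over the buffer's leading bytes, or None if too small."""
    numel = math.prod(tensor_shape)            # shape -> element count
    nbytes = numel * struct.calcsize(fmt)      # element count -> bytes
    if nbytes > len(buffer):
        return None                            # not enough symmetric memory
    # Reinterpret the leading bytes as an n-d array of the requested type.
    return memoryview(buffer)[:nbytes].cast(fmt, shape=list(tensor_shape))


buf = bytearray(1024)
t = maybe_get_tensor(buf, (4, 8), "f")
print(t.shape)                                 # → (4, 8)
print(maybe_get_tensor(buf, (16, 32), "f"))    # → None (2048 B > 1024 B)
```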
- class core.inference.symmetric_memory.SymmetricMemoryManager
Registry of lazily-initialized symmetric memory buffers.
Usage:

    buf = SymmetricMemoryManager.get_buffer("tp", process_group=tp_group)
    result = buf.maybe_get_tensor(shape, dtype)

- _buffers: dict[str, core.inference.symmetric_memory.SymmetricMemoryBuffer] = None
- _default_size_mb: int = 256
- classmethod get_buffer(key: str, process_group: Optional[torch.distributed.ProcessGroup] = None, size_mb: Optional[int] = None)
Return the buffer for key, creating it on the first call.
- Parameters:
key – Unique identifier (e.g. “tp”, “ep”).
process_group – Required on the first call for a given key. Subsequent calls may omit it.
size_mb – Buffer size in MiB (default 256).
- classmethod destroy(key: Optional[str] = None) → None
Destroy one or all buffers.
- Parameters:
key – If provided, destroy only that buffer. Otherwise destroy all.
- classmethod is_initialized(key: str) → bool
Check whether a buffer has been created for key.
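The registry semantics described above can be sketched with class-level state. These semantics are: lazy creation per key, process_group needed only on the first call, a 256 MiB default size, and per-key or global teardown. ToyManager and ToyBuffer are hypothetical stand-ins. The sketch also makes assumptions this document does not confirm: it raises ValueError when process_group is missing on the first call, it ignores size_mb on later calls, and destroying an unknown key is a no-op.

```python
from typing import Any, Dict, Optional


class ToyBuffer:
    """Hypothetical stand-in for SymmetricMemoryBuffer; it only records its arguments."""

    def __init__(self, size_in_mb: int, process_group: Any) -> None:
        self.size_in_mb = size_in_mb
        self.process_group = process_group


class ToyManager:
    """Sketch of a lazily-initialized, class-level buffer registry."""

    # Shared across all callers on purpose: this dict *is* the registry.
    _buffers: Dict[str, ToyBuffer] = {}
    _default_size_mb: int = 256

    @classmethod
    def get_buffer(cls, key: str, process_group: Optional[Any] = None,
                   size_mb: Optional[int] = None) -> ToyBuffer:
        if key not in cls._buffers:
            # Assumption: a missing process_group on the first call is an error.
            if process_group is None:
                raise ValueError(f"process_group is required on first call for {key!r}")
            size = cls._default_size_mb if size_mb is None else size_mb
            cls._buffers[key] = ToyBuffer(size, process_group)
        # Later calls return the cached buffer; size_mb is ignored here (assumption).
        return cls._buffers[key]

    @classmethod
    def destroy(cls, key: Optional[str] = None) -> None:
        if key is None:
            cls._buffers.clear()       # destroy all buffers
        else:
            cls._buffers.pop(key, None)  # destroy one; unknown keys are ignored

    @classmethod
    def is_initialized(cls, key: str) -> bool:
        return key in cls._buffers


tp = ToyManager.get_buffer("tp", process_group="tp-group")
assert ToyManager.get_buffer("tp") is tp   # later calls may omit the group
print(tp.size_in_mb)                       # → 256
ToyManager.get_buffer("ep", process_group="ep-group", size_mb=64)
ToyManager.destroy("tp")
print(ToyManager.is_initialized("tp"), ToyManager.is_initialized("ep"))  # → False True
ToyManager.destroy()
print(ToyManager.is_initialized("ep"))     # → False
```

Because the registry lives at class level, any caller can reach the same buffer by key. No instance needs to be threaded through the inference code, which is what frees callers from worrying about initialization order.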