core.resharding.copy_services.base

Module Contents

Classes

CopyService

Abstract interface for submitting and executing batched P2P copy operations.

API

class core.resharding.copy_services.base.CopyService

Bases: abc.ABC

Abstract interface for submitting and executing batched P2P copy operations.

All backends accept an optional task_id on submit calls. The task_id is a globally unique identifier shared between the matching send and recv for the same transfer. It is required for local (same-rank) copy matching and for the NVSHMEM backend’s scheduling. Backends that do not need it for remote transfers simply ignore it.

abstractmethod submit_send(src_tensor: torch.Tensor, dest_rank: int, task_id: Optional[int] = None)

Register a tensor send from the current rank to dest_rank.

abstractmethod submit_recv(dest_tensor: torch.Tensor, src_rank: int, task_id: Optional[int] = None)

Register a tensor receive into dest_tensor from src_rank.

abstractmethod run()

Execute all previously submitted send/recv operations as a single batch.
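The submit-then-run lifecycle described above can be illustrated with a small sketch. It is not one of the library's backends. `CopyServiceSketch` is a local stand-in that mirrors the three documented abstract methods so the example runs without the package installed, `LocalOnlyCopyService` is a hypothetical subclass that handles only same-rank copies, and plain Python lists stand in for `torch.Tensor` buffers. It shows why `task_id` matters for local matching: the send and the recv are paired by it, regardless of submission order.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CopyServiceSketch(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    def submit_send(self, src_tensor, dest_rank: int, task_id: Optional[int] = None): ...

    @abstractmethod
    def submit_recv(self, dest_tensor, src_rank: int, task_id: Optional[int] = None): ...

    @abstractmethod
    def run(self): ...


class LocalOnlyCopyService(CopyServiceSketch):
    """Hypothetical backend: queues same-rank copies and pairs them by task_id."""

    def __init__(self, rank: int = 0):
        self.rank = rank
        self._sends: Dict[int, list] = {}
        self._recvs: Dict[int, list] = {}

    def submit_send(self, src_tensor, dest_rank, task_id=None):
        self._check(dest_rank, task_id)
        self._sends[task_id] = src_tensor  # only registered; nothing is copied yet

    def submit_recv(self, dest_tensor, src_rank, task_id=None):
        self._check(src_rank, task_id)
        self._recvs[task_id] = dest_tensor

    def _check(self, peer_rank, task_id):
        if peer_rank != self.rank:
            raise NotImplementedError("this sketch handles same-rank copies only")
        if task_id is None:
            raise ValueError("task_id is required to match a local send with its recv")

    def run(self):
        # Every queued send must have a recv with the same task_id, and vice versa.
        if self._sends.keys() != self._recvs.keys():
            raise RuntimeError("unmatched send/recv task_ids")
        for task_id, src in self._sends.items():
            # In-place copy; with real tensors this would be dest.copy_(src).
            self._recvs[task_id][:] = src
        self._sends.clear()
        self._recvs.clear()


service = LocalOnlyCopyService(rank=0)
src_a, dst_a = [1.0, 2.0], [0.0, 0.0]
src_b, dst_b = [3.0], [0.0]

service.submit_send(src_a, dest_rank=0, task_id=7)
service.submit_send(src_b, dest_rank=0, task_id=8)
# Receives are submitted in the opposite order; task_id still pairs them correctly.
service.submit_recv(dst_b, src_rank=0, task_id=8)
service.submit_recv(dst_a, src_rank=0, task_id=7)

pending = list(dst_a)  # destination is unchanged until run() executes the batch
service.run()
```

A real backend would replace the loop in `run()` with its own transport, but the same split between registration in the submit calls and execution in `run()` applies.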