core.resharding.copy_services.base#
Module Contents#
Classes#
SendOp | Single send operation pending in a CopyService queue.
RecvOp | Single receive operation pending in a CopyService queue.
CopyService | Abstract interface for submitting and executing batched P2P copy operations.
Functions#
match_local_ops_by_task_id | Pair same-rank send/recv ops by task_id, raising on any mismatch.
API#
- class core.resharding.copy_services.base.SendOp#
Single send operation pending in a CopyService queue.
- task_id: int | None#
None
- tensor: torch.Tensor#
None
- dest_rank: int#
None
- class core.resharding.copy_services.base.RecvOp#
Single receive operation pending in a CopyService queue.
- task_id: int | None#
None
- tensor: torch.Tensor#
None
- src_rank: int#
None
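The two records above can be pictured as plain dataclasses. The sketch below is only an assumption about their shape, inferred from the fields listed here; the names `SendOpSketch` and `RecvOpSketch` are hypothetical, and `Any` stands in for `torch.Tensor` so the snippet runs without PyTorch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SendOpSketch:
    """Hypothetical stand-in for SendOp: one queued send."""
    task_id: Optional[int]  # shared with the matching recv, or None
    tensor: Any             # torch.Tensor in the real module
    dest_rank: int          # rank that will receive the data


@dataclass
class RecvOpSketch:
    """Hypothetical stand-in for RecvOp: one queued receive."""
    task_id: Optional[int]  # shared with the matching send, or None
    tensor: Any             # torch.Tensor in the real module
    src_rank: int           # rank the data comes from


# A matching send/recv pair carries the same task_id on both sides.
send = SendOpSketch(task_id=7, tensor=[1.0, 2.0], dest_rank=1)
recv = RecvOpSketch(task_id=7, tensor=[0.0, 0.0], src_rank=0)
```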
- class core.resharding.copy_services.base.CopyService(group=None)#
Bases: abc.ABC
Abstract interface for submitting and executing batched P2P copy operations.
All backends accept an optional task_id on submit calls. The task_id is a globally unique identifier shared between the matching send and recv for the same transfer. It is required for local (same-rank) copy matching and for the NVSHMEM backend’s scheduling. Backends that do not need it for remote transfers simply ignore it.
Initialization
- abstractmethod submit_send(src_tensor: torch.Tensor, dest_rank: int, task_id: Optional[int] = None)#
Register a tensor send from the current rank to dest_rank.
- abstractmethod submit_recv(dest_tensor: torch.Tensor, src_rank: int, task_id: Optional[int] = None)#
Register a tensor receive into dest_tensor from src_rank.
- abstractmethod run()#
Execute all previously submitted send/recv operations as a single batch.
- close() → None#
Release backend-owned resources. Default no-op; NVSHMEM overrides.
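To make the submit/run/close lifecycle concrete, here is a minimal single-process sketch. It is not one of the real backends: `CopyServiceSketch` and `LocalListCopyService` are hypothetical names, Python lists stand in for tensors, and draining the queues at the end of `run()` is an assumption of this sketch rather than documented behavior.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CopyServiceSketch(ABC):
    """Hypothetical mirror of the interface described above."""

    @abstractmethod
    def submit_send(self, src_tensor, dest_rank: int, task_id: Optional[int] = None):
        ...

    @abstractmethod
    def submit_recv(self, dest_tensor, src_rank: int, task_id: Optional[int] = None):
        ...

    @abstractmethod
    def run(self):
        ...

    def close(self) -> None:
        """Default no-op, matching the documented base-class behavior."""


class LocalListCopyService(CopyServiceSketch):
    """Toy backend: everything lives on one rank, 'tensors' are lists."""

    def __init__(self):
        self._sends = []
        self._recvs = []

    def submit_send(self, src_tensor, dest_rank, task_id=None):
        # Only queue the operation; nothing is copied until run().
        self._sends.append((task_id, src_tensor, dest_rank))

    def submit_recv(self, dest_tensor, src_rank, task_id=None):
        self._recvs.append((task_id, dest_tensor, src_rank))

    def run(self):
        # Match each recv to the send with the same task_id, then copy in place.
        sends = {task_id: tensor for task_id, tensor, _ in self._sends}
        for task_id, dest, _ in self._recvs:
            dest[:] = sends[task_id]
        # Assumption of this sketch: the queues are drained after each batch.
        self._sends.clear()
        self._recvs.clear()


service = LocalListCopyService()
src, dst = [1, 2, 3], [0, 0, 0]
service.submit_send(src, dest_rank=0, task_id=42)
service.submit_recv(dst, src_rank=0, task_id=42)
service.run()    # the whole batch executes here
service.close()  # no-op for this toy backend
```

The point of the split is that `submit_*` calls are cheap bookkeeping, while `run()` sees the full batch at once and can schedule it as a unit.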
- core.resharding.copy_services.base.match_local_ops_by_task_id(local_sends: list, local_recvs: list, backend_name: str, rank: int)#
Pair same-rank send/recv ops by task_id, raising on any mismatch.
Returns a list of (send_op, recv_op) tuples for the caller to apply backend-specific local-copy logic. Either op type may be a backend-local wrapper as long as it exposes .task_id.
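The pairing described above can be sketched as follows. This is an illustration, not the module's implementation: the function name `match_local_ops_sketch`, the `Op` wrapper, the use of `RuntimeError`, and the exact mismatch rules (missing, duplicate, or unmatched task_id) are all assumptions.

```python
from collections import namedtuple

# Hypothetical op wrapper; anything exposing .task_id would work.
Op = namedtuple("Op", ["task_id", "payload"])


def match_local_ops_sketch(local_sends, local_recvs, backend_name, rank):
    """Pair ops by task_id; raise if any op lacks a unique partner."""
    prefix = f"[{backend_name}] rank {rank}"

    # Index sends by task_id, rejecting missing or duplicate ids.
    sends_by_id = {}
    for op in local_sends:
        if op.task_id is None or op.task_id in sends_by_id:
            raise RuntimeError(f"{prefix}: missing or duplicate send task_id {op.task_id}")
        sends_by_id[op.task_id] = op

    # Consume one send per recv; a recv without a partner is an error.
    pairs = []
    for op in local_recvs:
        send = sends_by_id.pop(op.task_id, None)
        if send is None:
            raise RuntimeError(f"{prefix}: recv task_id {op.task_id} has no matching send")
        pairs.append((send, op))

    # Any send left over was never received.
    if sends_by_id:
        raise RuntimeError(f"{prefix}: unmatched send task_ids {sorted(sends_by_id)}")
    return pairs


pairs = match_local_ops_sketch(
    [Op(1, "a"), Op(2, "b")],
    [Op(2, "B"), Op(1, "A")],
    backend_name="toy",
    rank=0,
)
```

In this sketch the pairs come back in recv order, and the caller decides how to perform the actual copy for each `(send_op, recv_op)` tuple.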