core.resharding.copy_services.nvshmem_copy_service#
Module Contents#
Classes#
NVSHMEMCopyService | CopyService implementation backed by NVSHMEM RemoteCopyService. |
Data#
API#
- core.resharding.copy_services.nvshmem_copy_service.logger#
‘getLogger(…)’
- class core.resharding.copy_services.nvshmem_copy_service.NVSHMEMCopyService(group=None)#
Bases: core.resharding.copy_services.base.CopyService
CopyService implementation backed by NVSHMEM RemoteCopyService.
NVSHMEM requires a globally unique task_id for every transfer so that the planner can schedule send/recv pairs. Although the signatures declare task_id as optional, calls to submit_send or submit_recv that omit it will raise.
Initialization
- _ensure_initialized()#
- submit_send(src_tensor: torch.Tensor, dest_rank: int, task_id: Optional[int] = None)
- submit_recv(dest_tensor: torch.Tensor, src_rank: int, task_id: Optional[int] = None)
- run()#
Execute all registered transfer pairs via NVSHMEM.
This converts the registered pairs into RemoteCopyService send/receive requests, builds a schedule, runs the pipelined NVSHMEM transfer, and then clears internal state.
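The submit/run contract described above can be illustrated with a small single-process model. This is only a sketch, not the NVSHMEM-backed implementation: `ToyCopyService` and `Buffer` are hypothetical names, plain Python lists stand in for `torch.Tensor`, both ranks are collapsed into one process, the exception types are assumptions (the reference only says that a missing task_id "will raise"), and returning a count from `run()` is a convenience of the sketch. It shows three things: a missing task_id is rejected, sends and receives are paired by task_id, and internal state is cleared after `run()`.

```python
from typing import Dict, List, Optional, Tuple

Buffer = List[float]  # hypothetical stand-in for torch.Tensor


class ToyCopyService:
    """Hypothetical single-process model of the submit/run contract."""

    def __init__(self) -> None:
        # task_id -> (buffer, peer rank); peer ranks are recorded but unused here
        self._sends: Dict[int, Tuple[Buffer, int]] = {}
        self._recvs: Dict[int, Tuple[Buffer, int]] = {}

    def submit_send(self, src: Buffer, dest_rank: int,
                    task_id: Optional[int] = None) -> None:
        if task_id is None:
            # Assumed exception type; the reference does not name one.
            raise ValueError("task_id is required for every transfer")
        self._sends[task_id] = (src, dest_rank)

    def submit_recv(self, dest: Buffer, src_rank: int,
                    task_id: Optional[int] = None) -> None:
        if task_id is None:
            raise ValueError("task_id is required for every transfer")
        self._recvs[task_id] = (dest, src_rank)

    def run(self) -> int:
        """Pair sends with receives by task_id, copy, then clear state."""
        if self._sends.keys() != self._recvs.keys():
            raise RuntimeError("every send needs a recv with the same task_id")
        schedule = sorted(self._sends)  # trivial stand-in for a real schedule
        for task_id in schedule:
            src, _ = self._sends[task_id]
            dest, _ = self._recvs[task_id]
            if len(src) != len(dest):
                raise RuntimeError(f"size mismatch for task_id {task_id}")
            dest[:] = src  # in-place copy into the receive buffer
        completed = len(schedule)
        self._sends.clear()
        self._recvs.clear()
        return completed


service = ToyCopyService()
src = [1.0, 2.0, 3.0]
dst = [0.0, 0.0, 0.0]
service.submit_send(src, dest_rank=1, task_id=7)
service.submit_recv(dst, src_rank=0, task_id=7)
completed = service.run()
```

In a real multi-rank job the send and the receive for a given task_id would be submitted on different ranks, which is why the identifier has to be agreed on globally rather than generated locally.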