core.resharding.copy_services.nvshmem_copy_service
Module Contents
Classes
NVSHMEMCopyService | CopyService implementation backed by NVSHMEM RemoteCopyService.
Data
API
- core.resharding.copy_services.nvshmem_copy_service.logger
‘getLogger(…)’
- class core.resharding.copy_services.nvshmem_copy_service.NVSHMEMCopyService
Bases: core.resharding.copy_services.base.CopyService
CopyService implementation backed by NVSHMEM RemoteCopyService.
Initialization
- _ensure_initialized()
- submit_send(src_tensor: torch.Tensor, dest_rank: int)
The basic CopyService API is not rich enough to drive the NVSHMEM planner because it lacks a globally shared task identifier. This method is kept only for interface compatibility and should not be called directly.
The resharding path instead calls into NVSHMEMCopyService through the submit_send_with_id and submit_recv_with_id helpers.
- submit_recv(dest_tensor: torch.Tensor, src_rank: int)
- submit_send_with_id(task_id: int, src_tensor: torch.Tensor, dest_rank: int)
Register a send with an explicit, globally shared task_id.
- submit_recv_with_id(task_id: int, dest_tensor: torch.Tensor, src_rank: int)
Register a recv with an explicit, globally shared task_id.
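A "globally shared" task_id means the sender and the receiver of one transfer must register it under the same integer. The sketch below shows one possible way to get there, assuming every rank already holds the same set of planned transfers. The names Transfer, assign_task_ids and ops_for_rank are hypothetical and are not part of this module; how the real resharding path derives its ids is not documented here.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    """Hypothetical description of one planned copy (not a library type)."""
    src_rank: int
    dest_rank: int
    name: str  # illustrative label for the tensor slice being moved


def assign_task_ids(plan):
    """Number the transfers so that every rank computes identical ids."""
    # Sorting removes any dependence on the order in which a rank built its plan.
    return {task_id: t for task_id, t in enumerate(sorted(plan))}


def ops_for_rank(rank, tasks):
    """Split the shared task table into this rank's sends and receives."""
    sends = [(tid, t.dest_rank) for tid, t in tasks.items() if t.src_rank == rank]
    recvs = [(tid, t.src_rank) for tid, t in tasks.items() if t.dest_rank == rank]
    return sends, recvs


# The same plan, possibly built in a different order on each rank.
plan = [Transfer(0, 1, "w1"), Transfer(1, 0, "w2"), Transfer(0, 1, "w0")]
tasks = assign_task_ids(plan)
sends0, recvs0 = ops_for_rank(0, tasks)
sends1, recvs1 = ops_for_rank(1, tasks)
```

Each (task_id, peer) pair in sends would then be passed to submit_send_with_id together with the source tensor, and each pair in recvs to submit_recv_with_id together with the destination tensor.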
- run()
Execute all registered transfer pairs via NVSHMEM.
This converts the registered pairs into RemoteCopyService send/receive requests, builds a schedule, runs the pipelined NVSHMEM transfer, and then clears internal state.
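The register-then-run pattern described above can be illustrated with a toy, in-process stand-in. This is only a sketch under loud assumptions: ToyCopyService is hypothetical, it uses Python lists instead of torch tensors, it holds both ends of every transfer in one object, and it performs no NVSHMEM calls. Raising on unmatched ids and running tasks in sorted order are choices made for the toy, not documented behavior of NVSHMEMCopyService.

```python
class ToyCopyService:
    """Hypothetical single-process stand-in for the submit/run flow."""

    def __init__(self):
        self._sends = {}  # task_id -> (source buffer, dest_rank)
        self._recvs = {}  # task_id -> (destination buffer, src_rank)

    def submit_send_with_id(self, task_id, src, dest_rank):
        self._sends[task_id] = (src, dest_rank)

    def submit_recv_with_id(self, task_id, dest, src_rank):
        self._recvs[task_id] = (dest, src_rank)

    def run(self):
        # Every send must have a receive registered under the same task_id.
        unmatched = set(self._sends) ^ set(self._recvs)
        if unmatched:
            raise ValueError(f"unmatched task ids: {sorted(unmatched)}")
        # Toy "schedule": process the matched pairs in task_id order.
        for task_id in sorted(self._sends):
            src, _ = self._sends[task_id]
            dest, _ = self._recvs[task_id]
            dest[:] = src  # in-place copy stands in for the remote transfer
        # Clear internal state so the service can be reused for the next batch.
        self._sends.clear()
        self._recvs.clear()


svc = ToyCopyService()
src, dst = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
svc.submit_send_with_id(7, src, dest_rank=1)
svc.submit_recv_with_id(7, dst, src_rank=0)
svc.run()
```

After run() returns, dst holds the contents of src and both registries are empty, which mirrors the "clears internal state" step in the description.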