core.resharding.copy_services.nvshmem_copy_service#

Module Contents#

Classes#

NVSHMEMCopyService

CopyService implementation backed by NVSHMEM RemoteCopyService.

Data#

API#

core.resharding.copy_services.nvshmem_copy_service.logger#

‘getLogger(…)’

class core.resharding.copy_services.nvshmem_copy_service.NVSHMEMCopyService#

Bases: core.resharding.copy_services.base.CopyService

CopyService implementation backed by NVSHMEM RemoteCopyService.

Initialization

_ensure_initialized()#

submit_send(src_tensor: torch.Tensor, dest_rank: int)#

The basic CopyService API is not rich enough to drive the NVSHMEM planner because it lacks a globally shared task identifier. This method is kept only for interface compatibility and should not be called directly.

The resharding path calls into NVSHMEMCopyService via the submit_send_with_id/submit_recv_with_id helpers instead.
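One way to see why a shared identifier helps is to look at what happens when several transfers go between the same pair of ranks. The sketch below is illustrative only: the transfer list, the labels, and the variable names are hypothetical and are not taken from the module.

```python
from collections import defaultdict

# Hypothetical transfer plan: (task_id, src_rank, dest_rank, label).
transfers = [
    (0, 0, 1, "layer0.weight"),
    (1, 0, 1, "layer0.bias"),
    (2, 1, 0, "layer1.weight"),
]

# Keyed only by the rank pair, which is all that
# submit_send(src_tensor, dest_rank) and submit_recv(dest_tensor, src_rank) expose.
by_rank_pair = defaultdict(list)
for task_id, src, dst, label in transfers:
    by_rank_pair[(src, dst)].append(label)

# Keyed by a task_id that every rank agrees on.
by_task_id = {task_id: label for task_id, _, _, label in transfers}

# Rank pairs that carry more than one transfer cannot be told apart
# without an extra identifier or an ordering convention.
ambiguous = {pair: labels for pair, labels in by_rank_pair.items() if len(labels) > 1}
```

Here the pair (0, 1) carries two transfers, so the rank pair alone does not say which send belongs to which receive, while the task_id key stays unique.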

submit_recv(dest_tensor: torch.Tensor, src_rank: int)#

submit_send_with_id(
task_id: int,
src_tensor: torch.Tensor,
dest_rank: int,
)#

Register a send with an explicit, globally shared task_id.

submit_recv_with_id(
task_id: int,
dest_tensor: torch.Tensor,
src_rank: int,
)#

Register a recv with an explicit, globally shared task_id.
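The registration pattern can be sketched with a toy, single-process stand-in. `ToyRankRegistry` is a hypothetical class written for this example; it only mirrors the two method names and argument order shown above, uses `bytearray` in place of `torch.Tensor`, and its duplicate-ID check is an assumption rather than documented behavior of NVSHMEMCopyService.

```python
class ToyRankRegistry:
    """Hypothetical per-rank bookkeeping; not the real NVSHMEMCopyService."""

    def __init__(self, rank):
        self.rank = rank
        self.sends = {}  # task_id -> (src_buffer, dest_rank)
        self.recvs = {}  # task_id -> (dest_buffer, src_rank)

    def submit_send_with_id(self, task_id, src_buffer, dest_rank):
        # Assumption of this sketch: a task_id may be registered only once per side.
        if task_id in self.sends:
            raise ValueError(f"duplicate send task_id {task_id}")
        self.sends[task_id] = (src_buffer, dest_rank)

    def submit_recv_with_id(self, task_id, dest_buffer, src_rank):
        if task_id in self.recvs:
            raise ValueError(f"duplicate recv task_id {task_id}")
        self.recvs[task_id] = (dest_buffer, src_rank)


# Two simulated ranks register both halves of each transfer under the same task_id.
rank0 = ToyRankRegistry(0)
rank1 = ToyRankRegistry(1)

rank0.submit_send_with_id(7, bytearray(b"abcd"), dest_rank=1)
rank1.submit_recv_with_id(7, bytearray(4), src_rank=0)
rank0.submit_send_with_id(8, bytearray(b"wxyz"), dest_rank=1)
rank1.submit_recv_with_id(8, bytearray(4), src_rank=0)

# Because the IDs are shared, sends and recvs can be paired by key alone.
matched_ids = sorted(rank0.sends.keys() & rank1.recvs.keys())
```

The point of the sketch is that the send side and the receive side never need to agree on call order; agreeing on the task_id is enough to pair them.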

run()#

Execute all registered transfer pairs via NVSHMEM.

This converts the registered pairs into RemoteCopyService send/receive requests, builds a schedule, runs the pipelined NVSHMEM transfer, and then clears internal state.
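The four stages described above (convert, schedule, transfer, clear) can be sketched in a single process. Everything in this block is hypothetical: `ToyRequest`, `ToyPipeline`, `add_send`, `add_recv`, and the fixed batch size are illustration only, and a plain buffer copy stands in for the pipelined NVSHMEM transfer. The real RemoteCopyService interface is not shown in this page and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    task_id: int
    src_rank: int
    dest_rank: int
    src: bytearray
    dest: bytearray


class ToyPipeline:
    """Hypothetical single-process model of the run() stages."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self._sends = {}  # task_id -> (src_rank, src_buffer, dest_rank)
        self._recvs = {}  # task_id -> (dest_rank, dest_buffer, src_rank)

    def add_send(self, task_id, src_rank, src, dest_rank):
        self._sends[task_id] = (src_rank, src, dest_rank)

    def add_recv(self, task_id, dest_rank, dest, src_rank):
        self._recvs[task_id] = (dest_rank, dest, src_rank)

    def run(self):
        # Stage 1: convert registered pairs into requests, matched by task_id.
        requests = []
        for task_id in sorted(self._sends):
            src_rank, src, dest_rank = self._sends[task_id]
            if task_id not in self._recvs:
                raise KeyError(f"no recv registered for task_id {task_id}")
            recv_dest_rank, dest, recv_src_rank = self._recvs[task_id]
            if (recv_src_rank, recv_dest_rank) != (src_rank, dest_rank):
                raise ValueError(f"rank mismatch for task_id {task_id}")
            if len(src) != len(dest):
                raise ValueError(f"size mismatch for task_id {task_id}")
            requests.append(ToyRequest(task_id, src_rank, dest_rank, src, dest))

        # Stage 2: build a schedule (here simply fixed-size batches).
        schedule = [
            requests[i:i + self.batch_size]
            for i in range(0, len(requests), self.batch_size)
        ]

        # Stage 3: execute; a local copy stands in for the NVSHMEM transfer.
        for batch in schedule:
            for req in batch:
                req.dest[:] = req.src

        # Stage 4: clear state so the object can be reused for the next round.
        self._sends.clear()
        self._recvs.clear()
        return len(schedule)


pipeline = ToyPipeline(batch_size=2)
dest_buffers = {}
for task_id in range(3):
    dest_buffers[task_id] = bytearray(4)
    pipeline.add_send(task_id, 0, bytearray([task_id] * 4), 1)
    pipeline.add_recv(task_id, 1, dest_buffers[task_id], 0)

num_batches = pipeline.run()
```

Clearing the registrations at the end mirrors the documented behavior that run() resets internal state, so a later round starts from an empty set of transfers.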