core.resharding.copy_services.nvshmem_copy_service

Module Contents

Classes

NVSHMEMCopyService

CopyService implementation backed by NVSHMEM RemoteCopyService.

Data

API

core.resharding.copy_services.nvshmem_copy_service.logger

‘getLogger(…)’

class core.resharding.copy_services.nvshmem_copy_service.NVSHMEMCopyService(group=None)

Bases: core.resharding.copy_services.base.CopyService

CopyService implementation backed by NVSHMEM RemoteCopyService.

NVSHMEM requires a globally unique task_id for every transfer so that the planner can schedule send/recv pairs. Although the signatures of submit_send and submit_recv default task_id to None, calls that omit it raise an exception.
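One way to obtain ids that are unique and agree on both ends of a transfer is to derive them deterministically from a transfer plan that every rank already shares. The sketch below is only an illustration of that idea: the plan format and the helper names `assign_task_ids` and `operations_for_rank` are hypothetical and are not part of this module.

```python
from typing import Dict, List, Tuple

# Hypothetical plan entry: (src_rank, dest_rank, parameter name).
Transfer = Tuple[int, int, str]


def assign_task_ids(plan: List[Transfer]) -> Dict[Transfer, int]:
    """Number the transfers in sorted order.

    Sorting makes the numbering independent of the order in which a rank
    happened to build its plan, so sender and receiver compute the same id.
    """
    return {transfer: i for i, transfer in enumerate(sorted(set(plan)))}


def operations_for_rank(rank: int, task_ids: Dict[Transfer, int]) -> List[tuple]:
    """List the send/recv submissions a given rank would make."""
    ops = []
    for (src, dest, name), task_id in task_ids.items():
        if src == rank:
            ops.append(("send", dest, name, task_id))
        if dest == rank:
            ops.append(("recv", src, name, task_id))
    return ops


plan = [(0, 1, "w"), (1, 0, "b"), (0, 1, "b")]
task_ids = assign_task_ids(plan)
for rank in (0, 1):
    print(rank, operations_for_rank(rank, task_ids))
```

Each "send" entry would correspond to a submit_send call and each "recv" entry to a submit_recv call with the same task_id on the peer rank.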

Initialization

_ensure_initialized()

submit_send(src_tensor: torch.Tensor, dest_rank: int, task_id: Optional[int] = None)

submit_recv(dest_tensor: torch.Tensor, src_rank: int, task_id: Optional[int] = None)

run()

Execute all registered transfer pairs via NVSHMEM.

This converts the registered pairs into RemoteCopyService send/receive requests, builds a schedule, runs the pipelined NVSHMEM transfer, and then clears internal state.
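The register-then-run pattern described above can be illustrated with a toy, single-process stand-in. This is a sketch under stated assumptions, not the NVSHMEM-backed implementation: `ToyCopyService` is a hypothetical class, plain Python lists stand in for tensors, both sides of every transfer live in one process, and the exception types are chosen for illustration since the source does not name them.

```python
from typing import Dict, List, Optional, Tuple


class ToyCopyService:
    """Hypothetical in-memory analogue of a task_id-keyed copy service."""

    def __init__(self) -> None:
        # task_id -> (buffer, peer rank)
        self._sends: Dict[int, Tuple[List[float], int]] = {}
        self._recvs: Dict[int, Tuple[List[float], int]] = {}

    def submit_send(self, src: List[float], dest_rank: int,
                    task_id: Optional[int] = None) -> None:
        if task_id is None:
            raise ValueError("task_id is required")  # assumed exception type
        self._sends[task_id] = (src, dest_rank)

    def submit_recv(self, dest: List[float], src_rank: int,
                    task_id: Optional[int] = None) -> None:
        if task_id is None:
            raise ValueError("task_id is required")  # assumed exception type
        self._recvs[task_id] = (dest, src_rank)

    def run(self) -> int:
        """Match sends to recvs by task_id, copy, then clear state."""
        unmatched = set(self._sends) ^ set(self._recvs)
        if unmatched:
            raise RuntimeError(f"unpaired task ids: {sorted(unmatched)}")
        # A fixed order stands in for the schedule-building step.
        for task_id in sorted(self._sends):
            src, _ = self._sends[task_id]
            dest, _ = self._recvs[task_id]
            dest[:] = src  # in-place copy into the receive buffer
        completed = len(self._sends)
        self._sends.clear()
        self._recvs.clear()
        return completed


service = ToyCopyService()
received = [0.0, 0.0]
service.submit_send([1.0, 2.0], dest_rank=1, task_id=7)
service.submit_recv(received, src_rank=0, task_id=7)
service.run()
```

Because run() clears the registered pairs, a second call with nothing newly submitted has no transfers to execute in this toy version.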