core.resharding.copy_services.nccl_copy_service#

Module Contents#

Classes#

SendOp

Simple container describing a single send operation.

RecvOp

Simple container describing a single receive operation.

NCCLCopyService

Thin wrapper around torch.distributed.batch_isend_irecv that collects point-to-point sends and receives and executes them as a single batch.

Data#

logger

API#

core.resharding.copy_services.nccl_copy_service.logger#

‘getLogger(…)’

class core.resharding.copy_services.nccl_copy_service.SendOp#

Simple container describing a single send operation.

task_id: int | None#

None

tensor: torch.Tensor#

None

dest_rank: int#

None

class core.resharding.copy_services.nccl_copy_service.RecvOp#

Simple container describing a single receive operation.

task_id: int | None#

None

tensor: torch.Tensor#

None

src_rank: int#

None
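The two containers above share the same shape: an optional task identifier, a tensor, and a peer rank. A minimal sketch of how such containers could be declared follows. The use of `dataclass` is an assumption made for illustration (this page does not say how the real classes are implemented); the field names and annotations are copied from the listings above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # torch is referenced only in annotations in this sketch,
    # so it is not imported at runtime.
    import torch


@dataclass
class SendOp:
    """One pending send: which tensor goes to which rank."""

    task_id: int | None   # optional identifier, may be None
    tensor: torch.Tensor  # data to send
    dest_rank: int        # rank that receives the tensor


@dataclass
class RecvOp:
    """One pending receive: which buffer is filled from which rank."""

    task_id: int | None   # optional identifier, may be None
    tensor: torch.Tensor  # buffer that receives the data
    src_rank: int         # rank that sends the tensor
```

The only difference between the two is the name of the peer field (`dest_rank` versus `src_rank`), which keeps the direction of each operation explicit at the call site.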

class core.resharding.copy_services.nccl_copy_service.NCCLCopyService(group=None)#

Bases: core.resharding.copy_services.base.CopyService

Thin wrapper around torch.distributed.batch_isend_irecv that collects point-to-point sends and receives and executes them as a single batch.

submit_send(src_tensor: torch.Tensor, dest_rank: int, task_id: Optional[int] = None)#

submit_recv(dest_tensor: torch.Tensor, src_rank: int, task_id: Optional[int] = None)#

run()#
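The three methods suggest a submit-then-run pattern: operations are registered first and then issued together. The sketch below illustrates that pattern with `torch.distributed.P2POp` and `torch.distributed.batch_isend_irecv`. It is not the implementation of `NCCLCopyService`: the class name `SketchCopyService`, the internal lists, the early return on an empty queue, and the clearing of the queues after `run()` are all assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


class SketchCopyService:
    """Hypothetical stand-in that mirrors the method signatures above."""

    def __init__(self, group: Any = None) -> None:
        self.group = group  # process group; None means the default group
        # Each entry is (tensor, peer_rank, task_id).
        self.send_ops: List[Tuple[Any, int, Optional[int]]] = []
        self.recv_ops: List[Tuple[Any, int, Optional[int]]] = []

    def submit_send(self, src_tensor: Any, dest_rank: int,
                    task_id: Optional[int] = None) -> None:
        # Only record the request; nothing is communicated yet.
        self.send_ops.append((src_tensor, dest_rank, task_id))

    def submit_recv(self, dest_tensor: Any, src_rank: int,
                    task_id: Optional[int] = None) -> None:
        # Only record the request; nothing is communicated yet.
        self.recv_ops.append((dest_tensor, src_rank, task_id))

    def run(self) -> None:
        # Assumption: an empty queue is treated as a no-op.
        if not self.send_ops and not self.recv_ops:
            return

        # Imported here so the queueing logic can be exercised
        # on a machine without torch installed.
        import torch.distributed as dist

        p2p_ops = [
            dist.P2POp(dist.isend, tensor, peer, group=self.group)
            for tensor, peer, _ in self.send_ops
        ]
        p2p_ops += [
            dist.P2POp(dist.irecv, tensor, peer, group=self.group)
            for tensor, peer, _ in self.recv_ops
        ]

        # Issue every operation in one batch, then block until all finish.
        for work in dist.batch_isend_irecv(p2p_ops):
            work.wait()

        # Assumption: queues are reset so the service can be reused.
        self.send_ops.clear()
        self.recv_ops.clear()
```

In a two-rank job with an initialized process group, rank 0 would call `submit_send(tensor, dest_rank=1)`, rank 1 would call `submit_recv(buffer, src_rank=0)`, and both ranks would then call `run()`. Every send needs a matching receive on the peer rank, otherwise the batch will block.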