core.resharding.copy_services.nccl_copy_service

Module Contents

Classes

SendOp

Simple container describing a single NCCL send operation.

RecvOp

Simple container describing a single NCCL receive operation.

NCCLCopyService

Thin wrapper around torch.distributed.batch_isend_irecv that collects submitted point-to-point sends and receives and executes them as a single batch.

Data

API

core.resharding.copy_services.nccl_copy_service.logger

‘getLogger(…)’

class core.resharding.copy_services.nccl_copy_service.SendOp

Simple container describing a single NCCL send operation.

task_id: int | None

None

tensor: torch.Tensor

None

dest_rank: int

None

class core.resharding.copy_services.nccl_copy_service.RecvOp

Simple container describing a single NCCL receive operation.

task_id: int | None

None

tensor: torch.Tensor

None

src_rank: int

None
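The two containers above only carry an optional task id, a tensor, and a peer rank. A minimal sketch of equivalent records follows, assuming they behave like plain dataclasses; the field names and order are taken from this page, while the dataclass form, the comments on when `task_id` is `None`, and the torch-free typing are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # torch is only needed by static type checkers here
    import torch


@dataclass
class SendOp:
    """Hypothetical mirror of the documented SendOp fields."""

    task_id: Optional[int]  # presumably None when submitted without an id
    tensor: torch.Tensor  # source tensor to send
    dest_rank: int  # rank that receives the tensor


@dataclass
class RecvOp:
    """Hypothetical mirror of the documented RecvOp fields."""

    task_id: Optional[int]  # presumably None when submitted without an id
    tensor: torch.Tensor  # preallocated buffer the received data is written into
    src_rank: int  # rank that sends the tensor
```

Keeping sends and receives as separate record types means the peer field name (`dest_rank` vs `src_rank`) states the direction of each transfer explicitly.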

class core.resharding.copy_services.nccl_copy_service.NCCLCopyService

Bases: core.resharding.copy_services.base.CopyService

Thin wrapper around torch.distributed.batch_isend_irecv that collects submitted point-to-point sends and receives and executes them as a single batch.
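The submit-then-run pattern described above can be sketched as follows. This is a hypothetical stand-in (`SketchCopyService` is not part of the library, and the real class also derives from `CopyService`, whose interface is not shown here): submissions are only queued, and `run()` turns them into `torch.distributed.P2POp` objects and hands them to `torch.distributed.batch_isend_irecv` in one call. It assumes a process group has already been initialized before `run()` is called with a non-empty queue.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical queued entry: (task_id, tensor, peer_rank)
Entry = Tuple[Optional[int], Any, int]


class SketchCopyService:
    """Hypothetical stand-in for NCCLCopyService: queue P2P ops, then batch them."""

    def __init__(self) -> None:
        self.sends: List[Entry] = []
        self.recvs: List[Entry] = []

    def submit_send(self, src_tensor: Any, dest_rank: int) -> None:
        self.sends.append((None, src_tensor, dest_rank))

    def submit_send_with_id(self, task_id: int, src_tensor: Any, dest_rank: int) -> None:
        self.sends.append((task_id, src_tensor, dest_rank))

    def submit_recv(self, dest_tensor: Any, src_rank: int) -> None:
        self.recvs.append((None, dest_tensor, src_rank))

    def submit_recv_with_id(self, task_id: int, dest_tensor: Any, src_rank: int) -> None:
        self.recvs.append((task_id, dest_tensor, src_rank))

    def run(self) -> None:
        if not self.sends and not self.recvs:
            return  # nothing queued, so no communication is issued
        # Imported lazily so the queueing logic above works without torch installed.
        import torch.distributed as dist

        p2p_ops = [dist.P2POp(dist.isend, t, peer) for _, t, peer in self.sends]
        p2p_ops += [dist.P2POp(dist.irecv, t, peer) for _, t, peer in self.recvs]
        for req in dist.batch_isend_irecv(p2p_ops):
            req.wait()  # wait on each request before the tensors are reused
        self.sends.clear()
        self.recvs.clear()
```

With NCCL, each send must be matched by a receive of the same size and dtype on the destination rank, and every participating rank must call `run()`; a missing counterpart can leave the batch hanging.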

Initialization

submit_send(src_tensor: torch.Tensor, dest_rank: int)

Submit a send operation.

submit_send_with_id(
task_id: int,
src_tensor: torch.Tensor,
dest_rank: int,
)

Submit a send operation with a unique task identifier.

submit_recv(dest_tensor: torch.Tensor, src_rank: int)

Submit a receive operation.

submit_recv_with_id(
task_id: int,
dest_tensor: torch.Tensor,
src_rank: int,
)

Submit a receive operation with a unique task identifier.
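The page only says that task ids are unique identifiers. One plausible use, assumed here rather than confirmed by the source, is that a send on one rank and its matching receive on another share the same `task_id`, so the two sides can be cross-checked before `run()`. The sketch below (the function `match_by_task_id` and its `(task_id, shape, peer_rank)` records are hypothetical) performs that check in plain Python.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (task_id, tensor_shape, peer_rank)
OpSpec = Tuple[int, Tuple[int, ...], int]


def match_by_task_id(
    sender_rank: int,
    sends: Iterable[OpSpec],
    receiver_rank: int,
    recvs: Iterable[OpSpec],
) -> List[int]:
    """Return ids of sends from sender_rank that have a consistent recv on receiver_rank."""
    recv_by_id: Dict[int, OpSpec] = {}
    for spec in recvs:
        if spec[0] in recv_by_id:
            raise ValueError(f"duplicate task_id {spec[0]} on rank {receiver_rank}")
        recv_by_id[spec[0]] = spec

    matched: List[int] = []
    for task_id, shape, dest_rank in sends:
        if dest_rank != receiver_rank:
            continue  # this send targets some other rank
        recv = recv_by_id.get(task_id)
        if recv is None:
            raise ValueError(f"task {task_id}: no matching recv on rank {receiver_rank}")
        _, recv_shape, src_rank = recv
        if src_rank != sender_rank or recv_shape != shape:
            raise ValueError(f"task {task_id}: peer or shape mismatch")
        matched.append(task_id)
    return matched
```

Catching a missing or mismatched receive this way surfaces the problem as an error instead of a stalled batch.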

run()