core.resharding.nvshmem_copy_service.service#
Remote Copy Service - Main orchestrator for NVSHMEM-based GPU-to-GPU transfers.
This service coordinates task segmentation, workload packing, scheduling,
GPU resource management, and pipelined execution.
Module Contents#
Classes#
Main service for managing remote GPU-to-GPU data transfers. |
API#
- class core.resharding.nvshmem_copy_service.service.RemoteCopyService#
Main service for managing remote GPU-to-GPU data transfers.
Provides high-level API for registering transfers, scheduling, and executing pipelined communication with NVSHMEM.
Initialization
- property my_pe: int#
Get this PE’s rank.
- property n_pes: int#
Get total number of PEs.
- property device#
Get CUDA device.
- property initialized: bool#
Check if service is initialized.
- init(log_level: str = 'INFO') None#
Initialize the service.
Sets up NVSHMEM, CUDA device, streams, buffers, and kernels. Expects to be launched with torchrun.
- Parameters:
log_level – Logging level (TRACE, DEBUG, INFO, WARN, ERROR)
- register_send(
- task_id: int,
- src_tensor,
- src_pos: int,
- size: int,
- dest_pe: int,
Register a send operation.
- Parameters:
task_id – Unique task identifier
src_tensor – Source tensor (PyTorch/CuPy tensor or pointer)
src_pos – Starting position in source tensor
size – Number of bytes to send
dest_pe – Destination PE rank
- register_receive(
- task_id: int,
- dest_tensor,
- dest_pos: int,
- size: int,
- src_pe: int,
Register a receive operation.
- Parameters:
task_id – Unique task identifier
dest_tensor – Destination tensor (PyTorch/CuPy tensor or pointer)
dest_pos – Starting position in destination tensor
size – Number of bytes to receive
src_pe – Source PE rank
- schedule() None#
Build execution schedule.
Can be called once and followed by multiple run() calls for repeated execution with the same communication pattern.
Steps:
Segment large tasks into manageable chunks
Pack tasks into batches
Schedule batches to iterations (conflict-free)
Build GPU execution plans (pointer arrays, chunking)
Create synchronization events
- run() None#
Execute the scheduled communication.
Can be called multiple times after a single schedule() call to repeat the same communication pattern.
- clear_requests() None#
Clear registered requests and schedule.
Call this before registering a new set of transfers.
- finalize() None#
Cleanup resources.
- _segment_tasks() None#
Segment tasks into manageable chunks.
- _prepare_iter_schedules(
- schedule_batches: Dict[int, List[core.resharding.nvshmem_copy_service.nvshmem_types.ScheduledBatch]],
- workloads: Dict[int, List],
- global_summaries: Dict[Tuple[int, int, int], core.resharding.nvshmem_copy_service.nvshmem_types.WorkloadSummary],
- num_iterations: int,
Organize schedule into iteration-based structure.
- Returns:
List of dicts with ‘send’ and ‘recv’ keys for each iteration