core.resharding.nvshmem_copy_service.planning.gpu_execution_planner#

GPU execution planning for pack/unpack operations.

Converts high-level task descriptions into GPU-ready metadata (pointer arrays, sizes, chunking) for kernel execution.

Module Contents#

Classes#

GPUExecutionPlanner

Plans GPU kernel execution by building pointer arrays and metadata.

API#

class core.resharding.nvshmem_copy_service.planning.gpu_execution_planner.GPUExecutionPlanner#

Plans GPU kernel execution by building pointer arrays and metadata.

Initialization

create_gpu_plans(
iter_schedules: List[Dict[str, Optional[core.resharding.nvshmem_copy_service.nvshmem_types.ScheduledBatch]]],
send_slots: List,
recv_slots: List,
receive_requests: List[core.resharding.nvshmem_copy_service.nvshmem_types.ReceiveRequest],
) -> None#

Build GPU execution plans for all iterations.

Modifies iter_schedules in-place by adding gpu_plan to each batch.

Parameters:
  • iter_schedules – List of iteration schedules (dicts with ‘send’ and ‘recv’ keys)

  • send_slots – List of send buffer slots

  • recv_slots – List of receive buffer slots

  • receive_requests – List of all receive requests for matching
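Because create_gpu_plans returns None and modifies iter_schedules in-place, callers keep their reference to the schedule list and read the attached plans afterwards. A minimal sketch of that in-place contract, using plain dicts as stand-ins for ScheduledBatch (the field names and the helper name attach_gpu_plans are illustrative, not the real API):

```python
from typing import Dict, List, Optional

def attach_gpu_plans(iter_schedules: List[Dict[str, Optional[dict]]]) -> None:
    """Mimic the in-place contract: attach a 'gpu_plan' to every batch."""
    for schedule in iter_schedules:
        for direction in ("send", "recv"):
            batch = schedule.get(direction)
            if batch is not None:
                # Placeholder plan; the real planner builds pointer
                # arrays and chunk metadata here.
                batch["gpu_plan"] = {"num_chunks": 0}

# Usage: one iteration with a send batch and no recv batch.
schedules = [{"send": {"tensor_ids": [0, 1]}, "recv": None}]
attach_gpu_plans(schedules)
```

After the call, schedules[0]["send"] carries a "gpu_plan" entry while the absent recv batch is left untouched.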

_plan_kernel_args(
ptrs: List[int],
positions: List[int],
sizes: List[int],
is_pack: bool,
buffer_base: int,
) -> Optional[Tuple[object, object, object, int]]#

Generate GPU-ready pointer arrays for kernel execution.

Applies 128KB chunking to break large transfers into smaller pieces.

Parameters:
  • ptrs – List of tensor data pointers

  • positions – List of positions within tensors

  • sizes – List of transfer sizes

  • is_pack – True for pack (user->buffer), False for unpack (buffer->user)

  • buffer_base – Base pointer of the buffer

Returns:

Tuple of (cp_src_addrs, cp_dst_addrs, cp_sizes, num_chunks) as CuPy arrays, or None if no work.
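The 128KB chunking and the pack/unpack direction swap can be sketched as follows. This is a plain-Python illustration of the address arithmetic only: it returns lists where the real method returns CuPy arrays, and the function name chunk_transfers is hypothetical:

```python
from typing import List, Optional, Tuple

CHUNK_SIZE = 128 * 1024  # 128KB chunk limit

def chunk_transfers(
    ptrs: List[int],
    positions: List[int],
    sizes: List[int],
    is_pack: bool,
    buffer_base: int,
) -> Optional[Tuple[List[int], List[int], List[int], int]]:
    """Split each transfer into <=128KB chunks of (src, dst, size)."""
    cp_src, cp_dst, cp_sizes = [], [], []
    for ptr, pos, size in zip(ptrs, positions, sizes):
        offset = 0
        while offset < size:
            n = min(CHUNK_SIZE, size - offset)
            user_addr = ptr + offset               # address in the user tensor
            buf_addr = buffer_base + pos + offset  # address in the staging buffer
            if is_pack:   # pack: user -> buffer
                cp_src.append(user_addr)
                cp_dst.append(buf_addr)
            else:         # unpack: buffer -> user
                cp_src.append(buf_addr)
                cp_dst.append(user_addr)
            cp_sizes.append(n)
            offset += n
    if not cp_sizes:
        return None  # no work
    return cp_src, cp_dst, cp_sizes, len(cp_sizes)

# Usage: a single 300KB pack splits into 128KB + 128KB + 44KB chunks.
result = chunk_transfers([0x1000], [0], [300 * 1024], True, 0x9000)
```

A 300KB transfer thus yields three chunks; the per-chunk source addresses advance through the user tensor while the destinations advance through the buffer at the batch's assigned position.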