core.resharding.nvshmem_copy_service.planning.gpu_execution_planner
GPU execution planning for pack/unpack operations.
Converts high-level task descriptions into GPU-ready metadata (pointer arrays, sizes, chunking) for kernel execution.
Module Contents
Classes
GPUExecutionPlanner | Plans GPU kernel execution by building pointer arrays and metadata.
API
- class core.resharding.nvshmem_copy_service.planning.gpu_execution_planner.GPUExecutionPlanner
Plans GPU kernel execution by building pointer arrays and metadata.
Initialization
- create_gpu_plans(
- iter_schedules: List[Dict[str, Optional[core.resharding.nvshmem_copy_service.nvshmem_types.ScheduledBatch]]],
- send_slots: List,
- recv_slots: List,
- receive_requests: List[core.resharding.nvshmem_copy_service.nvshmem_types.ReceiveRequest],
- )
Build GPU execution plans for all iterations.
Modifies iter_schedules in place by attaching a gpu_plan to each batch.
- Parameters:
iter_schedules – List of iteration schedules (dicts with 'send' and 'recv' keys; a value may be None)
send_slots – List of send buffer slots
recv_slots – List of receive buffer slots
receive_requests – List of all receive requests for matching
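The in-place behaviour described above can be illustrated with a minimal sketch. It is not the planner's implementation: `BatchStub`, `attach_gpu_plans`, and the `build_plan` callback are hypothetical names introduced here, and only the 'send'/'recv' keys, the optional batch values, and the `gpu_plan` attribute come from this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class BatchStub:
    """Hypothetical stand-in for ScheduledBatch; only gpu_plan is documented."""
    name: str
    gpu_plan: Optional[Any] = None


def attach_gpu_plans(
    iter_schedules: List[Dict[str, Optional[BatchStub]]],
    build_plan: Callable[[str, BatchStub], Any],
) -> None:
    """Walk every iteration and attach a plan to each batch that exists."""
    for schedule in iter_schedules:
        for direction in ("send", "recv"):
            batch = schedule.get(direction)
            if batch is None:
                continue  # nothing scheduled in this direction for this iteration
            # Mutates the batch object, so the caller's list sees the change.
            batch.gpu_plan = build_plan(direction, batch)


schedules = [
    {"send": BatchStub("a"), "recv": None},
    {"send": None, "recv": BatchStub("b")},
]
attach_gpu_plans(schedules, lambda direction, batch: (direction, batch.name))
```

Because the function returns nothing and mutates the batches it is given, callers read the result from the same `iter_schedules` list they passed in.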
- _plan_kernel_args(
- ptrs: List[int],
- positions: List[int],
- sizes: List[int],
- is_pack: bool,
- buffer_base: int,
- )
Generate GPU-ready pointer arrays for kernel execution.
Applies 128KB chunking: any transfer larger than 128KB is split into chunks of at most 128KB each.
- Parameters:
ptrs – List of tensor data pointers
positions – List of positions within tensors
sizes – List of transfer sizes
is_pack – True for pack (user->buffer), False for unpack (buffer->user)
buffer_base – Base pointer of the buffer
- Returns:
Tuple of (cp_src_addrs, cp_dst_addrs, cp_sizes, num_chunks) as CuPy arrays, or None if no work.
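The chunking and pack/unpack direction described for `_plan_kernel_args` can be sketched as follows. This is an illustration under stated assumptions, not the actual method: it assumes 128KB means 128 * 1024 bytes, that `positions` are byte offsets into each tensor, and that transfers are laid out back to back in the buffer starting at `buffer_base`. The name `plan_kernel_args_sketch` is hypothetical, and the sketch returns plain Python lists so it runs without a GPU, whereas the documented method returns CuPy arrays.

```python
from typing import List, Optional, Tuple

CHUNK_BYTES = 128 * 1024  # assumption: "128KB" means 131072 bytes

Plan = Tuple[List[int], List[int], List[int], int]


def plan_kernel_args_sketch(
    ptrs: List[int],
    positions: List[int],
    sizes: List[int],
    is_pack: bool,
    buffer_base: int,
    chunk_bytes: int = CHUNK_BYTES,
) -> Optional[Plan]:
    """Return (src_addrs, dst_addrs, chunk_sizes, num_chunks), or None if no work."""
    src_addrs: List[int] = []
    dst_addrs: List[int] = []
    chunk_sizes: List[int] = []
    buffer_offset = 0  # assumption: transfers sit contiguously in the buffer

    for ptr, pos, size in zip(ptrs, positions, sizes):
        user_addr = ptr + pos  # assumption: positions are byte offsets
        buf_addr = buffer_base + buffer_offset
        done = 0
        while done < size:
            n = min(chunk_bytes, size - done)
            if is_pack:  # pack: user -> buffer
                src_addrs.append(user_addr + done)
                dst_addrs.append(buf_addr + done)
            else:  # unpack: buffer -> user
                src_addrs.append(buf_addr + done)
                dst_addrs.append(user_addr + done)
            chunk_sizes.append(n)
            done += n
        buffer_offset += size

    if not chunk_sizes:
        return None
    return src_addrs, dst_addrs, chunk_sizes, len(chunk_sizes)
```

Under these assumptions, a single 300000-byte pack transfer becomes three chunks (131072, 131072, and 37856 bytes). On a GPU, the three lists would typically be converted to device arrays, for example with `cupy.asarray`, before being handed to a kernel.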