core.resharding.nvshmem_copy_service.core.kernel_launcher#
CUDA kernel management and launching for pack/unpack operations.
Handles kernel compilation, launching, and stream coordination.
Module Contents#
Classes#
KernelLauncher – Manages CUDA kernel loading and launching for data pack/unpack operations.
API#
- class core.resharding.nvshmem_copy_service.core.kernel_launcher.KernelLauncher#
Manages CUDA kernel loading and launching for data pack/unpack operations.
Initialization
- load_kernels() None#
Load and compile CUDA kernels from source.
- set_streams(pack_stream, unpack_stream) None#
Cache CuPy stream wrappers for kernel launching.
This eliminates per-launch overhead of stream pointer extraction and CuPy ExternalStream creation.
- Parameters:
pack_stream – CUDA stream for pack operations
unpack_stream – CUDA stream for unpack operations
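The caching behavior described above can be illustrated with a minimal sketch. `StreamWrapper` and `LauncherSketch` are hypothetical stand-ins (real CuPy `ExternalStream` objects need a GPU, so plain integers play the role of raw stream pointers here); only the wrap-once-then-reuse pattern mirrors what `set_streams` does.

```python
class StreamWrapper:
    """Stand-in for cupy.cuda.ExternalStream(ptr): wraps a raw stream pointer."""
    def __init__(self, ptr: int) -> None:
        self.ptr = ptr

class LauncherSketch:
    def __init__(self) -> None:
        self._pack_stream = None
        self._unpack_stream = None

    def set_streams(self, pack_ptr: int, unpack_ptr: int) -> None:
        # Wrap each stream pointer once up front, not on every launch.
        self._pack_stream = StreamWrapper(pack_ptr)
        self._unpack_stream = StreamWrapper(unpack_ptr)

    def launch_pack(self) -> StreamWrapper:
        # Reuses the cached wrapper; no per-launch ExternalStream creation.
        return self._pack_stream

launcher = LauncherSketch()
launcher.set_streams(0x1000, 0x2000)
assert launcher.launch_pack() is launcher._pack_stream
```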
- launch_pack(
- gpu_plan: Tuple[Any, Any, Any, int],
- pack_stream,
- torch_pack_stream: torch.cuda.ExternalStream,
- pack_event: torch.cuda.Event,
- ) None#
Launch the pack kernel to copy data from user tensors to the send buffer.
- Parameters:
gpu_plan – Tuple of (cp_src_addrs, cp_dst_addrs, cp_sizes, num_chunks) as CuPy arrays
pack_stream – CUDA stream (cuda.core.experimental.Stream) - unused, kept for compatibility
torch_pack_stream – PyTorch external stream wrapper
pack_event – CUDA event to record after kernel launch
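To make the `gpu_plan` parameter concrete, here is a hedged sketch of how such a tuple of (src addresses, dst addresses, chunk sizes, chunk count) might be assembled. In the real service the first three entries are CuPy device arrays; plain Python lists stand in here, and `build_gpu_plan` is an illustrative helper, not part of the documented API.

```python
from typing import Any, Iterable, Tuple

def build_gpu_plan(chunks: Iterable[Tuple[int, int, int]]) -> Tuple[Any, Any, Any, int]:
    """Flatten (src_addr, dst_addr, nbytes) triples into a gpu_plan-shaped tuple."""
    src_addrs = [c[0] for c in chunks]
    dst_addrs = [c[1] for c in chunks]
    sizes = [c[2] for c in chunks]
    # Last element is the chunk count the kernel iterates over.
    return (src_addrs, dst_addrs, sizes, len(sizes))

plan = build_gpu_plan([(0x10, 0x90, 256), (0x20, 0xA0, 128)])
assert plan[3] == 2
```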
- launch_unpack(
- gpu_plan: Tuple[Any, Any, Any, int],
- unpack_stream,
- torch_unpack_stream: torch.cuda.ExternalStream,
- unpack_event: torch.cuda.Event,
- ) None#
Launch the unpack kernel to copy data from the receive buffer to user tensors.
- Parameters:
gpu_plan – Tuple of (cp_src_addrs, cp_dst_addrs, cp_sizes, num_chunks) as CuPy arrays
unpack_stream – CUDA stream (cuda.core.experimental.Stream) - unused, kept for compatibility
torch_unpack_stream – PyTorch external stream wrapper
unpack_event – CUDA event to record after kernel launch
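The intended call sequence across the methods above can be sketched end to end. `FakeLauncher` and `FakeEvent` are lightweight stubs standing in for the real `KernelLauncher`, CUDA streams, and `torch.cuda.Event` (which all require a GPU); only the method names and the record-event-after-launch ordering come from this documentation.

```python
class FakeEvent:
    """Stand-in for torch.cuda.Event."""
    def __init__(self) -> None:
        self.recorded = False
    def record(self, stream=None) -> None:
        self.recorded = True

calls = []

class FakeLauncher:
    """Stub mirroring the KernelLauncher method signatures."""
    def load_kernels(self) -> None:
        calls.append("load_kernels")
    def set_streams(self, pack_stream, unpack_stream) -> None:
        calls.append("set_streams")
    def launch_pack(self, gpu_plan, pack_stream, torch_pack_stream, pack_event) -> None:
        calls.append("launch_pack")
        pack_event.record(torch_pack_stream)  # event recorded after kernel launch
    def launch_unpack(self, gpu_plan, unpack_stream, torch_unpack_stream, unpack_event) -> None:
        calls.append("launch_unpack")
        unpack_event.record(torch_unpack_stream)

launcher = FakeLauncher()
launcher.load_kernels()                     # compile kernels once
launcher.set_streams("pack", "unpack")      # cache stream wrappers once
plan = ([0x10], [0x90], [64], 1)            # (src_addrs, dst_addrs, sizes, num_chunks)
pack_ev, unpack_ev = FakeEvent(), FakeEvent()
launcher.launch_pack(plan, None, "pack", pack_ev)
launcher.launch_unpack(plan, None, "unpack", unpack_ev)
assert pack_ev.recorded and unpack_ev.recorded
```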