Device API
The Device API allows communication to be initiated and performed from device (GPU) code. It is organized into the following areas:
Host-Side Setup — Creating and configuring device communicators, querying properties, host-accessible device pointer functions, and related types.
Memory and LSA — Load/store accessible (LSA) memory, barriers, pointer accessors, and multimem.
GIN (GPU-Initiated Networking) — One-sided transfers, signals, counters, and network barriers.
Reduce, Broadcast, and Fused Building Blocks — Building blocks for computation-fused kernels (reduce, copy/broadcast, and reduce-then-copy), which can be used to implement algorithms such as AllReduce, AllGather, and ReduceScatter.
For an introduction and usage examples, see Device-Initiated Communication.