# NVSHMEM Best Practices Guide

- NVSHMEM Initialization
- Two-Stage Initialization
- Device APIs
- Device APIs on Peer-to-Peer Transport
- Device APIs on Proxy-Based Transport
- Device APIs on IBGDA Transport
- On-stream APIs
- Host APIs
- CUDA NVSHMEM Interoperability
- Using CUDA Streams APIs
- NVSHMEM Runtime Configuration
- Environment behavior
- NVSHMEM Unsupported Operations
- Toolchaining Operations
- NVSHMEM Performance
- Using 16-Byte Alignment Buffers in the Application
- Using nvshmem_*block for Heterogenous Transports Configuration
- Tuning the queue-pair Type and Configuration for IBGDA