NVIDIA OpenSHMEM Library (NVSHMEM) Documentation
NVSHMEM implements the OpenSHMEM parallel programming model for clusters of NVIDIA GPUs. It creates a Partitioned Global Address Space (PGAS) that spans the memory of multiple GPUs and provides an API for fine-grained GPU-to-GPU data movement initiated from within a CUDA kernel, on CUDA streams, and from the CPU.
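The following is a minimal sketch of that model: each processing element (PE) allocates a symmetric buffer, a CUDA kernel performs a fine-grained put into the next PE's memory, and the host synchronizes on a stream-ordered barrier. The buffer name, launch shape, and the one-PE-per-GPU device mapping are illustrative assumptions, not requirements of the library.

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

/* Each PE writes its rank into the symmetric buffer of the next PE
   using a device-initiated put from inside the kernel. */
__global__ void shift_put(int *dest) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(dest, mype, peer);   /* fine-grained GPU-to-GPU put */
}

int main(void) {
    nvshmem_init();

    /* Assumed mapping: one PE per GPU on the local node. */
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* Symmetric allocation: collective, same size on every PE. */
    int *dest = (int *)nvshmem_malloc(sizeof(int));

    shift_put<<<1, 1, 0, stream>>>(dest);
    nvshmemx_barrier_all_on_stream(stream);   /* stream-ordered barrier */

    int msg;
    cudaMemcpyAsync(&msg, dest, sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}
```

A program like this would typically be compiled with nvcc using relocatable device code and linked against the NVSHMEM library, then launched with one process per GPU via an NVSHMEM or MPI launcher; the exact build and launch commands depend on the installation and bootstrap in use.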
Contents:
- Programming Model Overview
- Memory Model
- Execution Model
- Library Constants
- Library Handles
- Environment Variables
- NVSHMEM API
  - Library Setup, Exit, and Query
  - Thread Support
  - Kernel Launch Routines
  - Memory Management
  - Remote Memory Access
  - Atomic Memory Operations
    - NVSHMEM_ATOMIC_FETCH
    - NVSHMEM_ATOMIC_SET
    - NVSHMEM_ATOMIC_COMPARE_SWAP
    - NVSHMEM_ATOMIC_SWAP
    - NVSHMEM_ATOMIC_FETCH_INC
    - NVSHMEM_ATOMIC_INC
    - NVSHMEM_ATOMIC_FETCH_ADD
    - NVSHMEM_ATOMIC_ADD
    - NVSHMEM_ATOMIC_FETCH_AND
    - NVSHMEM_ATOMIC_AND
    - NVSHMEM_ATOMIC_FETCH_OR
    - NVSHMEM_ATOMIC_OR
    - NVSHMEM_ATOMIC_FETCH_XOR
    - NVSHMEM_ATOMIC_XOR
  - Signaling Operations
  - Collective Communication
  - Point-To-Point Synchronization
    - NVSHMEM_WAIT_UNTIL
    - NVSHMEM_WAIT_UNTIL_ALL
    - NVSHMEM_WAIT_UNTIL_ANY
    - NVSHMEM_WAIT_UNTIL_SOME
    - NVSHMEM_WAIT_UNTIL_ALL_VECTOR
    - NVSHMEM_WAIT_UNTIL_ANY_VECTOR
    - NVSHMEM_WAIT_UNTIL_SOME_VECTOR
    - NVSHMEM_TEST
    - NVSHMEM_TEST_ALL
    - NVSHMEM_TEST_ANY
    - NVSHMEM_TEST_SOME
    - NVSHMEM_TEST_ALL_VECTOR
    - NVSHMEM_TEST_ANY_VECTOR
    - NVSHMEM_TEST_SOME_VECTOR
  - Memory Ordering
- NVSHMEM SLA
- Acknowledgements