NVIDIA OpenSHMEM Library (NVSHMEM) Documentation
NVSHMEM implements the OpenSHMEM parallel programming model for clusters of NVIDIA® GPUs. The NVSHMEM Partitioned Global Address Space (PGAS) spans the memory of multiple GPUs, and the library provides an API for fine-grained GPU-to-GPU data movement that can be called from within a CUDA kernel, on CUDA streams, and from the CPU.
Contents:
- Introduction
- Using NVSHMEM
- Memory Model
- Execution Model
- Library Constants
- Library Handles
- Environment Variables
- NVSHMEM APIs
- Overview of the APIs
- Library Setup, Exit, and Query
- Thread Support
- Kernel Launch Routines
- Memory Management
- Team Management
- Remote Memory Access
- Atomic Memory Operations
  - NVSHMEM_ATOMIC_FETCH
  - NVSHMEM_ATOMIC_SET
  - NVSHMEM_ATOMIC_COMPARE_SWAP
  - NVSHMEM_ATOMIC_SWAP
  - NVSHMEM_ATOMIC_FETCH_INC
  - NVSHMEM_ATOMIC_INC
  - NVSHMEM_ATOMIC_FETCH_ADD
  - NVSHMEM_ATOMIC_ADD
  - NVSHMEM_ATOMIC_FETCH_AND
  - NVSHMEM_ATOMIC_AND
  - NVSHMEM_ATOMIC_FETCH_OR
  - NVSHMEM_ATOMIC_OR
  - NVSHMEM_ATOMIC_FETCH_XOR
  - NVSHMEM_ATOMIC_XOR
- Signaling Operations
- Collective Communication
- Point-To-Point Synchronization
  - NVSHMEM_WAIT_UNTIL
  - NVSHMEM_WAIT_UNTIL_ALL
  - NVSHMEM_WAIT_UNTIL_ANY
  - NVSHMEM_WAIT_UNTIL_SOME
  - NVSHMEM_WAIT_UNTIL_ALL_VECTOR
  - NVSHMEM_WAIT_UNTIL_ANY_VECTOR
  - NVSHMEM_WAIT_UNTIL_SOME_VECTOR
  - NVSHMEM_TEST
  - NVSHMEM_TEST_ALL
  - NVSHMEM_TEST_ANY
  - NVSHMEM_TEST_SOME
  - NVSHMEM_TEST_ALL_VECTOR
  - NVSHMEM_TEST_ANY_VECTOR
  - NVSHMEM_TEST_SOME_VECTOR
  - NVSHMEM_SIGNAL_WAIT_UNTIL
- Memory Ordering
- Examples
- Troubleshooting And FAQs
- NVSHMEM SLA
- Acknowledgements