NVSHMEM 2.8.0
Index
N
nvshmem_align (C function)
nvshmem_alltoallmem (C function), [1]
nvshmem_barrier (C function), [1]
nvshmem_barrier_all (C function), [1]
NVSHMEM_BARRIER_DISSEM_KVAL (C variable)
NVSHMEM_BARRIER_TG_DISSEM_KVAL (C variable)
NVSHMEM_BCAST_LL_THRESHOLD (C variable)
NVSHMEM_BOOTSTRAP (C variable)
NVSHMEM_BOOTSTRAP_PLUGIN (C variable)
NVSHMEM_BOOTSTRAP_PMI (C variable)
nvshmem_calloc (C function)
NVSHMEM_CMP_EQ (C variable)
NVSHMEM_CMP_GE (C variable)
NVSHMEM_CMP_GT (C variable)
NVSHMEM_CMP_LE (C variable)
NVSHMEM_CMP_LT (C variable)
NVSHMEM_CMP_NE (C variable)
NVSHMEM_CUDA_LIMIT_STACK_SIZE (C variable)
NVSHMEM_CUMEM_GRANULARITY (C variable)
NVSHMEM_DEBUG (C variable)
NVSHMEM_DEBUG_FILE (C variable)
NVSHMEM_DISABLE_CUDA_VMM (C variable)
NVSHMEM_DISABLE_GDRCOPY (C variable)
NVSHMEM_DISABLE_IB_NATIVE_ATOMICS (C variable)
NVSHMEM_DISABLE_LOCAL_ONLY_PROXY (C variable)
NVSHMEM_DISABLE_NCCL (C variable)
NVSHMEM_DISABLE_P2P (C variable)
NVSHMEM_ENABLE_NIC_PE_MAPPING (C variable)
NVSHMEM_FCOLLECT_LL_THRESHOLD (C variable)
nvshmem_fence (C function), [1]
nvshmem_finalize (C function)
nvshmem_free (C function)
nvshmem_getmem (C function), [1]
nvshmem_getmem_nbi (C function), [1]
nvshmem_getSIZE (C function), [1]
nvshmem_getSIZE_nbi (C function), [1]
nvshmem_global_exit (C function), [1]
NVSHMEM_HCA_LIST (C variable)
NVSHMEM_HCA_PE_MAPPING (C variable)
NVSHMEM_IB_ENABLE_IBGDA (C variable)
NVSHMEM_IB_GID_INDEX (C variable)
NVSHMEM_IB_SL (C variable)
NVSHMEM_IB_TRAFFIC_CLASS (C variable)
NVSHMEM_IBGDA_DCI_MAP_BY (C variable)
NVSHMEM_IBGDA_FORCE_NIC_BUF_MEMTYPE (C variable)
NVSHMEM_IBGDA_NUM_DCI (C variable)
NVSHMEM_IBGDA_NUM_DCT (C variable)
NVSHMEM_IBGDA_NUM_FETCH_SLOTS_PER_DCI (C variable)
NVSHMEM_IBGDA_NUM_FETCH_SLOTS_PER_RC (C variable)
NVSHMEM_IBGDA_NUM_RC_PER_PE (C variable)
NVSHMEM_IBGDA_NUM_REQUESTS_IN_BATCH (C variable)
NVSHMEM_IBGDA_NUM_SHARED_DCI (C variable)
NVSHMEM_IBGDA_RC_MAP_BY (C variable)
nvshmem_igetSIZE (C function), [1]
NVSHMEM_INFO (C variable)
nvshmem_info_get_name (C function), [1]
nvshmem_info_get_version (C function), [1]
nvshmem_init (C function)
nvshmem_init_thread (C function)
nvshmem_iputSIZE (C function), [1]
NVSHMEM_MAJOR_VERSION (C variable)
nvshmem_malloc (C function)
NVSHMEM_MAX_MEMORY_PER_GPU (C variable)
NVSHMEM_MAX_NAME_LEN (C variable)
NVSHMEM_MAX_P2P_GPUS (C variable)
NVSHMEM_MAX_TEAMS (C variable)
NVSHMEM_MINOR_VERSION (C variable)
nvshmem_my_pe (C function), [1]
nvshmem_n_pes (C function), [1]
NVSHMEM_NVTX (C variable)
NVSHMEM_PROXY_REQUEST_BATCH_MAX (C variable)
nvshmem_ptr (C function), [1]
nvshmem_putmem (C function), [1]
nvshmem_putmem_nbi (C function), [1]
nvshmem_putmem_signal (C function), [1]
nvshmem_putmem_signal_nbi (C function), [1]
nvshmem_putSIZE (C function), [1]
nvshmem_putSIZE_nbi (C function), [1]
nvshmem_putSIZE_signal (C function), [1]
nvshmem_putSIZE_signal_nbi (C function), [1]
nvshmem_query_thread (C function)
nvshmem_quiet (C function), [1]
NVSHMEM_REMOTE_TRANSPORT (C variable)
NVSHMEM_SIGNAL_ADD (C variable)
nvshmem_signal_fetch (C function)
NVSHMEM_SIGNAL_SET (C variable)
nvshmem_signal_wait_until (C function)
NVSHMEM_SYMMETRIC_SIZE (C variable)
nvshmem_sync (C function), [1]
nvshmem_sync_all (C function), [1]
nvshmem_team_destroy (C function)
nvshmem_team_get_config (C function)
NVSHMEM_TEAM_INVALID (C variable)
nvshmem_team_my_pe (C function), [1]
nvshmem_team_n_pes (C function), [1]
NVSHMEM_TEAM_SHARED (C variable)
nvshmem_team_split_2d (C function)
nvshmem_team_split_strided (C function)
nvshmem_team_sync (C function), [1]
nvshmem_team_translate_pe (C function)
NVSHMEM_TEAM_WORLD (C variable)
NVSHMEM_THREAD_FUNNELED (C variable)
NVSHMEM_THREAD_MULTIPLE (C variable)
NVSHMEM_THREAD_SERIALIZED (C variable)
NVSHMEM_THREAD_SINGLE (C variable)
nvshmem_TYPENAME_alltoall (C function), [1]
nvshmem_TYPENAME_and_reduce (C function), [1]
nvshmem_TYPENAME_atomic_add (C function), [1]
nvshmem_TYPENAME_atomic_and (C function), [1]
nvshmem_TYPENAME_atomic_compare_swap (C function), [1]
nvshmem_TYPENAME_atomic_fetch (C function), [1]
nvshmem_TYPENAME_atomic_fetch_add (C function), [1]
nvshmem_TYPENAME_atomic_fetch_and (C function), [1]
nvshmem_TYPENAME_atomic_fetch_inc (C function), [1]
nvshmem_TYPENAME_atomic_fetch_or (C function), [1]
nvshmem_TYPENAME_atomic_fetch_xor (C function), [1]
nvshmem_TYPENAME_atomic_inc (C function), [1]
nvshmem_TYPENAME_atomic_or (C function), [1]
nvshmem_TYPENAME_atomic_set (C function), [1]
nvshmem_TYPENAME_atomic_swap (C function), [1]
nvshmem_TYPENAME_atomic_xor (C function), [1]
nvshmem_TYPENAME_broadcast (C function), [1]
nvshmem_TYPENAME_fcollect (C function), [1]
nvshmem_TYPENAME_g (C function), [1]
nvshmem_TYPENAME_get (C function), [1]
nvshmem_TYPENAME_get_nbi (C function), [1]
nvshmem_TYPENAME_iget (C function), [1]
nvshmem_TYPENAME_iput (C function), [1]
nvshmem_TYPENAME_max_reduce (C function), [1]
nvshmem_TYPENAME_min_reduce (C function), [1]
nvshmem_TYPENAME_or_reduce (C function), [1]
nvshmem_TYPENAME_p (C function), [1]
nvshmem_TYPENAME_prod_reduce (C function), [1]
nvshmem_TYPENAME_put (C function), [1]
nvshmem_TYPENAME_put_nbi (C function), [1]
nvshmem_TYPENAME_put_signal (C function), [1]
nvshmem_TYPENAME_put_signal_nbi (C function), [1]
nvshmem_TYPENAME_sum_reduce (C function), [1]
nvshmem_TYPENAME_test (C function)
nvshmem_TYPENAME_test_all (C function)
nvshmem_TYPENAME_test_all_vector (C function)
nvshmem_TYPENAME_test_any (C function)
nvshmem_TYPENAME_test_any_vector (C function)
nvshmem_TYPENAME_test_some (C function)
nvshmem_TYPENAME_test_some_vector (C function)
nvshmem_TYPENAME_wait (C function)
nvshmem_TYPENAME_wait_until (C function)
nvshmem_TYPENAME_wait_until_all (C function)
nvshmem_TYPENAME_wait_until_all_vector (C function)
nvshmem_TYPENAME_wait_until_any (C function)
nvshmem_TYPENAME_wait_until_any_vector (C function)
nvshmem_TYPENAME_wait_until_some (C function)
nvshmem_TYPENAME_wait_until_some_vector (C function)
nvshmem_TYPENAME_xor_reduce (C function), [1]
NVSHMEM_VENDOR_MAJOR_VERSION (C variable)
NVSHMEM_VENDOR_MINOR_VERSION (C variable)
NVSHMEM_VENDOR_PATCH_VERSION (C variable)
NVSHMEM_VENDOR_STRING (C variable)
NVSHMEM_VENDOR_VERSION (C variable)
NVSHMEM_VERSION (C variable)
nvshmemx_alltoallmem_block (C function)
nvshmemx_alltoallmem_on_stream (C function)
nvshmemx_alltoallmem_warp (C function)
nvshmemx_barrier_all_block (C function)
nvshmemx_barrier_all_on_stream (C function)
nvshmemx_barrier_all_warp (C function)
nvshmemx_barrier_block (C function)
nvshmemx_barrier_on_stream (C function)
nvshmemx_barrier_warp (C function)
nvshmemx_buffer_register (C function)
nvshmemx_buffer_unregister (C function)
nvshmemx_buffer_unregister_all (C function)
nvshmemx_collective_launch (C function)
nvshmemx_collective_launch_query_gridsize (C function)
nvshmemx_cumodule_init (C function)
nvshmemx_fence_on_stream (C function)
nvshmemx_getmem_block (C function)
nvshmemx_getmem_nbi_block (C function)
nvshmemx_getmem_nbi_on_stream (C function)
nvshmemx_getmem_nbi_warp (C function)
nvshmemx_getmem_on_stream (C function)
nvshmemx_getmem_warp (C function)
nvshmemx_getSIZE_block (C function)
nvshmemx_getSIZE_nbi_block (C function)
nvshmemx_getSIZE_nbi_on_stream (C function)
nvshmemx_getSIZE_nbi_warp (C function)
nvshmemx_getSIZE_on_stream (C function)
nvshmemx_getSIZE_warp (C function)
nvshmemx_igetSIZE_block (C function)
nvshmemx_igetSIZE_on_stream (C function)
nvshmemx_igetSIZE_warp (C function)
nvshmemx_init_attr (C function)
nvshmemx_init_status (C function)
nvshmemx_iputSIZE_block (C function)
nvshmemx_iputSIZE_on_stream (C function)
nvshmemx_iputSIZE_warp (C function)
nvshmemx_putmem_block (C function)
nvshmemx_putmem_nbi_block (C function)
nvshmemx_putmem_nbi_on_stream (C function)
nvshmemx_putmem_nbi_warp (C function)
nvshmemx_putmem_on_stream (C function)
nvshmemx_putmem_signal_block (C function)
nvshmemx_putmem_signal_nbi_block (C function)
nvshmemx_putmem_signal_nbi_on_stream (C function)
nvshmemx_putmem_signal_nbi_warp (C function)
nvshmemx_putmem_signal_on_stream (C function)
nvshmemx_putmem_signal_warp (C function)
nvshmemx_putmem_warp (C function)
nvshmemx_putSIZE_block (C function)
nvshmemx_putSIZE_nbi_block (C function)
nvshmemx_putSIZE_nbi_on_stream (C function)
nvshmemx_putSIZE_nbi_warp (C function)
nvshmemx_putSIZE_on_stream (C function)
nvshmemx_putSIZE_signal_block (C function)
nvshmemx_putSIZE_signal_nbi_block (C function)
nvshmemx_putSIZE_signal_nbi_on_stream (C function)
nvshmemx_putSIZE_signal_nbi_warp (C function)
nvshmemx_putSIZE_signal_on_stream (C function)
nvshmemx_putSIZE_signal_warp (C function)
nvshmemx_putSIZE_warp (C function)
nvshmemx_quiet_on_stream (C function)
nvshmemx_signal_op (C function)
nvshmemx_signal_wait_until_on_stream (C function)
nvshmemx_sync_all_block (C function)
nvshmemx_sync_all_on_stream (C function)
nvshmemx_sync_all_warp (C function)
nvshmemx_sync_block (C function)
nvshmemx_sync_on_stream (C function)
nvshmemx_sync_warp (C function)
NVSHMEMX_TEAM_NODE (C variable)
nvshmemx_team_sync_block (C function)
nvshmemx_team_sync_on_stream (C function)
nvshmemx_team_sync_warp (C function)
nvshmemx_TYPENAME_alltoall_block (C function)
nvshmemx_TYPENAME_alltoall_on_stream (C function)
nvshmemx_TYPENAME_alltoall_warp (C function)
nvshmemx_TYPENAME_and_reduce_block (C function)
nvshmemx_TYPENAME_and_reduce_on_stream (C function)
nvshmemx_TYPENAME_and_reduce_warp (C function)
nvshmemx_TYPENAME_broadcast_block (C function)
nvshmemx_TYPENAME_broadcast_on_stream (C function)
nvshmemx_TYPENAME_broadcast_warp (C function)
nvshmemx_TYPENAME_fcollect_block (C function)
nvshmemx_TYPENAME_fcollect_on_stream (C function)
nvshmemx_TYPENAME_fcollect_warp (C function)
nvshmemx_TYPENAME_get_block (C function)
nvshmemx_TYPENAME_get_nbi_block (C function)
nvshmemx_TYPENAME_get_nbi_on_stream (C function)
nvshmemx_TYPENAME_get_nbi_warp (C function)
nvshmemx_TYPENAME_get_on_stream (C function)
nvshmemx_TYPENAME_get_warp (C function)
nvshmemx_TYPENAME_iget_block (C function)
nvshmemx_TYPENAME_iget_on_stream (C function)
nvshmemx_TYPENAME_iget_warp (C function)
nvshmemx_TYPENAME_iput_block (C function)
nvshmemx_TYPENAME_iput_on_stream (C function)
nvshmemx_TYPENAME_iput_warp (C function)
nvshmemx_TYPENAME_max_reduce_block (C function)
nvshmemx_TYPENAME_max_reduce_on_stream (C function)
nvshmemx_TYPENAME_max_reduce_warp (C function)
nvshmemx_TYPENAME_min_reduce_block (C function)
nvshmemx_TYPENAME_min_reduce_on_stream (C function)
nvshmemx_TYPENAME_min_reduce_warp (C function)
nvshmemx_TYPENAME_or_reduce_block (C function)
nvshmemx_TYPENAME_or_reduce_on_stream (C function)
nvshmemx_TYPENAME_or_reduce_warp (C function)
nvshmemx_TYPENAME_prod_reduce_block (C function)
nvshmemx_TYPENAME_prod_reduce_on_stream (C function)
nvshmemx_TYPENAME_prod_reduce_warp (C function)
nvshmemx_TYPENAME_put_block (C function)
nvshmemx_TYPENAME_put_nbi_block (C function)
nvshmemx_TYPENAME_put_nbi_on_stream (C function)
nvshmemx_TYPENAME_put_nbi_warp (C function)
nvshmemx_TYPENAME_put_on_stream (C function)
nvshmemx_TYPENAME_put_signal_block (C function)
nvshmemx_TYPENAME_put_signal_nbi_block (C function)
nvshmemx_TYPENAME_put_signal_nbi_on_stream (C function)
nvshmemx_TYPENAME_put_signal_nbi_warp (C function)
nvshmemx_TYPENAME_put_signal_on_stream (C function)
nvshmemx_TYPENAME_put_signal_warp (C function)
nvshmemx_TYPENAME_put_warp (C function)
nvshmemx_TYPENAME_signal (C function)
nvshmemx_TYPENAME_sum_reduce_block (C function)
nvshmemx_TYPENAME_sum_reduce_on_stream (C function)
nvshmemx_TYPENAME_sum_reduce_warp (C function)
nvshmemx_TYPENAME_wait_on_stream (C function)
nvshmemx_TYPENAME_wait_until_on_stream (C function)
nvshmemx_TYPENAME_xor_reduce_block (C function)
nvshmemx_TYPENAME_xor_reduce_on_stream (C function)
nvshmemx_TYPENAME_xor_reduce_warp (C function)