Execution Model

An NVSHMEM program consists of a set of NVSHMEM processes called PEs. Although NVSHMEM does not require it, PEs are typically executed using a single program, multiple data (SPMD) model. SPMD requires each PE to use the same executable; however, PEs are able to follow divergent control paths. PEs are implemented as OS processes, and a PE may create additional threads when threading support is enabled.

PE execution is loosely coupled, relying on NVSHMEM operations to communicate and synchronize among executing PEs. The NVSHMEM phase of a program begins with a call to the initialization routine nvshmem_init or nvshmem_init_thread, which must be made before any other NVSHMEM library routine is used. An NVSHMEM program concludes its use of the NVSHMEM library when all PEs call nvshmem_finalize or any PE calls nvshmem_global_exit. During a call to nvshmem_finalize, the NVSHMEM library must complete all pending communication and release all resources associated with the library, using an implicit collective synchronization across PEs. Calling any NVSHMEM routine before initialization or after nvshmem_finalize leads to undefined behavior. A subsequent initialization call after finalization also leads to undefined behavior.

The PEs of an NVSHMEM program are identified by unique integers, assigned in a monotonically increasing manner from zero to one less than the total number of PEs. PE identifiers are used in NVSHMEM calls (e.g., to specify the target of put or get routines on symmetric data objects, or in collective synchronization calls) or to dictate control flow for PEs using C language constructs. The identifiers are fixed for the duration of the NVSHMEM phase of a program.
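Because identifiers run from zero to one less than the number of PEs, neighbor selection and role assignment reduce to simple integer arithmetic. The following is a minimal sketch in plain C; the helper names ring_next, ring_prev, and is_root are hypothetical and are not part of the NVSHMEM API.

```c
#include <assert.h>

/* Hypothetical helper (not an NVSHMEM routine): identifier of the PE that
 * follows `me` in a ring of `npes` PEs numbered 0 .. npes - 1. */
static inline int ring_next(int me, int npes)
{
    assert(npes > 0 && me >= 0 && me < npes);
    return (me + 1) % npes;
}

/* Hypothetical helper: identifier of the PE that precedes `me` in the ring.
 * Adding npes before the modulo keeps the left operand non-negative. */
static inline int ring_prev(int me, int npes)
{
    assert(npes > 0 && me >= 0 && me < npes);
    return (me + npes - 1) % npes;
}

/* Hypothetical helper: use the identifier to select a divergent control
 * path, here treating the lowest-numbered PE as the "root". */
static inline int is_root(int me)
{
    return me == 0;
}
```

In an NVSHMEM program, the values of me and npes would typically be obtained from nvshmem_my_pe() and nvshmem_n_pes() after initialization, and the result of ring_next would be passed as the target PE of a put or get.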

Progress of NVSHMEM Operations

The NVSHMEM model assumes that computation and communication are naturally overlapped. NVSHMEM programs are expected to exhibit progression of communication both with and without NVSHMEM calls. Consider a PE that is engaged in a computation with no NVSHMEM calls. Other PEs should be able to communicate with that compute-bound PE (e.g., through put, get, or atomic operations) and complete those communication operations without that PE issuing any explicit NVSHMEM calls. One-sided NVSHMEM communication calls involving that PE should progress regardless of when that PE next engages in an NVSHMEM call.

Invoking NVSHMEM Operations

Pointer arguments to NVSHMEM routines that point to non-const data must not overlap in memory with other arguments to the same NVSHMEM operation, with the exception of in-place reductions as described in Section NVSHMEM_REDUCTIONS. Otherwise, the behavior is undefined. Two arguments overlap in memory if any of their data elements are contained in the same physical memory locations. For example, consider an address \(a\) returned by the nvshmem_ptr operation for symmetric object \(A\) on PE \(i\). Providing the local address \(a\) and the symmetric address of object \(A\) to an NVSHMEM operation targeting PE \(i\) results in undefined behavior.
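As a rough illustration of the overlap rule, the sketch below checks whether two byte ranges intersect. The helper ranges_overlap is hypothetical and not part of NVSHMEM. It compares addresses in the calling process only, so it cannot detect the nvshmem_ptr case described above, where two different addresses refer to the same physical memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an NVSHMEM routine): returns 1 if the byte
 * ranges [a, a + a_bytes) and [b, b + b_bytes) share at least one byte,
 * and 0 otherwise. Only the addresses as seen by the caller are compared. */
static inline int ranges_overlap(const void *a, size_t a_bytes,
                                 const void *b, size_t b_bytes)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;

    /* An empty range contains no data elements, so it overlaps nothing. */
    if (a_bytes == 0 || b_bytes == 0)
        return 0;

    /* Guard against wraparound when computing the end addresses. */
    assert(a_bytes <= UINTPTR_MAX - a0 && b_bytes <= UINTPTR_MAX - b0);

    /* Two half-open intervals intersect when each starts before the
     * other one ends. */
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}
```

A caller could use such a check as a debugging aid before passing source and destination buffers to the same operation; a result of 0 does not prove that the arguments are free of aliasing.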

Buffers provided to NVSHMEM routines are in-use until the corresponding NVSHMEM operation has completed at the calling PE. Updates to a buffer that is in-use, including updates performed through locally and remotely issued NVSHMEM operations, result in undefined behavior. Similarly, reads from a buffer that is in-use are allowed only when the buffer was provided as a const-qualified argument to the NVSHMEM routine for which it is in-use. Otherwise, the behavior is undefined. Exceptions are made for buffers that are in-use by AMOs, as described in Section Atomicity Guarantees. For information regarding the completion of NVSHMEM operations, see Section Memory Ordering.

NVSHMEM routines with multiple symmetric object arguments do not require these symmetric objects to be located within the same symmetric memory segment. For example, objects located in the symmetric data segment and objects located in the symmetric heap can be provided as arguments to the same NVSHMEM operation.