Signaling Operations¶
NVSHMEM provides signaling operations that can be used to update a remote flag variable. When used in conjunction with wait/test routines at the remote PE, these routines can provide efficient point-to-point synchronization. The following example shows signal operations used to implement neighbor communication in a ring.
nvshmem_putmem(dest, src, size, (pe+1) % npes);
nvshmem_quiet();
nvshmemx_int_signal(flag, 1, (pe+1) % npes);
nvshmem_int_wait_until(flag, NVSHMEM_CMP_EQ, 1);
his section specifies the NVSHMEM support for put-with-signal, nonblocking put-with-signal, and signal-fetch routines. The put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion. The signal-fetch routine provides support for fetching a signal update operation.
Atomicity Guarantees for Signaling Operations¶
All signaling operations put-with-signal, nonblocking put-with-signal,
and signal-fetch are performed on a signal data object, a remotely
accessible symmetric object of type uint64_t
. A signal operator in
the put-with-signal routine is a NVSHMEM library constant that
determines the type of update to be performed as a signal on the signal
data object.
All signaling operations complete as if performed atomically with respect to the following:
other signal operations that update the signal data object using the same datatype;
signal-fetch routine that fetches the signal data object; and
any point-to-point synchronization routine that accesses the signal data object using the same datatype.
Available Signal Operators¶
With the atomicity guarantees as described in Section Atomicity Guarantees for Signaling Operations, the following options can be used as a signal operator.
NVSHMEM_SIGNAL_SET
An update to signal data object is an atomic set operation. It writes an unsigned 64-bit value as a signal into the signal data object on a remote
PE
as an atomic operation.NVSHMEM_SIGNAL_ADD
An update to signal data object is an atomic add operation. It adds an unsigned 64-bit value as a signal into the signal data object on a remote
PE
as an atomic operation.
NVSHMEM_PUT_SIGNAL¶
- void nvshmem\_TYPENAME\_put\_signal(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_TYPENAME\_put\_signal_on_stream(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_put\_signal(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_signal_block(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_signal_warp(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_putSIZE\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_putSIZE\_signal_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putSIZE\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putSIZE\_signal_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putSIZE\_signal_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_putmem\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_putmem\_signal_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putmem\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putmem\_signal_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putmem\_signal_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- dest [OUT]
Symmetric address of the data object to be updated on the remote PE. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of data object containing the data to be copied. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_putmem_signal
andnvshmem_ctx_putmem_signal
, elements are bytes.- sig_addr [OUT]
Symmetric address of the signal data object to be updated on the remote PE as a signal.
- signal [IN]
Unsigned 64-bit value that is used for updating the remote
sig_addr
signal data object.- sig_op [IN]
Signal operator that represents the type of update to be performed on the remote
sig_addr
signal data object.- pe [IN]
PE number of the remote PE.
Description
The put-with-signal routines provide a method for copying data from a
contiguous local data object to a data object on a specified PE and
subsequently updating a remote flag to signal completion. The routines
return after the data has been copied out of the source
array on the
local PE.
The sig_op
signal operator determines the type of update to be
performed on the remote sig_addr
signal data object. The completion
of signal update based on the sig_op
signal operator using the
signal
flag on the remote PE indicates the delivery of its
corresponding dest
data words into the data object on the remote PE.
An update to the sig_addr
signal data object through a
put-with-signal routine completes as if performed atomically as
described in Section Atomicity Guarantees for Signaling Operations. The various
options as described in Section Available Signal Operators can
be used as the sig_op
signal operator.
Returns
None.
Notes
The dest
and sig_addr
data objects must both be remotely
accessible.
sig_addr
and dest
may not be overlapping in memory.
The completion of signal update using the signal
flag on the remote
PE indicates only the delivery of its corresponding dest
data words
into the data object on the remote PE. Without a memory-ordering
operation, there is no implied ordering between the signal update of a
put-with-signal routine and another data transfer. For example, the
completion of the signal update in a sequence consisting of a put
routine followed by a put-with-signal routine does not imply delivery
of the put routine’s data.
NVSHMEM_PUT_SIGNAL_NBI¶
- void nvshmem\_TYPENAME\_put\_signal\_nbi(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_TYPENAME\_put\_signal\_nbi_on_stream(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_put\_signal\_nbi(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_signal\_nbi_block(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_signal\_nbi_warp(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_putSIZE\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_putSIZE\_signal\_nbi_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putSIZE\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putSIZE\_signal\_nbi_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putSIZE\_signal\_nbi_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_putmem\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- void nvshmemx\_putmem\_signal\_nbi_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putmem\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putmem\_signal\_nbi_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- __device__ void nvshmemx\_putmem\_signal\_nbi_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- dest [OUT]
Symmetric address of the data object to be updated on the remote PE. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of data object containing the data to be copied. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_putmem_signal_nbi
andnvshmem_ctx_putmem_signal_nbi
, elements are bytes.- sig_addr [OUT]
Symmetric address of the signal data object to be updated on the remote PE as a signal.
- signal [IN]
Unsigned 64-bit value that is used for updating the remote
sig_addr
signal data object.- sig_op [IN]
Signal operator that represents the type of update to be performed on the remote
sig_addr
signal data object.- pe [IN]
PE number of the remote PE.
Description
The nonblocking put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion.
The routines return after initiating the operation. The operation is
considered complete after a subsequent call to nvshmem_quiet
. At the
completion of nvshmem_quiet
, the data has been copied out of the
source
array on the local PE and delivered into the dest
array
on the destination PE.
The delivery of signal
flag on the remote PE indicates only the
delivery of its corresponding dest
data words into the data object
on the remote PE. Furthermore, two successive nonblocking
put-with-signal routines, or a nonblocking put-with-signal routine
with another data transfer may deliver data out of order unless a call
to nvshmem_fence
is introduced between the two calls.
The sig_op
signal operator determines the type of update to be
performed on the remote sig_addr
signal data object.
An update to the sig_addr
signal data object through a nonblocking
put-with-signal routine completes as if performed atomically as
described in Section Atomicity Guarantees for Signaling Operations. The various
options as described in Section Available Signal Operators can
be used as the sig_op
signal operator.
Returns
None.
Notes
The dest
and sig_addr
data objects must both be remotely
accessible.
sig_addr
and dest
may not be overlapping in memory.
NVSHMEM_SIGNAL_FETCH¶
- __device__ uint64_t nvshmem\_signal\_fetch(const uint64_t *sig_addr)
- sig_addr [IN]
Local address of the remotely accessible signal variable.
Description
nvshmem_signal_fetch
performs a fetch operation and returns the
contents of the sig_addr
signal data object. Access to sig_addr
signal object at the calling PE is expected to satisfy the atomicity
guarantees as described in Section Atomicity Guarantees for Signaling Operations.
Returns
Returns the contents of the signal data object, sig_addr
, at the
calling PE.
The following datatypes are supported by the NVSHMEM signal operation.
TYPE |
TYPENAME |
---|---|
short |
short |
int |
int |
long |
long |
long long |
longlong |
unsigned short |
ushort |
unsigned int |
uint |
unsigned long |
ulong |
unsigned long long |
ulonglong |
int32_t |
int32 |
int64_t |
int64 |
uint32_t |
uint32 |
uint64_t |
uint64 |
size_t |
size |
ptrdiff_t |
ptrdiff |
NVSHMEMX_SIGNAL¶
Deprecated, see nvshmemx_signal_op
or nvshmem_*_atomic_set
.
- __device__ inline void nvshmemx_TYPENAME_signal(TYPE *dest, const TYPE value, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Signal Types and Names.
- dest [OUT]
Symmetric address of the signal word to be updated.
- value [IN]
The value to be placed in dest.
- pe [IN]
PE number of the remote PE.
Description
The nvshmemx_signal operation atomically sets dest to value on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.
Returns
None.
NVSHMEMX_SIGNAL_OP¶
- __device__ inline void nvshmemx_signal_op(uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
- sig_addr [OUT]
Symmetric address of the signal word to be updated.
- signal [IN]
The value used to update sig_addr.
- sig_op [IN]
Operation used to update sig_addr with signal.
- pe [IN]
PE number of the remote PE.
Description
The nvshmemx_signal_op operation atomically updates sig_addr with signal using operation sig_op on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.
Returns
None.