Signaling Operations

NVSHMEM provides signaling operations that can be used to update a remote flag variable. When used in conjunction with wait/test routines at the remote PE, these routines can provide efficient point-to-point synchronization. The following example shows signal operations used to implement neighbor communication in a ring.

nvshmem_putmem(dest, src, size, (pe+1) % npes);
nvshmem_quiet();
nvshmemx_int_signal(flag, 1, (pe+1) % npes);
nvshmem_int_wait_until(flag, NVSHMEM_CMP_EQ, 1);

his section specifies the NVSHMEM support for put-with-signal, nonblocking put-with-signal, and signal-fetch routines. The put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion. The signal-fetch routine provides support for fetching a signal update operation.

Atomicity Guarantees for Signaling Operations

All signaling operations put-with-signal, nonblocking put-with-signal, and signal-fetch are performed on a signal data object, a remotely accessible symmetric object of type uint64_t. A signal operator in the put-with-signal routine is a NVSHMEM library constant that determines the type of update to be performed as a signal on the signal data object.

All signaling operations complete as if performed atomically with respect to the following:

  • other signal operations that update the signal data object using the same datatype;

  • signal-fetch routine that fetches the signal data object; and

  • any point-to-point synchronization routine that accesses the signal data object using the same datatype.

Available Signal Operators

With the atomicity guarantees as described in Section Atomicity Guarantees for Signaling Operations, the following options can be used as a signal operator.

NVSHMEM_SIGNAL_SET

An update to signal data object is an atomic set operation. It writes an unsigned 64-bit value as a signal into the signal data object on a remote PE as an atomic operation.

NVSHMEM_SIGNAL_ADD

An update to signal data object is an atomic add operation. It adds an unsigned 64-bit value as a signal into the signal data object on a remote PE as an atomic operation.

NVSHMEM_PUT_SIGNAL

void nvshmem\_TYPENAME\_put\_signal(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_TYPENAME\_put\_signal_on_stream(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_TYPENAME\_put\_signal(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_TYPENAME\_put\_signal_block(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_TYPENAME\_put\_signal_warp(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)

where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.

void nvshmem\_putSIZE\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_putSIZE\_signal_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_putSIZE\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putSIZE\_signal_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putSIZE\_signal_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)

where SIZE is one of 8, 16, 32, 64, 128.

void nvshmem\_putmem\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_putmem\_signal_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_putmem\_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putmem\_signal_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putmem\_signal_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
dest [OUT]

Symmetric address of the data object to be updated on the remote PE. The type of dest should match that implied in the SYNOPSIS section.

source [IN]

Symmetric address of data object containing the data to be copied. The type of source should match that implied in the SYNOPSIS section.

nelems [IN]

Number of elements in the dest and source arrays. For nvshmem_putmem_signal and nvshmem_ctx_putmem_signal, elements are bytes.

sig_addr [OUT]

Symmetric address of the signal data object to be updated on the remote PE as a signal.

signal [IN]

Unsigned 64-bit value that is used for updating the remote sig_addr signal data object.

sig_op [IN]

Signal operator that represents the type of update to be performed on the remote sig_addr signal data object.

pe [IN]

PE number of the remote PE.

Description

The put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion. The routines return after the data has been copied out of the source array on the local PE.

The sig_op signal operator determines the type of update to be performed on the remote sig_addr signal data object. The completion of signal update based on the sig_op signal operator using the signal flag on the remote PE indicates the delivery of its corresponding dest data words into the data object on the remote PE.

An update to the sig_addr signal data object through a put-with-signal routine completes as if performed atomically as described in Section Atomicity Guarantees for Signaling Operations. The various options as described in Section Available Signal Operators can be used as the sig_op signal operator.

Returns

None.

Notes

The dest and sig_addr data objects must both be remotely accessible.

sig_addr and dest may not be overlapping in memory.

The completion of signal update using the signal flag on the remote PE indicates only the delivery of its corresponding dest data words into the data object on the remote PE. Without a memory-ordering operation, there is no implied ordering between the signal update of a put-with-signal routine and another data transfer. For example, the completion of the signal update in a sequence consisting of a put routine followed by a put-with-signal routine does not imply delivery of the put routine’s data.

NVSHMEM_PUT_SIGNAL_NBI

void nvshmem\_TYPENAME\_put\_signal\_nbi(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_TYPENAME\_put\_signal\_nbi_on_stream(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_TYPENAME\_put\_signal\_nbi(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_TYPENAME\_put\_signal\_nbi_block(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_TYPENAME\_put\_signal\_nbi_warp(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)

where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.

void nvshmem\_putSIZE\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_putSIZE\_signal\_nbi_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_putSIZE\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putSIZE\_signal\_nbi_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putSIZE\_signal\_nbi_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)

where SIZE is one of 8, 16, 32, 64, 128.

void nvshmem\_putmem\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
void nvshmemx\_putmem\_signal\_nbi_on_stream(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, cudaStream_t stream)
__device__ void nvshmem\_putmem\_signal\_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putmem\_signal\_nbi_block(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
__device__ void nvshmemx\_putmem\_signal\_nbi_warp(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
dest [OUT]

Symmetric address of the data object to be updated on the remote PE. The type of dest should match that implied in the SYNOPSIS section.

source [IN]

Symmetric address of data object containing the data to be copied. The type of source should match that implied in the SYNOPSIS section.

nelems [IN]

Number of elements in the dest and source arrays. For nvshmem_putmem_signal_nbi and nvshmem_ctx_putmem_signal_nbi, elements are bytes.

sig_addr [OUT]

Symmetric address of the signal data object to be updated on the remote PE as a signal.

signal [IN]

Unsigned 64-bit value that is used for updating the remote sig_addr signal data object.

sig_op [IN]

Signal operator that represents the type of update to be performed on the remote sig_addr signal data object.

pe [IN]

PE number of the remote PE.

Description

The nonblocking put-with-signal routines provide a method for copying data from a contiguous local data object to a data object on a specified PE and subsequently updating a remote flag to signal completion.

The routines return after initiating the operation. The operation is considered complete after a subsequent call to nvshmem_quiet. At the completion of nvshmem_quiet, the data has been copied out of the source array on the local PE and delivered into the dest array on the destination PE.

The delivery of signal flag on the remote PE indicates only the delivery of its corresponding dest data words into the data object on the remote PE. Furthermore, two successive nonblocking put-with-signal routines, or a nonblocking put-with-signal routine with another data transfer may deliver data out of order unless a call to nvshmem_fence is introduced between the two calls.

The sig_op signal operator determines the type of update to be performed on the remote sig_addr signal data object.

An update to the sig_addr signal data object through a nonblocking put-with-signal routine completes as if performed atomically as described in Section Atomicity Guarantees for Signaling Operations. The various options as described in Section Available Signal Operators can be used as the sig_op signal operator.

Returns

None.

Notes

The dest and sig_addr data objects must both be remotely accessible.

sig_addr and dest may not be overlapping in memory.

NVSHMEM_SIGNAL_FETCH

__device__ uint64_t nvshmem\_signal\_fetch(const uint64_t *sig_addr)
sig_addr [IN]

Local address of the remotely accessible signal variable.

Description

nvshmem_signal_fetch performs a fetch operation and returns the contents of the sig_addr signal data object. Access to sig_addr signal object at the calling PE is expected to satisfy the atomicity guarantees as described in Section Atomicity Guarantees for Signaling Operations.

Returns

Returns the contents of the signal data object, sig_addr, at the calling PE.

The following datatypes are supported by the NVSHMEM signal operation.

Signal Types and Names

TYPE

TYPENAME

short

short

int

int

long

long

long long

longlong

unsigned short

ushort

unsigned int

uint

unsigned long

ulong

unsigned long long

ulonglong

int32_t

int32

int64_t

int64

uint32_t

uint32

uint64_t

uint64

size_t

size

ptrdiff_t

ptrdiff

NVSHMEMX_SIGNAL

Deprecated, see nvshmemx_signal_op or nvshmem_*_atomic_set.

__device__ inline void nvshmemx_TYPENAME_signal(TYPE *dest, const TYPE value, int pe)

where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Signal Types and Names.

dest [OUT]

Symmetric address of the signal word to be updated.

value [IN]

The value to be placed in dest.

pe [IN]

PE number of the remote PE.

Description

The nvshmemx_signal operation atomically sets dest to value on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.

Returns

None.

NVSHMEMX_SIGNAL_OP

__device__ inline void nvshmemx_signal_op(uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
sig_addr [OUT]

Symmetric address of the signal word to be updated.

signal [IN]

The value used to update sig_addr.

sig_op [IN]

Operation used to update sig_addr with signal.

pe [IN]

PE number of the remote PE.

Description

The nvshmemx_signal_op operation atomically updates sig_addr with signal using operation sig_op on the specified PE. This operation can be used together with wait and test routines for efficient point-to-point synchronization.

Returns

None.