Remote Memory Access¶
The RMA routines described in this section can be used to perform reads from and writes to symmetric data objects. These operations are one-sided, meaning that the PE invoking an operation provides all communication parameters and the targeted PE is passive. A characteristic of one-sided communication is that it decouples communication from synchronization. One-sided communication mechanisms transfer data; however, they do not synchronize the sender of the data with the receiver of the data.
NVSHMEMRMA routines are performed on symmetric data objects. The
initiator PE of a call is designated as the origin PE and the PE
targeted by an operation is designated as the destination PE. The
source
and dest
designators refer to the data objects that an
operation reads from and writes to. In the case of the remote update
routine, Put, the origin PE provides the source
data object and
the destination PE provides the dest
data object. In the case of the
remote read routine, Get, the origin PE provides the dest
data
object and the destination PE provides the source
data object.
The standard RMA types include the exact-width integer types defined in
stdint.h
by C §7.18.1.1 and C §7.20.1.1. When the C
translation environment does not provide exact-width integer types with
stdint.h
, an NVSHMEM implemementation is not required to provide
support for these types.
Blocking RMA¶
NVSHMEM_PUT¶
- void nvshmem\_TYPENAME\_put(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_put_on_stream(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_put(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_put_block(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_put_warp(TYPE *dest, const TYPE *source, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_putSIZE(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_putSIZE_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putSIZE(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putSIZE_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putSIZE_warp(void *dest, const void *source, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_putmem(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_putmem_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putmem(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putmem_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putmem_warp(void *dest, const void *source, size_t nelems, int pe)
- dest [OUT]
Symmetric address of the destination data object. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the data object containing the data to be copied. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_putmem
andnvshmem_ctx_putmem
, elements are bytes.- pe [IN]
PE number of the remote PE.
Description
The routines return after the data has been copied out of the source
array on the local PE. The delivery of data words into the data object
on the destination PE may occur in any order. Furthermore, two
successive put routines may deliver data out of order unless a call to
nvshmem_fence
is introduced between the two calls.
Returns
None.
NVSHMEM_P¶
- void nvshmem\_TYPENAME\_p(TYPE *dest, TYPE value, int pe)
- __device__ void nvshmem\_TYPENAME\_p(TYPE *dest, TYPE value, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- dest [OUT]
Symmetric address of the destination data object. The type of
dest
should match that implied in the SYNOPSIS section.- value [IN]
The value to be transferred to
dest
. The type ofvalue
should match that implied in the SYNOPSIS section.- pe [IN]
The number of the remote PE.
Description
These routines provide a very low latency put capability for single elements of most basic types.
As with nvshmem_put
, these routines start the remote transfer and
may return before the data is delivered to the remote PE. Use
nvshmem_quiet
to force completion of all remote Put transfers.
Returns
None.
NVSHMEM_IPUT¶
- void nvshmem\_TYPENAME\_iput(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_iput_on_stream(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_iput(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_iput_block(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_iput_warp(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_iputSIZE(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- void nvshmemx\_iputSIZE_on_stream(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_iputSIZE(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_iputSIZE_block(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_iputSIZE_warp(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- dest [OUT]
Symmetric address of the destination array data object. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the array containing the data to be copied. The type of
source
should match that implied in the SYNOPSIS section.- dst [IN]
The stride between consecutive elements of the
dest
array. The stride is scaled by the element size of thedest
array. A value of1
indicates contiguous data.- sst [IN]
The stride between consecutive elements of the
source
array. The stride is scaled by the element size of thesource
array. A value of1
indicates contiguous data.- nelems [IN]
Number of elements in the
dest
andsource
arrays.- pe [IN]
PE number of the remote PE.
Description
The iput
routines provide a method for copying strided data elements
(specified by sst
) of an array from a source
array on the local
PE to locations specified by stride dst
on a dest
array on
specified remote PE. Both strides, dst
and sst
, must be greater
than or equal to 1
. The routines return when the data has been
copied out of the source
array on the local PE but not necessarily
before the data has been delivered to the remote data object.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.
NVSHMEM_GET¶
- void nvshmem\_TYPENAME\_get(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_get_on_stream(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_get(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_get_block(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_get_warp(TYPE *dest, const TYPE *source, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_getSIZE(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_getSIZE_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_getSIZE(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getSIZE_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getSIZE_warp(void *dest, const void *source, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_getmem(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_getmem_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_getmem(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getmem_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getmem_warp(void *dest, const void *source, size_t nelems, int pe)
- dest [OUT]
Symmetric address of the data object to be updated. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the source data object. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_getmem
andnvshmem_ctx_getmem
, elements are bytes.- pe [IN]
PE number of the remote PE.
Description
The get routines provide a method for copying a contiguous symmetric
data object from a different PE to a contiguous data object on the local
PE. The routines return after the data has been delivered to the
dest
array on the local PE.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.
NVSHMEM_G¶
- TYPE nvshmem\_TYPENAME\_g(const TYPE *source, int pe)
- __device__ TYPE nvshmem\_TYPENAME\_g(const TYPE *source, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- source [IN]
Symmetric address of the source data object. The type of
source
should match that implied in the SYNOPSIS section.- pe [IN]
The number of the remote PE on which
source
resides.
Description
These routines provide a very low latency get capability for single elements of most basic types.
Returns
Returns a single element of type specified in the synopsis.
NVSHMEM_IGET¶
- void nvshmem\_TYPENAME\_iget(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_iget_on_stream(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_iget(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_iget_block(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_iget_warp(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_igetSIZE(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- void nvshmemx\_igetSIZE_on_stream(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_igetSIZE(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_igetSIZE_block(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
- __device__ void nvshmemx\_igetSIZE_warp(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- dest [OUT]
Symmetric address of the array to be updated. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the source array data object. The type of
source
should match that implied in the SYNOPSIS section.- dst [IN]
The stride between consecutive elements of the
dest
array. The stride is scaled by the element size of thedest
array. A value of1
indicates contiguous data.- sst [IN]
The stride between consecutive elements of the
source
array. The stride is scaled by the element size of thesource
array. A value of1
indicates contiguous data.- nelems [IN]
Number of elements in the
dest
andsource
arrays.- pe [IN]
PE number of the remote PE.
Description
The iget
routines provide a method for copying strided data elements
from a symmetric array from a specified remote PE to strided locations
on a local array. The routines return when the data has been copied into
the local dest
array.
Returns
None.
Nonblocking RMA¶
NVSHMEM_PUT_NBI¶
- void nvshmem\_TYPENAME\_put\_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_put\_nbi_on_stream(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_put\_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_nbi_block(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_put\_nbi_warp(TYPE *dest, const TYPE *source, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_putSIZE\_nbi(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_putSIZE\_nbi_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putSIZE\_nbi(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putSIZE\_nbi_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putSIZE\_nbi_warp(void *dest, const void *source, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_putmem\_nbi(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_putmem\_nbi_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_putmem\_nbi(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putmem\_nbi_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_putmem\_nbi_warp(void *dest, const void *source, size_t nelems, int pe)
- dest [OUT]
Symmetric address of the destination data object. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the object containing the data to be copied. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_putmem_nbi
andnvshmem_ctx_putmem_nbi
, elements are bytes.- pe [IN]
PE number of the remote PE.
Description
The routines return after initiating the operation. The operation is
considered complete after a subsequent call to nvshmem_quiet
. At the
completion of nvshmem_quiet
, the data has been copied into the
dest
array on the destination PE. The delivery of data words into
the data object on the destination PE may occur in any order.
Furthermore, two successive put routines may deliver data out of order
unless a call to nvshmem_fence
is introduced between the two calls.
Returns
None.
NVSHMEM_GET_NBI¶
- void nvshmem\_TYPENAME\_get\_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- void nvshmemx\_TYPENAME\_get\_nbi_on_stream(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_TYPENAME\_get\_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_get\_nbi_block(TYPE *dest, const TYPE *source, size_t nelems, int pe)
- __device__ void nvshmemx\_TYPENAME\_get\_nbi_warp(TYPE *dest, const TYPE *source, size_t nelems, int pe)
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Blocking RMA.
- void nvshmem\_getSIZE\_nbi(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_getSIZE\_nbi_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_getSIZE\_nbi(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getSIZE\_nbi_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getSIZE\_nbi_warp(void *dest, const void *source, size_t nelems, int pe)
where SIZE is one of 8, 16, 32, 64, 128
.
- void nvshmem\_getmem\_nbi(void *dest, const void *source, size_t nelems, int pe)
- void nvshmemx\_getmem\_nbi_on_stream(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)
- __device__ void nvshmem\_getmem\_nbi(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getmem\_nbi_block(void *dest, const void *source, size_t nelems, int pe)
- __device__ void nvshmemx\_getmem\_nbi_warp(void *dest, const void *source, size_t nelems, int pe)
- dest [OUT]
Symmetric address of the data object to be updated. The type of
dest
should match that implied in the SYNOPSIS section.- source [IN]
Symmetric address of the source data object. The type of
source
should match that implied in the SYNOPSIS section.- nelems [IN]
Number of elements in the
dest
andsource
arrays. Fornvshmem_getmem_nbi
andnvshmem_ctx_getmem_nbi
, elements are bytes.- pe [IN]
PE number of the remote PE.
Description
The get routines provide a method for copying a contiguous symmetric
data object from a different PE to a contiguous data object on the local
PE. The routines return after initiating the operation. The operation is
considered complete after a subsequent call to nvshmem_quiet
. At the
completion of nvshmem_quiet
, the data has been delivered to the
dest
array on the local PE.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.