Remote Memory Access¶
The RMA routines described in this section can be used to perform reads from and writes to symmetric data objects. These operations are one-sided, meaning that the PE invoking an operation provides all communication parameters and the targeted PE is passive. A characteristic of one-sided communication is that it decouples communication from synchronization. One-sided communication mechanisms transfer data; however, they do not synchronize the sender of the data with the receiver of the data.
NVSHMEMRMA routines are performed on symmetric data objects. The initiator PE of a call is designated as the origin PE and the PE targeted by an operation is designated as the destination PE. The source and dest designators refer to the data objects that an operation reads from and writes to. In the case of the remote update routine, Put, the origin PE provides the source data object and the destination PE provides the dest data object. In the case of the remote read routine, Get, the origin PE provides the dest data object and the destination PE provides the source data object.
The standard RMA types include the exact-width integer types defined in stdint.h by C §7.18.1.1 and C §7.20.1.1. When the C translation environment does not provide exact-width integer types with stdint.h, an NVSHMEM implemementation is not required to provide support for these types.
TYPE | TYPENAME |
---|---|
float | float |
double | double |
char | char |
signed char | schar |
short | short |
int | int |
long | long |
long long | longlong |
unsigned char | uchar |
unsigned short | ushort |
unsigned int | uint |
unsigned long | ulong |
unsigned long long | ulonglong |
int8_t | int8 |
int16_t | int16 |
int32_t | int32 |
int64_t | int64 |
uint8_t | uint8 |
uint16_t | uint16 |
uint32_t | uint32 |
uint64_t | uint64 |
size_t | size |
ptrdiff_t | ptrdiff |
Blocking RMA¶
NVSHMEM_PUT¶
-
void
nvshmem_TYPENAME_put
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_put_on_stream
(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_put
(TYPE *dest, const TYPE *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_put_block
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_put_warp
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_putSIZE
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_putSIZE_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_putSIZE
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_putSIZE_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_putSIZE_warp
(void *dest, const void *source, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
-
void
nvshmem_putmem
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_putmem_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_putmem
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_putmem_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_putmem_warp
(void *dest, const void *source, size_t nelems, int pe)¶
- dest [OUT]
- Symmetric address of the destination data object. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Local address of the data object containing the data to be copied. The type of source should match that implied in the SYNOPSIS section.
- nelems [IN]
- Number of elements in the dest and source arrays. For nvshmem_putmem and shmem_ctx_putmem, elements are bytes.
- pe [IN]
- PE number of the remote PE.
Description
The routines return after the data has been copied out of the source array on the local PE. The delivery of data words into the data object on the destination PE may occur in any order. Furthermore, two successive put routines may deliver data out of order unless a call to nvshmem_fence is introduced between the two calls.
Returns
None.
NVSHMEM_P¶
-
void
nvshmem_TYPENAME_p
(TYPE *dest, TYPE value, int pe)¶
-
void
nvshmemx_TYPENAME_p_on_stream
(TYPE *dest, TYPE value, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_p
(TYPE *dest, TYPE value, int pe)
-
__device__ void
nvshmemx_TYPENAME_p_block
(TYPE *dest, TYPE value, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_p_warp
(TYPE *dest, TYPE value, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
- dest [OUT]
- Symmetric address of the destination data object. The type of dest should match that implied in the SYNOPSIS section.
- value [IN]
- The value to be transferred to dest. The type of value should match that implied in the SYNOPSIS section.
- pe [IN]
- The number of the remote PE.
Description
These routines provide a very low latency put capability for single elements of most basic types.
As with nvshmem_put, these routines start the remote transfer and may return before the data is delivered to the remote PE. Use nvshmem_quiet to force completion of all remote Put transfers.
Returns
None.
NVSHMEM_IPUT¶
-
void
nvshmem_TYPENAME_iput
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_iput_on_stream
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_iput
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_iput_block
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_iput_warp
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_iputSIZE
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
void
nvshmemx_iputSIZE_on_stream
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_iputSIZE
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
-
__device__ void
nvshmemx_iputSIZE_block
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_iputSIZE_warp
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
- dest [OUT]
- Symmetric address of the destination array data object. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Local address of the array containing the data to be copied. The type of source should match that implied in the SYNOPSIS section.
- dst [IN]
- The stride between consecutive elements of the dest array. The stride is scaled by the element size of the dest array. A value of 1 indicates contiguous data.
- sst [IN]
- The stride between consecutive elements of the source array. The stride is scaled by the element size of the source array. A value of 1 indicates contiguous data.
- nelems [IN]
- Number of elements in the dest and source arrays.
- pe [IN]
- PE number of the remote PE.
Description
The iput routines provide a method for copying strided data elements (specified by sst) of an array from a source array on the local PE to locations specified by stride dst on a dest array on specified remote PE. Both strides, dst and sst, must be greater than or equal to 1. The routines return when the data has been copied out of the source array on the local PE but not necessarily before the data has been delivered to the remote data object.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.
NVSHMEM_GET¶
-
void
nvshmem_TYPENAME_get
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_get_on_stream
(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_get
(TYPE *dest, const TYPE *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_get_block
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_get_warp
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_getSIZE
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_getSIZE_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_getSIZE
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_getSIZE_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_getSIZE_warp
(void *dest, const void *source, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
-
void
nvshmem_getmem
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_getmem_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_getmem
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_getmem_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_getmem_warp
(void *dest, const void *source, size_t nelems, int pe)¶
- dest [OUT]
- Local address of the data object to be updated. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Symmetric address of the source data object. The type of source should match that implied in the SYNOPSIS section.
- nelems [IN]
- Number of elements in the dest and source arrays. For nvshmem_getmem and shmem_ctx_getmem, elements are bytes.
- pe [IN]
- PE number of the remote PE.
Description
The get routines provide a method for copying a contiguous symmetric data object from a different PE to a contiguous data object on the local PE. The routines return after the data has been delivered to the dest array on the local PE.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.
NVSHMEM_G¶
-
TYPE
nvshmem_TYPENAME_g
(const TYPE *source, int pe)¶
-
TYPE
nvshmemx_TYPENAME_g_on_stream
(const TYPE *source, int pe, cudaStream_t stream)¶
-
__device__ TYPE
nvshmem_TYPENAME_g
(const TYPE *source, int pe)
-
__device__ TYPE
nvshmemx_TYPENAME_g_block
(const TYPE *source, int pe)¶
-
__device__ TYPE
nvshmemx_TYPENAME_g_warp
(const TYPE *source, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
- source [IN]
- Symmetric address of the source data object. The type of source should match that implied in the SYNOPSIS section.
- pe [IN]
- The number of the remote PE on which source resides.
Description
These routines provide a very low latency get capability for single elements of most basic types.
Returns
Returns a single element of type specified in the synopsis.
NVSHMEM_IGET¶
-
void
nvshmem_TYPENAME_iget
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_iget_on_stream
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_iget
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_iget_block
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_iget_warp
(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_igetSIZE
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
void
nvshmemx_igetSIZE_on_stream
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_igetSIZE
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)
-
__device__ void
nvshmemx_igetSIZE_block
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_igetSIZE_warp
(void *dest, const void *source, ptrdiff_t dst, ptrdiff_t sst, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
- dest [OUT]
- Local address of the array to be updated. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Symmetric address of the source array data object. The type of source should match that implied in the SYNOPSIS section.
- dst [IN]
- The stride between consecutive elements of the dest array. The stride is scaled by the element size of the dest array. A value of 1 indicates contiguous data.
- sst [IN]
- The stride between consecutive elements of the source array. The stride is scaled by the element size of the source array. A value of 1 indicates contiguous data.
- nelems [IN]
- Number of elements in the dest and source arrays.
- pe [IN]
- PE number of the remote PE.
Description
The iget routines provide a method for copying strided data elements from a symmetric array from a specified remote PE to strided locations on a local array. The routines return when the data has been copied into the local dest array.
Returns
None.
Nonblocking RMA¶
NVSHMEM_PUT_NBI¶
-
void
nvshmem_TYPENAME_put_nbi
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_put_nbi_on_stream
(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_put_nbi
(TYPE *dest, const TYPE *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_put_nbi_block
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_put_nbi_warp
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_putSIZE_nbi
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_putSIZE_nbi_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_putSIZE_nbi
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_putSIZE_nbi_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_putSIZE_nbi_warp
(void *dest, const void *source, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
-
void
nvshmem_putmem_nbi
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_putmem_nbi_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_putmem_nbi
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_putmem_nbi_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_putmem_nbi_warp
(void *dest, const void *source, size_t nelems, int pe)¶
- dest [OUT]
- Symmetric address of the destination data object. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Local address of the object containing the data to be copied. The type of source should match that implied in the SYNOPSIS section.
- nelems [IN]
- Number of elements in the dest and source arrays. For nvshmem_putmem_nbi and shmem_ctx_putmem_nbi, elements are bytes.
- pe [IN]
- PE number of the remote PE.
Description
The routines return after initiating the operation. The operation is considered complete after a subsequent call to nvshmem_quiet. At the completion of nvshmem_quiet, the data has been copied into the dest array on the destination PE. The delivery of data words into the data object on the destination PE may occur in any order. Furthermore, two successive put routines may deliver data out of order unless a call to nvshmem_fence is introduced between the two calls.
Returns
None.
NVSHMEM_GET_NBI¶
-
void
nvshmem_TYPENAME_get_nbi
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
void
nvshmemx_TYPENAME_get_nbi_on_stream
(TYPE *dest, const TYPE *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_TYPENAME_get_nbi
(TYPE *dest, const TYPE *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_TYPENAME_get_nbi_block
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_TYPENAME_get_nbi_warp
(TYPE *dest, const TYPE *source, size_t nelems, int pe)¶
where TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types and Names.
-
void
nvshmem_getSIZE_nbi
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_getSIZE_nbi_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_getSIZE_nbi
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_getSIZE_nbi_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_getSIZE_nbi_warp
(void *dest, const void *source, size_t nelems, int pe)¶
where SIZE is one of 8, 16, 32, 64, 128.
-
void
nvshmem_getmem_nbi
(void *dest, const void *source, size_t nelems, int pe)¶
-
void
nvshmemx_getmem_nbi_on_stream
(void *dest, const void *source, size_t nelems, int pe, cudaStream_t stream)¶
-
__device__ void
nvshmem_getmem_nbi
(void *dest, const void *source, size_t nelems, int pe)
-
__device__ void
nvshmemx_getmem_nbi_block
(void *dest, const void *source, size_t nelems, int pe)¶
-
__device__ void
nvshmemx_getmem_nbi_warp
(void *dest, const void *source, size_t nelems, int pe)¶
- dest [OUT]
- Local address of the data object to be updated. The type of dest should match that implied in the SYNOPSIS section.
- source [IN]
- Symmetric address of the source data object. The type of source should match that implied in the SYNOPSIS section.
- nelems [IN]
- Number of elements in the dest and source arrays. For nvshmem_getmem_nbi and nvshmem_ctx_getmem_nbi, elements are bytes.
- pe [IN]
- PE number of the remote PE.
Description
The get routines provide a method for copying a contiguous symmetric data object from a different PE to a contiguous data object on the local PE. The routines return after initiating the operation. The operation is considered complete after a subsequent call to nvshmem_quiet. At the completion of nvshmem_quiet, the data has been delivered to the dest array on the local PE.
Returns
None.
Notes
See Section Memory Model for a definition of the term remotely accessible.