NVSHMEM provides a set of APIs for managing the symmetric heap. The APIs allow one to dynamically allocate, deallocate, reallocate and align symmetric data objects in the symmetric heap.
NVSHMEM_MALLOC, NVSHMEM_FREE, NVSHMEM_ALIGN
void *nvshmem_malloc(size_t size)
void nvshmem_free(void *ptr)
void *nvshmem_align(size_t alignment, size_t size)
- size [IN]
- The size, in bytes, of a block to be allocated from the symmetric heap.
- ptr [IN]
- Symmetric address of an object in the symmetric heap.
- alignment [IN]
- Byte alignment of the block allocated from the symmetric heap.
The nvshmem_malloc, nvshmem_free, and nvshmem_align routines are collective operations that require participation by all PEs.
The nvshmem_malloc routine returns the symmetric address of a block of at least size bytes, which shall be suitably aligned so that it may be assigned to a pointer to any type of object. This space is allocated from the symmetric heap (in contrast to malloc, which allocates from the private heap). When size is zero, the nvshmem_malloc routine performs no action and returns a null pointer.
The nvshmem_align routine allocates a block in the symmetric heap that has a byte alignment specified by the alignment argument. The value of alignment shall be a multiple of sizeof(void *) that is also a power of two. Otherwise, the behavior is undefined. When size is zero, the nvshmem_align routine performs no action and returns a null pointer.
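Because an invalid alignment leads to undefined behavior, an application may want to validate the value before calling nvshmem_align. The following is a minimal sketch of such a check in plain C. The helper name is_valid_symmetric_alignment is hypothetical and is not part of NVSHMEM; it only encodes the two conditions stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of NVSHMEM): returns true when `alignment`
 * is a power of two and a multiple of sizeof(void *), the two conditions
 * the text places on the alignment argument of nvshmem_align. */
bool is_valid_symmetric_alignment(size_t alignment)
{
    /* A power of two has exactly one bit set; zero is rejected. */
    bool power_of_two = alignment != 0 && (alignment & (alignment - 1)) == 0;
    /* The value must also be a whole number of pointer widths. */
    bool pointer_multiple = alignment % sizeof(void *) == 0;
    return power_of_two && pointer_multiple;
}
```

Since every PE must pass identical arguments, a check like this would typically run on every PE with the same input before the collective call is made.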
The nvshmem_free routine causes the block to which ptr points to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action is performed.
The nvshmem_malloc, nvshmem_free, and nvshmem_align routines are provided so that multiple PEs in a program can allocate symmetric, remotely accessible memory blocks. These memory blocks can then be used with NVSHMEM communication routines. When no action is performed, these routines return without performing a barrier. Otherwise, each of these routines includes at least one call to a procedure that is semantically equivalent to nvshmem_barrier_all: nvshmem_malloc and nvshmem_align call a barrier on exit, and nvshmem_free calls a barrier on entry. This ensures that all PEs participate in the memory allocation, and that the memory on other PEs can be used as soon as the local PE returns. The user is also responsible for calling these routines with identical argument(s) on all PEs; if differing ptr, size, or alignment arguments are used, the behavior of the call and any subsequent NVSHMEM calls is undefined.
The nvshmem_malloc routine returns the symmetric address of the allocated space; otherwise, it returns a null pointer.
The nvshmem_free routine returns no value.
The nvshmem_align routine returns an aligned symmetric address whose value is a multiple of alignment; otherwise, it returns a null pointer.
NVSHMEM supports both dynamic and static symmetric heap allocation policies. Dynamic symmetric heap allocation is accomplished using the CUDA VMM APIs and is enabled by default. Setting NVSHMEM_DISABLE_CUDA_VMM will cause NVSHMEM to use a statically allocated symmetric heap. In this mode, the total size of the symmetric heap is determined at job startup. One can specify the size of the heap using the NVSHMEM_SYMMETRIC_SIZE environment variable (where available).
The nvshmem_malloc, nvshmem_free, and nvshmem_align routines differ from the private heap allocation routines in that all PEs in a program must call them (a barrier is used to ensure this).
void *nvshmem_calloc(size_t count, size_t size)
- count [IN]
- The number of elements to allocate.
- size [IN]
- The size in bytes of each element to allocate.
The nvshmem_calloc routine is a collective operation on the world team that allocates a region of remotely accessible memory for an array of count objects of size bytes each and returns a pointer to the lowest byte address of the allocated symmetric memory. The space is initialized to all bits zero.
If the allocation succeeds, the pointer returned shall be suitably aligned so that it may be assigned to a pointer to any type of object. If the allocation does not succeed, or either count or size is 0, the return value is a null pointer.
The values for count and size shall each be equal across all PEs calling nvshmem_calloc; otherwise, the behavior is undefined.
When count or size is 0, the nvshmem_calloc routine returns without performing a barrier. Otherwise, this routine calls a procedure that is semantically equivalent to nvshmem_barrier_all on exit.
The nvshmem_calloc routine returns a pointer to the lowest byte address of the allocated space; otherwise, it returns a null pointer.
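The zero-argument rule for nvshmem_calloc can be checked on the caller's side before the collective call is made. The sketch below is plain C and does not call NVSHMEM. The helper name symmetric_calloc_bytes is hypothetical, and the overflow check is an added assumption: the text above does not say how nvshmem_calloc treats a count and size whose product exceeds SIZE_MAX, so the sketch simply rejects that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of NVSHMEM). Writes count * size to *total
 * and returns true when the request is non-empty and the product fits in
 * size_t. Returns false when either argument is 0 (the case in which
 * nvshmem_calloc returns a null pointer without a barrier) or when the
 * product would overflow (an assumption; the text does not cover it). */
bool symmetric_calloc_bytes(size_t count, size_t size, size_t *total)
{
    if (count == 0 || size == 0)
        return false;
    if (count > SIZE_MAX / size)
        return false;
    *total = count * size;
    return true;
}
```

Because count and size must be equal across all PEs, the result of this check is the same on every PE, so all PEs would agree on whether to make the call.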
int nvshmemx_buffer_register(void *addr, size_t length)
- addr [IN]
- The address at the start of the buffer to register.
- length [IN]
- The length of the registration.
The nvshmemx_buffer_register function registers the buffer with the remote transport (either UCX or IB) and, if the address supplied is unregistered host memory, with CUDA. NVSHMEM heap memory, that is, any memory allocated with nvshmem_malloc or nvshmem_calloc, should not be registered using this function because the heap is registered by default; attempting to register NVSHMEM heap memory with this function results in an error. Once memory is registered with this function, it can be used as the local operand in any host or device AMO or RMA operation. For maximum performance in the I/O path, register memory in chunks that are as large as possible.
Limitations:
- Using memory registered with this function as the remote operand in any NVSHMEM API is unsupported.
- Passing a buffer that spans multiple registrations to a single NVSHMEM RMA operation is not supported.
- Registering CUDA managed memory with this API is not supported.
- Freeing registered memory before unregistering it is unsupported and results in undefined behavior.
Returns 0 on success. Returns NVSHMEMX_ERROR_INVALID_VALUE if the buffer is from CUDA managed memory, is from the NVSHMEM heap, or lies inside another registered buffer region. Returns NVSHMEMX_ERROR_OUT_OF_MEMORY for errors in allocating control structures. Returns NVSHMEMX_ERROR_INTERNAL for internal registration errors.
int nvshmemx_buffer_unregister(void *addr)
- addr [IN]
- The address to be unregistered. Note: This must be the same address that was passed to nvshmemx_buffer_register. Passing an address in the middle of a registered buffer will result in the buffer not being unregistered.
The nvshmemx_buffer_unregister operation unregisters a buffer previously registered with nvshmemx_buffer_register.
Returns 0 on success. Returns NVSHMEMX_ERROR_INVALID_VALUE if the buffer could not be found.
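Two of the rules above lend themselves to application-side bookkeeping: a local operand must lie within a single registration, and unregistering requires the exact base address that was registered. The following plain C sketch shows one way an application might track this. The registration_t type and both function names are hypothetical and not part of NVSHMEM; the sketch assumes the application records each (addr, length) pair itself after a successful nvshmemx_buffer_register call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side record of one registered buffer. */
typedef struct {
    void  *base;    /* address passed to nvshmemx_buffer_register */
    size_t length;  /* length passed to nvshmemx_buffer_register */
} registration_t;

/* Returns the index of the single registration that fully contains
 * [addr, addr + len), or -1 if no one registration covers the range. */
int find_covering_registration(const registration_t *regs, size_t nregs,
                               const void *addr, size_t len)
{
    uintptr_t start = (uintptr_t)addr;
    for (size_t i = 0; i < nregs; i++) {
        uintptr_t base = (uintptr_t)regs[i].base;
        /* Compare offsets rather than end addresses to avoid overflow. */
        if (start >= base &&
            start - base <= regs[i].length &&
            len <= regs[i].length - (start - base))
            return (int)i;
    }
    return -1;
}

/* Returns 1 when addr is exactly the base of a recorded registration,
 * which is what nvshmemx_buffer_unregister requires, and 0 otherwise. */
int is_registration_base(const registration_t *regs, size_t nregs,
                         const void *addr)
{
    for (size_t i = 0; i < nregs; i++)
        if (regs[i].base == addr)
            return 1;
    return 0;
}
```

A range that straddles two adjacent registrations yields -1 here, mirroring the limitation that a single RMA operation cannot use a buffer spread over multiple registrations.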