NVSHMEM Initialization
This section describes some key NVSHMEM initialization considerations when developing applications that use the NVSHMEM runtime.
Two-Stage Initialization
This method allows users to select a CUDA device after calling one of the NVSHMEM initialization APIs: nvshmem_init, nvshmemx_init_attr, or nvshmemx_hostlib_init_attr. It is used when the application wants to assign a CUDA device based on the PE index returned by nvshmem_my_pe, and that query can only be called after nvshmem_init or nvshmemx_init_attr has returned. Here is an example of the correct usage:
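nvshmemx_init_attr(..., &attr);
// Only these queries can be called without completing initialization:
//   nvshmem_my_pe(), nvshmem_n_pes(), nvshmem_team_my_pe(team),
// where team is either NVSHMEM_TEAM_WORLD or NVSHMEMX_TEAM_NODE.
// The node-local PE index maps to a GPU on this node (one PE per GPU).
int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
cudaSetDevice(mype_node);
nvshmem(x)_*(...);  // first call after cudaSetDevice completes initialization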
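The order of these calls matters because, as described in this section, the deferred initialization is completed by the first triggering call, so the device must already be selected at that point. The following C sketch models that lifecycle so it can be compiled and tested without GPUs. All sim_* names are hypothetical stand-ins, not NVSHMEM or CUDA functions; the comments name the real call each one imitates, and the model captures only the ordering described here, not the library's internals.

```c
#include <assert.h>

/* Hypothetical model of the two-stage lifecycle. None of the sim_*
 * names are NVSHMEM or CUDA APIs; comments name the real call each
 * one imitates. */
typedef enum {
    SIM_NOT_INITIALIZED, /* before nvshmem_init / nvshmemx_init_attr */
    SIM_BOOTSTRAPPED,    /* init returned, no device bound yet */
    SIM_INITIALIZED      /* deferred initialization completed */
} sim_status_t;

static sim_status_t sim_status = SIM_NOT_INITIALIZED;
static int sim_current_device = 0; /* a host thread starts on device 0 */
static int sim_bound_device = -1;  /* device current when init completed */

/* Stage 1: stands in for nvshmem_init() / nvshmemx_init_attr(). */
static void sim_init(void) { sim_status = SIM_BOOTSTRAPPED; }

/* Stands in for cudaSetDevice(). */
static void sim_set_device(int dev) { sim_current_device = dev; }

/* Stage 2: stands in for a host collective such as nvshmem_malloc()
 * or nvshmem_barrier_all(). The first one completes initialization
 * on whichever device is current at that moment. */
static void sim_collective(void) {
    if (sim_status == SIM_BOOTSTRAPPED) {
        sim_bound_device = sim_current_device;
        sim_status = SIM_INITIALIZED;
    }
}

/* Stands in for nvshmemx_init_status(). */
static sim_status_t sim_init_status(void) { return sim_status; }

/* The pattern from the example: init, take the node-local PE index
 * (passed in here, standing in for
 * nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE)), select the device, then
 * make the first collective call. Returns the device that was bound. */
static int sim_two_stage(int node_pe) {
    sim_init();
    assert(sim_init_status() == SIM_BOOTSTRAPPED);
    sim_set_device(node_pe);
    sim_collective();
    return sim_bound_device;
}
```

In this model, calling sim_collective() before sim_set_device() would bind device 0 instead, which is the outcome the two-stage method is meant to avoid. The device index comes from the node-local PE on the assumption of one PE per GPU; in multi-node runs, a global PE index from nvshmem_my_pe can exceed the number of GPUs on a node.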
In a two-stage initialization, the actual initialization is deferred to the first call after the device is set. After the NVSHMEM initialization API returns, only the few query APIs listed in the example above can be called without the library attempting to reinitialize. The library attempts to reinitialize NVSHMEM in any host or on-stream collective API, such as nvshmem_malloc, nvshmem_calloc, nvshmem_align, nvshmem_barrier_all, nvshmem_sync_all, nvshmem_barrier_all_on_stream, or nvshmem_sync_all_on_stream. We recommend that you invoke any other API only after NVSHMEM initialization has completed. To check the status of NVSHMEM initialization, the application can call nvshmemx_init_status.
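Following the recommendation above, one option is a guard that each PE runs after selecting its device and before using any API outside the query set. In a real program, the guard would compare the value returned by nvshmemx_init_status with NVSHMEM_STATUS_IS_INITIALIZED and, if initialization is still pending, call a collective such as nvshmem_barrier_all (check nvshmemx.h in your NVSHMEM version for the exact status constants). The sketch below models that guard with hypothetical sim_* stand-ins so its behavior can be tested without GPUs: the first guard completes initialization through a collective, and later guards do nothing.

```c
#include <assert.h>

/* Hypothetical model, not the NVSHMEM API. The initialization call
 * has already returned, so the starting state is "bootstrapped". */
typedef enum {
    SIM_NOT_INITIALIZED,
    SIM_BOOTSTRAPPED,
    SIM_INITIALIZED
} sim_status_t;

static sim_status_t sim_status = SIM_BOOTSTRAPPED;
static int sim_completions = 0; /* times deferred init was completed */

/* Stands in for nvshmemx_init_status(). */
static sim_status_t sim_init_status(void) { return sim_status; }

/* Stands in for nvshmem_barrier_all(): a collective that completes
 * the deferred initialization if it is still pending. */
static void sim_barrier_all(void) {
    if (sim_status == SIM_BOOTSTRAPPED) {
        sim_status = SIM_INITIALIZED;
        sim_completions++;
    }
}

/* Guard: run on every PE after cudaSetDevice() and before any API
 * outside the safe query set. */
static void sim_ensure_initialized(void) {
    /* The first-stage init call must already have returned. */
    assert(sim_init_status() != SIM_NOT_INITIALIZED);
    if (sim_init_status() != SIM_INITIALIZED)
        sim_barrier_all();
}
```

Because nvshmem_barrier_all is collective, every PE must reach the guard and take the same branch; calling it from only some PEs would hang the others.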