NVSHMEM Initialization

This section describes key initialization considerations for applications that use the NVSHMEM runtime.

Two-Stage Initialization

Two-stage initialization allows users to select a CUDA device after calling one of the NVSHMEM initialization APIs: nvshmem_init, nvshmemx_init_attr, or nvshmemx_hostlib_init_attr. Use this method when the application assigns a CUDA device based on the PE returned by nvshmem_my_pe, which can be called only after nvshmem_init or nvshmemx_init_attr. Here is an example of the correct usage:

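nvshmemx_init_attr(..., &attr);
// Before the device is set, only these queries may be called:
//   nvshmem_my_pe(), nvshmem_n_pes(), nvshmem_team_my_pe(predefined_team)
// where predefined_team is either NVSHMEM_TEAM_WORLD or NVSHMEMX_TEAM_NODE.
int mype = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);  // or nvshmem_my_pe()
cudaSetDevice(mype);
nvshmem(x)_*(...);  // any other NVSHMEM API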

In a two-stage initialization, the actual initialization is deferred to the first call after the device is set. After the NVSHMEM initialization API returns, only a few APIs (those listed in the example above) can be called without the library attempting to reinitialize. The library attempts to reinitialize NVSHMEM in any of the host and on-stream collective APIs, such as nvshmem_malloc, nvshmem_calloc, nvshmem_align, nvshmem_barrier_all, nvshmem_sync_all, nvshmemx_barrier_all_on_stream, or nvshmemx_sync_all_on_stream. We recommend that you invoke any other API only after NVSHMEM initialization has completed. To query the status of NVSHMEM initialization, the application can use nvshmemx_init_status.
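
The device-selection step can be sketched as a small helper. This is an illustration only: pick_device is a hypothetical name and is not part of NVSHMEM, and the modulo mapping is an assumption that matters only when a node runs more PEs than it has visible GPUs. Whether several PEs may share one GPU depends on the NVSHMEM version and configuration. The NVSHMEM and CUDA calls appear only in comments, so the helper compiles without either library.

```cpp
#include <cassert>

// Hypothetical helper (not an NVSHMEM API): map a PE to a CUDA device index.
//   node_pe      : index of this PE within its node, for example the value
//                  returned by nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE)
//   device_count : number of visible CUDA devices, for example the value
//                  reported by cudaGetDeviceCount
int pick_device(int node_pe, int device_count) {
    assert(node_pe >= 0);
    assert(device_count > 0);
    // Wrap around so the result is always a valid device index, even when
    // the node hosts more PEs than GPUs (an assumption of this sketch).
    return node_pe % device_count;
}

// Intended call sequence (requires NVSHMEM and the CUDA runtime; shown as
// comments and not compiled here):
//   nvshmemx_init_attr(..., &attr);
//   int node_pe = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
//   int ndev = 0;
//   cudaGetDeviceCount(&ndev);
//   cudaSetDevice(pick_device(node_pe, ndev));
//   // Other NVSHMEM APIs may be called from this point on.
```

Using the node-local PE rather than the global PE keeps the index within the range of GPUs on each node when the job spans several nodes.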