Library Setup, Exit, and Query¶
The library setup and query interfaces that initialize and monitor the parallel environment of the PEs.
NVSHMEM_INIT¶
- void nvshmem\_init(void)
Description
nvshmem_init
allocates and initializes resources used by the NVSHMEM
library. It is a collective operation that all PEs must call before any
other NVSHMEM routine may be called. At the end of the NVSHMEM program
which it initialized, the call to nvshmem_init
must be matched with
a call to nvshmem_finalize
. After the first call to
nvshmem_init
, a subsequent call to nvshmem_init
or
nvshmem_init_thread
in the same program results in undefined
behavior. If NVSHMEM fails to initialize, the program will exit with a
non-zero value.
Returns
None.
NVSHMEMX_INIT_ATTR¶
-
int nvshmemx_init_attr(unsigned int flags, nvshmemx_init_attr_t *attributes)¶
- flags [IN]
Bitwise OR of operation flags. A value of 0 indicates an initialization that is similar to when
nvshmem_init
is used.- attributes [IN]
Additional attributes to be used when initializing the NVSHMEM library.
Description
NVSHMEM provides the nvshmem_init
function, as defined in the OpenSHMEM 1.3
specification. In addition, the nvshmemx_init_attr
function is provided to
support the easy porting of MPI and OpenSHMEM programs to NVSHMEM. It also provides support
for launcher agnostic bootstrapping, as described below. It allows
the initialization of NVSHMEM based on an existing MPI communicator or OpenSHMEM
job. This is useful when an application is written to use NVSHMEM in a node
and MPI across nodes, or when an application uses another OpenSHMEM
implementation to manage communication across a symmetric heap in the system
memory. It allows the initialization of NVSHMEM based on a Unique Identifier (UID) that utilizes IP-based networking.
This removes a dependency on a specific job launch mechanism and instead uses IP-based networking directly to bootstrap the NVSHMEM application. UID initialization is useful in applications that require portable application startup without dependency on process management infrastructure. However, it does require applications to distribute a UID token prior to calling the NVSHMEM initialization function.
The nvshmemx_init_attr
function initializes the NVSHMEM library by allocating
the resources that are used by the library and assigning a unique identifier to each PE.
This collective operation should be called by all PEs before any other NVSHMEM
routine.
The flags argument can be set to 0 or one of the following values:
NVSHMEMX_INIT_WITH_MPI_COMM - This flag is used to specify that an MPI communicator is provided by the user.
NVSHMEMX_INIT_WITH_SHMEM - This flag is used to specify that NVSHMEM is used inside an OpenSHMEM job.
NVSHMEMX_INIT_WITH_UNIQUEID - This flag is used to specify that Unique ID arguments are provided by the user.
With the NVSHMEMX_INIT_WITH_MPI_COMM
option, the NVSHMEM library is initialized
based on the MPI communicator that is provided with each rank in the MPI
communicator that participates as an NVSHMEM PE. A call to nvshmem_finalize
is
required before the MPI communicator is destroyed.
Note: Do not make any calls to NVSHMEM routines after the MPI communicator has been destroyed.
With the NVSHMEMX_INIT_WITH_SHMEM
option, the NVSHMEM library is initialized
based on the OpenSHMEM PE underlying each NVSHMEM PE. A call to
nvshmem_finalize
is required before shmem_finalize
is called. Do not make
any calls to NVSHMEM routines after nvshmem_finalize
is called.
With the NVSHMEMX_INIT_WITH_UNIQUEID
option, the NVSHMEM library is initialized
based on the Unique ID arguments that are provided by each PE in its attribute argument.
The attributes argument is a pointer to a structure type that contains the following fields:
- int version
A scalar representing the version of the struct. Filled out by the
NVSHMEMX_INIT_ATTR_INITIALIZER
macro This is not to be modified directly.- void *mpi_comm
A pointer to the MPI communicator handle that will be used as the NVSHMEM world team.
- nvshmemx_init_args_t init_args
A union of initializer arguments per bootstrap, including Unique ID arguments.
It is recommended, but not necessary to statically initialize the attribute struct using the
NVSHMEMX_INIT_ATTR_INITIALIZER
macro befor calling this function. The optional nature of the
initializer may change in future major releases.
Returns
Returns 0 on success or an error code on failure.
Example
NVSHMEMX_HOSTLIB_INIT_ATTR¶
-
int nvshmemx_hostlib_init_attr(unsigned int flags, nvshmemx_init_attr_t *attributes)¶
- flags [IN]
Bitwise OR of operation flags. A value of 0 indicates an initialization that is similar to when
nvshmem_init
is used.- attributes [IN]
Additional attributes to be used when initializing the NVSHMEM library.
Description
The nvshmemx_hostlib_init_attr
function differs from the nvshmemx_init_attr
function in two ways:
It only initializes the host library, enabling both host, and on-stream APIs. This function allows
applications to not statically link
libnvshmem.a
. Applications can dynamically load the host library (usingdlopen
) to enable host and on-stream APIs. Device APIs can be used through cubins initialized by thenvshmemx_cumodule_init
API.
It is REQUIRED that the
init_attr
struct be initialized with theNVSHMEMX_INIT_ATTR_INITIALIZER
macro
before calling this API. Not doing so will result in undefined behavior.
Please see the documentation of nvshmemx_init_attr
for additional information on the functionality of this API.
NVSHMEMX_HOSTLIB_FINALIZE¶
-
int nvshmemx_hostlib_finalize()¶
Description
The nvshmemx_hostlib_finalize
function differs from the nvshmem_finalize
function such that it only finalizes
the host library, disabling both host, and on-stream APIs. Applications can dynamically unload the host library
(using dlclose
) to disable host and on-stream APIs. Device APIs can be unloaded through cubins finalized by
the nvshmemx_cumodule_finalize
API as long as it is invoked before nvshmemx_hostlib_finalize
i.e the ffollowing order
of invoking the APIs nvshmemx_hostlib_init_attr
, nvshmemx_cumodule_init
, nvshmemx_hostlib_finalize
, nvshmemx_cumodule_finalize
can lead to undefined behavior.
Please see the documentation of nvshmem_finalize
for additional information on the functionality of this API.
NVSHMEMX_GET_UNIQUE_ID¶
-
int nvshmemx_get_uniqueid(nvshmemx_uniqueid_t *id)¶
- id [OUT]
A pointer to the Unique ID object to be populated by the NVSHMEM runtime. The object must be initialized with the NVSHMEMX_UNIQUEID_INITIALIZER prior to calling this function.
Description
This method queries the Unique ID (UID); stores it in an opaque, serializable structure; and returns its content to the user. The value of this structure
will be consumed by nvshmemx_init_attr
and must be set using the nvshmemx_set_attr_uniqueid_args
function, as described below.
Returns
Returns 0 on success, nonzero otherwise.
NVSHMEMX_SET_ATTR_UNIQUEID_ARGS¶
-
void nvshmemx_set_attr_uniqueid_args(int rank, int nranks, nvshmemx_uniqueid_t *id, nvshmemx_init_attr_t *attr)¶
- rank [IN]
My PE identifier in the job
- nranks [IN]
Total number of PEs in the job
- id [IN]
A pointer to the Network Socket Unique ID, retrieved using prior
nvshmemx_get_uniqueid
call- attr [OUT]
A pointer to the NVSHMEM initialization attribute structure of type
nvshmemx_init_attr_t
Description
This method provides the convenience functionality to set or populate Unique ID initialization specific arguments of the nvshmemx_init_attr_t
structure. The nvshmemx_init_attr_t
structure must be statically initialized using the NVSHMEMX_INIT_ATTR_INITIALIZER
macro before calling this
function.
Returns
None
Example
NVSHMEMX_CUMODULE_INIT¶
-
int nvshmemx_cumodule_init(CUmodule module)¶
- module [IN]
CUDA module to initialize for use with NVSHMEM.
Description
The nvshmemx_cumodule_init
function intializes the device state in
module
so that it’s able to perform NVSHMEM operations. The NVSHMEM
library must have completed device initialization prior to calling this
function. See nvshmemx_init_status for additional information on
determining when device initialization has completed.
Returns
Returns 0 on success or an error code on failure.
NVSHMEMX_INIT_STATUS¶
-
int nvshmemx_init_status(void)¶
Description
The nvshmemx_init_status
function can be used to check if the NVSHMEM
library is currently initialized and ready to perform NVSHMEM operations.
NVSHMEM initialization is performed in two phases: bootstrapping and device
initialization. If the CUDA device has not been set prior to calling an NVSHMEM
initialization routine, NVSHMEM will perform only bootstrapping and delay
device initialization until a subsequent collective operation is performed
(i.e. nvshmem_malloc
, nvshmem_calloc
, nvshmem_align
,
nvshmem_barrier_all
, nvshmem_sync_all
,
nvshmemx_barrier_all_on_stream
, or nvshmemx_sync_all_on_stream
).
Similarly, when nvshmem_finalize
is called, the NVSHMEM library will
finalize the NVSHMEM state associated with the CUDA device and return to the
bootstrapped state.
When NVSHMEM is in the bootstrapped mode, PE query routines can be used to
assist with CUDA device selection (e.g. nvshmem_my_pe
, nvshmem_n_pes
,
nvshmemx_team_my_pe(NVSHMEMX_TEAM_NODE)
,
nvshmemx_team_n_pes(NVSHMEMX_TEAM_NODE)
). The CUDA device must be set prior
to performing any other NVSHMEM operations, as they will trigger NVSHMEM to
perform device initialization. No device functions (including PE query) can be
performed until NVSHMEM has completed device initialization.
When the application consists of multiple dynamically linked components that
each use NVSHMEM, the call to nvshmemx_init_status
will indicate the status
of the calling component. For example, consider an NVSHMEM application links
with libfoo.so
, which also uses NVSHMEM. When the application has fully
initialized NVSHMEM the nvshmemx_init_status
call from the application’s
code might return NVSHMEM_STATUS_IS_INITIALIZED
. However, if libfoo.so
calls nvshmemx_init_status
prior to performing initialization it would see
NVSHMEM_IS_BOOTSTRAPPED
because the device state of libfoo.so
has not
yet been initialized.
Returns
One of the following constants is returned. These constants have increasing
values in the order shown and NVSHMEM_STATUS_NOT_INITIALIZED
has the value
0
.
NVSHMEM_STATUS_NOT_INITIALIZED
– NVSHMEM has not been initialized.NVSHMEM_STATUS_IS_BOOTSTRAPPED
– NVSHMEM is bootstrapped, but has not completed device initialization.NVSHMEM_STATUS_IS_INITIALIZED
– NVSHMEM has completed device initialization and the PE has exclusive use of the assigned CUDA device.NVSHMEM_STATUS_LIMITED_MPG
– NVSHMEM has completed device initialization and the PE has limited, shared used of the assigned CUDA device. See Multiprocess GPU Support for more information.NVSHMEM_STATUS_FULL_MPG
– NVSHMEM has completed device initialization and the PE has full, shared used of the assigned CUDA device. See Multiprocess GPU Support for more information.
NVSHMEM_MY_PE¶
- int nvshmem\_my\_pe(void)
- __device__ int nvshmem\_my\_pe(void)
Description
This routine returns the PE number of the calling PE. It accepts no
arguments. The result is an integer between 0
and npes
- 1
,
where npes
is the total number of PEs executing the current program.
Returns
Integer - Between 0
and npes
- 1
NVSHMEM_N_PES¶
- int nvshmem\_n\_pes(void)
- __device__ int nvshmem\_n\_pes(void)
Description
The routine returns the number of PEs running in the program.
Returns
Integer - Number of PEs running in the NVSHMEM program.
NVSHMEM_FINALIZE¶
- void nvshmem\_finalize(void)
Description
nvshmem_finalize
is a collective operation that ends the NVSHMEM
portion of a program previously initialized by nvshmem_init
or
nvshmem_init_thread
and releases all resources used by the NVSHMEM
library. This collective operation requires all PEs to participate in
the call. There is an implicit global barrier in nvshmem_finalize
to
ensure that pending communications are completed and that no resources
are released until all PEs have entered nvshmem_finalize
.
nvshmem_finalize
must be the last NVSHMEM library call encountered
in the NVSHMEM portion of a program. A call to nvshmem_finalize
will
release all resources initialized by a corresponding call to
nvshmem_init
or nvshmem_init_thread
. All processes that
represent the PEs will still exist after the call to
nvshmem_finalize
returns, but they will no longer have access to
resources that have been released.
Returns
None.
Notes
nvshmem_finalize
releases all resources used by the NVSHMEM library
including the symmetric memory heap and pointers initiated by
nvshmem_ptr
. This collective operation requires all PEs to
participate in the call, not just a subset of the PEs. The non-NVSHMEM
portion of a program may continue after a call to nvshmem_finalize
by all PEs.
NVSHMEM_GLOBAL_EXIT¶
- void nvshmem\_global\_exit(int status)
- __device__ void nvshmem\_global\_exit(int status)
- status [IN]
The exit status from the main program.
Description
nvshmem_global_exit
is a non-collective routine that allows any one
PE to force termination of an NVSHMEM program for all PEs, passing an
exit status to the execution environment. This routine terminates the
entire program, not just the NVSHMEM portion. When any PE calls
nvshmem_global_exit
, it results in the immediate notification to all
PEs to terminate. nvshmem_global_exit
flushes I/O and releases
resources in accordance with C/C++ language requirements for normal
program termination. If more than one PE calls nvshmem_global_exit
,
then the exit status returned to the environment shall be one of the
values passed to nvshmem_global_exit
as the status argument. There
is no return to the caller of nvshmem_global_exit
; control is
returned from the NVSHMEM program to the execution environment for all
PEs.
Returns
None.
Notes
nvshmem_global_exit
may be used in situations where one or more PEs
have determined that the program has completed and/or should terminate
early. Accordingly, the integer status argument can be used to pass any
information about the nature of the exit; e.g., that the program
encountered an error or found a solution. Since nvshmem_global_exit
is a non-collective routine, there is no implied synchronization, and
all PEs must terminate regardless of their current execution state.
While I/O must be flushed for standard language I/O calls from C/C++,
it is implementation dependent as to how I/O done by other means (e.g.,
third party I/O libraries) is handled. Similarly, resources are released
according to C/C++ standard language requirements, but this may not
include all resources allocated for the NVSHMEM program. However, a
quality implementation will make a best effort to flush all I/O and
clean up all resources.
NVSHMEM_PTR¶
- void *nvshmem\_ptr(const void *dest, int pe)
- __device__ void *nvshmem\_ptr(const void *dest, int pe)
- dest [IN]
The symmetric address of the remotely accessible data object to be referenced.
- pe [IN]
An integer that indicates the PE number on which
dest
is to be accessed.
Description
nvshmem_ptr
returns an address that may be used to directly
reference dest
on the specified PE. This address can be assigned to
a pointer. After that, ordinary loads and stores to dest
may be
performed. The address returned by nvshmem_ptr
is a local address to
a remotely accessible data object. Providing this address to an argument
of an NVSHMEM routine that requires a symmetric address results in
undefined behavior.
The nvshmem_ptr
routine can provide an efficient means to accomplish
communication, for example when a sequence of reads and writes to a data
object on a remote PE does not match the access pattern provided in an
NVSHMEM data transfer routine like nvshmem_put
or nvshmem_iget
.
Returns
A local pointer to the remotely accessible dest
data object is
returned when it can be accessed using memory loads and stores.
Otherwise, a null pointer is returned.
Notes
When calling nvshmem_ptr
, dest
is the address of the referenced
symmetric data object on the calling PE.
NVSHMEM_INFO_GET_VERSION¶
- void nvshmem\_info\_get\_version(int *major, int *minor)
- __device__ void nvshmem\_info\_get\_version(int *major, int *minor)
- major [OUT]
The major version of the OpenSHMEM Specification in use.
- minor [OUT]
The minor version of the OpenSHMEM Specification in use.
Description
This routine returns the major and minor version of the OpenSHMEM
Specification in use. For a given library implementation, the major and
minor version returned by these calls are consistent with the library
constants NVSHMEM_MAJOR_VERSION
and NVSHMEM_MINOR_VERSION
.
Returns
None.
NVSHMEM_INFO_GET_NAME¶
- void nvshmem\_info\_get\_name(char *name)
- __device__ void nvshmem\_info\_get\_name(char *name)
- name [OUT]
The vendor defined string.
Description
This routine returns the vendor defined name string of size defined by
the library constant NVSHMEM_MAX_NAME_LEN
. The program calling this
function provides the name
memory buffer of at least size
NVSHMEM_MAX_NAME_LEN
. The implementation copies the vendor defined
string of size at most NVSHMEM_MAX_NAME_LEN
to name
. In C/C++,
the string is terminated by a null character. If the name
memory
buffer is provided with size less than NVSHMEM_MAX_NAME_LEN
,
behavior is undefined. For a given library implementation, the vendor
string returned is consistent with the library constant
NVSHMEM_VENDOR_STRING
.
Returns
None.
NVSHMEMX_VENDOR_GET_VERSION_INFO¶
-
void nvshmemx_vendor_get_version_info(int *major, int *minor, int *patch)¶
- major [OUT]
The major version of the NVSHMEM in use
- minor [OUT]
The minor version of the NVSHMEM in use
- patch [OUT]
The patch version of the NVSHMEM in use
Description
The nvshmemx_vendor_get_version_info
function returns the NVSHMEM major, minor, patch
version in use. For a given library implementation, the major, minor and patch version returned by these
calls at runtime can be same or different from the constants NVSHMEM_VENDOR_MAJOR_VERSION
, NVSHMEM_VENDOR_MINOR_VERSION
,
and NVSHMEM_VENDOR_PATCH_VERSION
present at compile-time. The difference may arise, if an application
links against version of the library at runtime, while compiling against a different version of the headers containing
these constants.
Returns
None.
Thread Support¶
This section specifies the interaction between the NVSHMEM interfaces and user threads. It also describes the routines that can be used for initializing and querying the thread environment. There are four levels of threading defined by the OpenSHMEM specification.
NVSHMEM_THREAD_SINGLE
The NVSHMEM program must not be multithreaded.
NVSHMEM_THREAD_FUNNELED
The NVSHMEM program may be multithreaded. However, the program must ensure that only the main thread invokes the NVSHMEM interfaces. The main thread is the thread that invokes either
nvshmem_init
ornvshmem_init_thread
.NVSHMEM_THREAD_SERIALIZED
The NVSHMEM program may be multithreaded. However, the program must ensure that the NVSHMEM interfaces are not invoked concurrently by multiple threads.
NVSHMEM_THREAD_MULTIPLE
The NVSHMEM program may be multithreaded and any thread may invoke the NVSHMEM interfaces.
The thread level constants must have increasing integer values; i.e.,
NVSHMEM_THREAD_SINGLE
< NVSHMEM_THREAD_FUNNELED
<
NVSHMEM_THREAD_SERIALIZED
< NVSHMEM_THREAD_MULTIPLE
. The
following semantics apply to the usage of these models:
In the
NVSHMEM_THREAD_FUNNELED
,NVSHMEM_THREAD_SERIALIZED
, andNVSHMEM_THREAD_MULTIPLE
thread levels, thenvshmem_init
andnvshmem_finalize
calls must be invoked by the same thread.Any NVSHMEM operation initiated by a thread is considered an action of the PE as a whole. The symmetric heap and symmetric variables scope are not impacted by multiple threads invoking the NVSHMEM interfaces. Each PE has a symmetric heap that is shared by all threads within that PE. For example, a thread invoking a memory allocation routine such as
nvshmem_malloc
allocates memory that is accessible by all threads of the PE. The requirement that the same symmetric heap operations must be executed by all PEs in the same order also applies in a threaded environment. Similarly, the completion of collective operations is not impacted by multiple threads. For example,nvshmem_barrier_all
is completed when all PEs enter and exit thenvshmem_barrier_all
call, even though only one thread in the PE is participating in the collective call.Blocking NVSHMEM calls will only block the calling thread, allowing other threads, if available, to continue executing. The calling thread will be blocked until the event on which it is waiting occurs. Once the blocking call is completed, the thread is ready to continue execution. A blocked thread will not prevent progress of other threads on the same PE and will not prevent them from executing other NVSHMEM calls when the thread level permits. In addition, a blocked thread will not prevent the progress of NVSHMEM calls performed on other PEs.
In the
NVSHMEM_THREAD_MULTIPLE
thread level, all NVSHMEM calls are thread-safe. That is, any two concurrently running threads may make NVSHMEM calls.In the
NVSHMEM_THREAD_SERIALIZED
andNVSHMEM_THREAD_MULTIPLE
thread levels, if multiple threads call collective routines, including the symmetric heap management routines, it is the programmer’s responsibility to ensure the correct ordering of collective calls.NVSHMEM currently supports the
NVSHMEM_THREAD_SERIALIZED
threading level for host and stream APIs andNVSHMEM_THREAD_MULTIPLE
for device APIs.
NVSHMEM_INIT_THREAD¶
- int nvshmem\_init\_thread(int requested, int *provided)
- requested [IN]
The thread level support requested by the user for host and stream APIs.
- provided [OUT]
The thread level support provided by the NVSHMEM implementation for host and stream APIs.
Description
nvshmem_init_thread
initializes the NVSHMEM library in the same way
as nvshmem_init
. In addition, nvshmem_init_thread
also performs
the initialization required for supporting the provided thread level.
The argument requested
is used to specify the desired level of
thread support. The argument provided
returns the support level
provided by the library. The allowed values for provided
and
requested
are NVSHMEM_THREAD_SINGLE
,
NVSHMEM_THREAD_FUNNELED
, NVSHMEM_THREAD_SERIALIZED
, and
NVSHMEM_THREAD_MULTIPLE
.
An NVSHMEM program is initialized either by nvshmem_init
or
nvshmem_init_thread
. Once an NVSHMEM library initialization call has
been performed, a subsequent initialization call in the same program
results in undefined behavior. If the call to nvshmem_init_thread
is
unsuccessful in allocating and initializing resources for the NVSHMEM
library, then the behavior of any subsequent call to the NVSHMEM library
is undefined.
Returns
nvshmem_init_thread
returns 0 upon success; otherwise, it returns a
nonzero value.
Notes
The NVSHMEM library can be initialized either by nvshmem_init
or
nvshmem_init_thread
. If the NVSHMEM library is initialized by
nvshmem_init
, the library implementation can choose to support any
one of the defined thread levels.
The NVSHMEM device APIs always support NVSHMEM_THREAD_MULTIPLE
.
NVSHMEM_QUERY_THREAD¶
- void nvshmem\_query\_thread(int *provided)
- provided [OUT]
The thread level support provided by the NVSHMEM implementation for host and stream APIs.
Description
The nvshmem_query_thread
call returns the level of thread support
currently being provided. The value returned will be same as was
returned in provided
by a call to nvshmem_init_thread
, if the
NVSHMEM library was initialized by nvshmem_init_thread
. If the
library was initialized by nvshmem_init
, the implementation can
choose to provide any one of the defined thread levels, and
nvshmem_query_thread
returns this thread level.
Returns
None.
Notes
The NVSHMEM device APIs always support NVSHMEM_THREAD_MULTIPLE
.