Initialization and Finalization
This section documents the initialization and finalization APIs in nvshmem.core.init_fini.
NVSHMEM4Py's nvshmem.core module provides initialization and finalization routines for the NVSHMEM runtime. Call nvshmem.core.init() before using any NVSHMEM feature in Python, and call nvshmem.core.finalize() once all NVSHMEM operations are complete.
Examples
MPI-based initialization:
from mpi4py import MPI
from cuda.core.experimental import Device, system
import nvshmem.core as nvshmem
rank = MPI.COMM_WORLD.Get_rank()
dev = Device(rank % system.num_devices)
dev.set_current()
nvshmem.init(device=dev, mpi_comm=MPI.COMM_WORLD, initializer_method="mpi")
# ... use NVSHMEM ...
nvshmem.finalize()
UID-based initialization:
from mpi4py import MPI
from cuda.core.experimental import Device, system
import nvshmem.core as nvshmem
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nranks = comm.Get_size()
dev = Device(rank % system.num_devices)
dev.set_current()
uid = nvshmem.get_unique_id(empty=(rank != 0))
comm.Bcast(uid._data.view(np.int8), root=0)
nvshmem.init(device=dev, uid=uid, rank=rank, nranks=nranks, initializer_method="uid")
# ... use NVSHMEM ...
nvshmem.finalize()
Emulated MPI initialization:
from mpi4py import MPI
from cuda.core.experimental import Device, system
import nvshmem.core as nvshmem
rank = MPI.COMM_WORLD.Get_rank()
dev = Device(rank % system.num_devices)
dev.set_current()
nvshmem.init(device=dev, mpi_comm=MPI.COMM_WORLD, initializer_method="emulated_mpi")
# ... use NVSHMEM ...
nvshmem.finalize()
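All three examples above pick a GPU with rank % system.num_devices. The following is a minimal sketch of that round-robin mapping in plain Python, so it can be checked without a GPU. The helper name local_device_index is hypothetical and not part of NVSHMEM4Py.

```python
def local_device_index(rank: int, num_devices: int) -> int:
    """Map a global rank to a local GPU index in round-robin order.

    Hypothetical helper: it mirrors the `rank % system.num_devices`
    expression used in the examples and adds basic input checks.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return rank % num_devices


# With 4 visible GPUs, ranks 0..7 wrap around the device list.
mapping = [local_device_index(r, 4) for r in range(8)]
```

Note that when a process sees fewer GPUs than there are ranks on its node, this mapping places several ranks on the same device.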
API Reference
Teams
class nvshmem.core.Teams(IntEnum)
NVSHMEM4Py uses an enumerator to refer to NVSHMEM teams.
- TEAM_WORLD: The world team, which contains all PEs in the NVSHMEM program.
- TEAM_SHARED: The team of PEs that share a memory domain. NVSHMEM_TEAM_SHARED refers to the team of all PEs that would mutually return a non-null address from a call to nvshmem_ptr for all symmetric heap objects. That is, nvshmem_ptr must return a non-null pointer to the local PE for all symmetric heap objects on all target PEs in the team. This means that symmetric heap objects on each PE are directly load/store accessible by all PEs in the team. See team-management for more detail about its use.
- TEAM_NODE: The team of PEs that are on the same node.
- TEAM_SAME_MYPE_NODE: The team of PEs that have the same PE index within their node; that is, all PEs for which nvshmem.core.team_my_pe(Teams.TEAM_NODE) returns the same value.
- TEAM_SAME_GPU: The team of PEs that are on the same GPU.
- TEAM_GPU_LEADERS: The team of PEs that are leaders of their respective GPUs.
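To make the difference between TEAM_NODE and TEAM_SAME_MYPE_NODE concrete, here is a small pure-Python sketch that groups PE numbers the way the descriptions above suggest. It assumes PEs are numbered contiguously within each node and that every node hosts the same number of PEs; real job layouts may differ. The function names are hypothetical and do not call NVSHMEM.

```python
from collections import defaultdict


def node_teams(n_pes: int, pes_per_node: int) -> dict:
    """Group PEs by node index (assumes contiguous numbering per node)."""
    teams = defaultdict(list)
    for pe in range(n_pes):
        teams[pe // pes_per_node].append(pe)
    return dict(teams)


def same_mype_node_teams(n_pes: int, pes_per_node: int) -> dict:
    """Group PEs by their index within the node.

    Under the same assumption, `pe % pes_per_node` plays the role of
    team_my_pe(Teams.TEAM_NODE) for each PE.
    """
    teams = defaultdict(list)
    for pe in range(n_pes):
        teams[pe % pes_per_node].append(pe)
    return dict(teams)


# Two nodes with four PEs each.
by_node = node_teams(8, 4)
by_local_index = same_mype_node_teams(8, 4)
```

In this layout, PE 1 and PE 5 sit on different nodes but share local index 1, so they would belong to the same TEAM_SAME_MYPE_NODE grouping.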
Initialization Methods
NVSHMEM supports multiple bootstrap methods to initialize the runtime. You must explicitly specify one of the following using the initializer_method argument of nvshmem.core.init().
Supported methods:
- "mpi": Initializes NVSHMEM using an MPI communicator (mpi4py is required).
- "uid": Initializes NVSHMEM using a user-provided unique identifier and rank information.
- "emulated_mpi": Uses MPI to broadcast a unique ID internally before doing UID-based init.
nvshmem.core.init(device: cuda.core.experimental._device.Device = None, uid: nvshmem.bindings.nvshmem.uniqueid = None, rank: int = None, nranks: int = None, mpi_comm: None = None, initializer_method: str = '') → None
Initialize the NVSHMEM runtime with either MPI or UID-based bootstrapping.
- Args:
  - device (cuda.core.Device, required): A Device() that will be bound to this process. All NVSHMEM operations on this process will use this Device.
  - uid (nvshmem.UniqueID, optional): A unique identifier used for UID-based initialization. Must be provided if initializer_method is "uid".
  - rank (int, optional): Rank of the calling process in the NVSHMEM job. Required for UID-based init.
  - nranks (int, optional): Total number of NVSHMEM ranks in the job. Required for UID-based init.
  - mpi_comm (mpi4py.MPI.Comm, optional): MPI communicator to use for MPI-based initialization. Defaults to MPI.COMM_WORLD if None and initializer_method is "mpi".
  - initializer_method (str): Specifies the initialization method. Must be one of the supported methods listed above: "mpi", "uid", or "emulated_mpi".
- Raises:
  - NvshmemInvalid: If an invalid initialization method is provided, or required arguments for the selected method are missing or incorrect.
  - NvshmemError: If NVSHMEM fails to initialize using the specified method.
- Notes:
  - If using MPI-based init, ensure mpi4py is compiled against the same MPI distribution you're running with. A mismatch can result in undefined behavior.
  - For UID-based init, the user is responsible for distributing the uid to all processes and passing in the correct rank and nranks.
  - UID-based init is useful for bootstrapping over non-MPI runtimes or custom transports.
  - Internally, this sets up a bindings.InitAttr() structure which is passed to the NVSHMEM host library.
- Example:
  >>> from mpi4py import MPI
  >>> import nvshmem.core as nvshmem
  >>> nvshmem.init(mpi_comm=MPI.COMM_WORLD, initializer_method="mpi")
  >>> # OR for UID mode
  >>> uid = nvshmem.get_unique_id()
  >>> nvshmem.init(uid=uid, rank=0, nranks=1, initializer_method="uid")
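The argument rules above (a supported initializer_method, plus uid, rank, and nranks for UID-based init) can be checked before calling init(). The sketch below is a hypothetical pre-flight helper, not the library's own validation: it raises ValueError, whereas nvshmem.core.init() is documented to raise NvshmemInvalid. The range check on rank is an added assumption.

```python
SUPPORTED_METHODS = ("mpi", "uid", "emulated_mpi")


def check_init_args(initializer_method, uid=None, rank=None, nranks=None):
    """Fail early if the documented requirements for init() are not met.

    Hypothetical helper; it only inspects arguments and never touches NVSHMEM.
    """
    if initializer_method not in SUPPORTED_METHODS:
        raise ValueError(
            f"initializer_method must be one of {SUPPORTED_METHODS}, "
            f"got {initializer_method!r}"
        )
    if initializer_method == "uid":
        required = (("uid", uid), ("rank", rank), ("nranks", nranks))
        missing = [name for name, value in required if value is None]
        if missing:
            raise ValueError(f"UID-based init requires: {', '.join(missing)}")
        # Assumption: ranks are numbered 0..nranks-1.
        if not 0 <= rank < nranks:
            raise ValueError(f"rank {rank} is outside [0, {nranks})")
```

A caller might run check_init_args(...) with the same keyword values it is about to pass to init(), so that a misconfigured rank reports a readable message before any bootstrap work starts.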
Querying Initialization Status
class nvshmem.core.InitStatus(IntEnum)
NVSHMEM4Py enumerator for initialization status.
- STATUS_NOT_INITIALIZED: The program is not initialized.
- STATUS_IS_BOOTSTRAPPED: The group of PEs is bootstrapped, but NVSHMEM is not initialized. This means processes can communicate with each other, but CUDA devices are not yet bound to a specific PE. After calling nvshmem.core.finalize() (after a successful initialization), the program will be in this state.
- STATUS_IS_INITIALIZED: The NVSHMEM runtime is initialized. After a successful call to nvshmem.core.init(), the program will be in this state.
- STATUS_LIMITED_MPG: The NVSHMEM runtime is initialized with limited MPG support. See https://docs.nvidia.com/nvshmem/api/using.html?highlight=mpg for more details.
- STATUS_FULL_MPG: The NVSHMEM runtime is initialized with full MPG support. See https://docs.nvidia.com/nvshmem/api/using.html?highlight=mpg for more details.
- STATUS_INVALID: The program has an invalid state. This is typically due to an error in the initialization process.
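The status descriptions above imply a simple lifecycle: not initialized, then initialized after init(), then bootstrapped after finalize(). The toy model below sketches only that sequence. ModelStatus and LifecycleModel are hypothetical stand-ins; they are not the real InitStatus enum, their numeric values are arbitrary, and the MPG and invalid states are left out.

```python
from enum import Enum, auto


class ModelStatus(Enum):
    """Stand-in for three of the documented states (values are arbitrary)."""
    NOT_INITIALIZED = auto()
    IS_BOOTSTRAPPED = auto()
    IS_INITIALIZED = auto()


class LifecycleModel:
    """Toy state machine following the transitions described in the text."""

    def __init__(self):
        self.status = ModelStatus.NOT_INITIALIZED

    def init(self):
        # A successful init() leaves the program initialized.
        self.status = ModelStatus.IS_INITIALIZED

    def finalize(self):
        # Raising here is this model's choice, not documented library behavior.
        if self.status is not ModelStatus.IS_INITIALIZED:
            raise RuntimeError("finalize() modeled only after a successful init()")
        # After finalize(), the PEs remain bootstrapped.
        self.status = ModelStatus.IS_BOOTSTRAPPED


model = LifecycleModel()
history = [model.status]
model.init()
history.append(model.status)
model.finalize()
history.append(model.status)
```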
Finalization
When NVSHMEM operations are complete, call nvshmem.core.finalize() to clean up runtime resources.
nvshmem.core.finalize() → None
Finalize the NVSHMEM runtime.
This function wraps the NVSHMEM finalization routine. It should be called after all NVSHMEM operations are complete and before the application exits.
Typically, this is called once per process to clean up NVSHMEM resources.
- Raises:
  - NvshmemError: If the NVSHMEM finalization fails.
- Example:
  >>> nvshmem.core.finalize()
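Because finalize() should run once per process after all NVSHMEM work, one possible pattern is to pair it with init() in a context manager so that cleanup still happens when the work raises. The sketch below is an assumption about usage, not an API provided by NVSHMEM4Py: nvshmem_session is a hypothetical name, and the init and finalize callables are passed in so the pattern can be exercised here with stand-in functions.

```python
from contextlib import contextmanager


@contextmanager
def nvshmem_session(init_fn, finalize_fn, **init_kwargs):
    """Call init_fn(**init_kwargs), yield, then always call finalize_fn().

    If init_fn itself raises, finalize_fn is not called.
    """
    init_fn(**init_kwargs)
    try:
        yield
    finally:
        finalize_fn()


# Stand-in callables that record calls instead of touching NVSHMEM.
calls = []


def fake_init(**kwargs):
    calls.append(("init", kwargs))


def fake_finalize():
    calls.append(("finalize", {}))


try:
    with nvshmem_session(fake_init, fake_finalize, initializer_method="mpi"):
        raise RuntimeError("simulated failure during NVSHMEM work")
except RuntimeError:
    pass
```

With the real library, the same wrapper would presumably be used as nvshmem_session(nvshmem.init, nvshmem.finalize, device=dev, mpi_comm=MPI.COMM_WORLD, initializer_method="mpi"), reusing the arguments from the examples earlier in this section.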
Retrieving Version Information
You can query NVSHMEM version details using:
nvshmem.core.get_version() → nvshmem.core.nvshmem_types.Version
Get the NVSHMEM4Py version.
Returns an object of type nvshmem.core.Version, a Python class that contains several strings representing versions related to NVSHMEM:
- Version.openshmem_spec_version is the OpenSHMEM specification version that this NVSHMEM was built against.
- Version.nvshmem4py_version is the version of the NVSHMEM4Py Python library.
- Version.libnvshmem_version is the version of the NVSHMEM library that this package has opened.
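As a small illustration of the three attributes listed above, the sketch below formats them into a report suitable for a log line or bug report. format_versions is a hypothetical helper, and the SimpleNamespace object with placeholder strings stands in for the value returned by get_version(); no real version numbers are implied.

```python
from types import SimpleNamespace


def format_versions(version) -> str:
    """Render the three documented version strings, one per line."""
    return "\n".join(
        [
            f"OpenSHMEM spec: {version.openshmem_spec_version}",
            f"NVSHMEM4Py:     {version.nvshmem4py_version}",
            f"libnvshmem:     {version.libnvshmem_version}",
        ]
    )


# Placeholder object; with the library installed, pass nvshmem.core.get_version().
placeholder = SimpleNamespace(
    openshmem_spec_version="<spec>",
    nvshmem4py_version="<nvshmem4py>",
    libnvshmem_version="<libnvshmem>",
)
report = format_versions(placeholder)
```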
Retrieving a Unique ID
For UID-based initialization, use:
nvshmem.core.get_unique_id(empty=False) → nvshmem.bindings.nvshmem.uniqueid
Retrieve or create a unique ID used for UID-based NVSHMEM initialization.
This function wraps the underlying NVSHMEM binding for obtaining a unique ID required when using the uid initializer method. Only a single rank (typically rank 0) should call this with empty=False to generate the ID. Other ranks should call it with empty=True and receive the ID through a user-defined communication mechanism (e.g., MPI broadcast or socket transfer).
- Args:
  - empty (bool): If True, returns an empty (uninitialized) unique ID structure. If False, calls the underlying NVSHMEM function to generate a valid unique ID.
- Returns:
  - UniqueID: A UniqueID object containing the generated or empty NVSHMEM unique ID.
- Raises:
  - NvshmemError: If retrieving the unique ID from NVSHMEM fails.
- Example:
  >>> if rank == 0:
  ...     uid = nvshmem.core.get_unique_id()
  ... else:
  ...     uid = nvshmem.core.get_unique_id(empty=True)
  ...
  >>> nvshmem.core.init(uid=uid, rank=rank, nranks=size, initializer_method="uid")
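The reason non-root ranks ask for an empty ID is that an in-place broadcast needs a receive buffer of the same size on every rank. The pure-Python simulation below sketches that idea without MPI or NVSHMEM. Everything in it is a stand-in: UID_NBYTES is a hypothetical size, make_uid_buffer imitates get_unique_id(empty=...), and fake_bcast imitates an in-place broadcast such as the comm.Bcast call in the UID-based example.

```python
UID_NBYTES = 128  # hypothetical size; the real unique ID layout is opaque


def make_uid_buffer(empty: bool) -> bytearray:
    """Return a zeroed buffer, filled with a fake ID when empty is False."""
    buf = bytearray(UID_NBYTES)
    if not empty:
        buf[:] = bytes(range(UID_NBYTES))  # stand-in for a generated ID
    return buf


def fake_bcast(buffers, root=0):
    """Copy the root's bytes into every rank's buffer in place."""
    payload = bytes(buffers[root])
    for buf in buffers:
        buf[:] = payload


# Simulate four ranks: only rank 0 generates an ID, the rest start empty.
nranks = 4
buffers = [make_uid_buffer(empty=(rank != 0)) for rank in range(nranks)]
before = [bytes(b) for b in buffers]
fake_bcast(buffers, root=0)
```

After the broadcast, every simulated rank holds the same bytes, which mirrors the state each process needs before it passes its uid, rank, and nranks to init().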