Python Bindings (NVSHMEM4Py)

NVSHMEM4Py is the official Python language binding for NVSHMEM, providing a Pythonic interface to the NVSHMEM library’s functionality. It enables Python applications to leverage NVSHMEM’s high-performance PGAS (Partitioned Global Address Space) programming model for GPU-accelerated computing.

Quick Start

To use NVSHMEM4Py in your Python application:

import nvshmem.core as nvshmem
from mpi4py import MPI
from cuda.core.experimental import Device

dev = Device()
dev.set_current()
stream = dev.create_stream()

# Initialize MPI
comm = MPI.COMM_WORLD

# Initialize NVSHMEM with MPI
nvshmem.init(dev, mpi_comm=comm, init_method="mpi")

# Get information about the current PE
my_pe = nvshmem.my_pe()
n_pes = nvshmem.n_pes()

# Allocate symmetric memory
# array() returns a CuPy ndarray backed by symmetric memory
x = nvshmem.array((1024,), dtype="float32")
y = nvshmem.array((1024,), dtype="float32")

if my_pe == 0:
    y[:] = 1.0

# Perform communication operations
# Put y from PE 0 into x on PE 1 (requires at least two PEs)
if my_pe == 0:
    nvshmem.put(x, y, pe=1, stream=stream)

# Wait for the work enqueued on this PE's stream to complete
stream.sync()

# Clean up
nvshmem.free_array(x)
nvshmem.free_array(y)
nvshmem.finalize()
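
To make the put semantics in the example above concrete, here is a minimal single-process sketch of symmetric memory. It is a toy model written in plain Python, not the NVSHMEM4Py API: the class ToySymmetricHeap and its methods alloc, local, and put are hypothetical names introduced only for illustration. The sketch assumes that a symmetric allocation gives every PE a buffer of the same name and size, and that a put is a one-sided copy from the caller's local buffer into the target PE's buffer.

```python
from array import array


class ToySymmetricHeap:
    """Toy single-process model of symmetric memory (NOT the NVSHMEM4Py API)."""

    def __init__(self, n_pes):
        self.n_pes = n_pes
        # One dictionary of named buffers per simulated PE.
        self._buffers = [dict() for _ in range(n_pes)]

    def alloc(self, name, length):
        # Symmetric allocation: every PE gets a zeroed float32 buffer
        # with the same name and the same length.
        for pe_buffers in self._buffers:
            pe_buffers[name] = array("f", [0.0] * length)

    def local(self, pe, name):
        # Return the given PE's local copy of a symmetric buffer.
        return self._buffers[pe][name]

    def put(self, dst_name, src_name, src_pe, dst_pe):
        # One-sided write: copy src_pe's local source buffer into the
        # destination buffer that lives on dst_pe.
        src = self._buffers[src_pe][src_name]
        self._buffers[dst_pe][dst_name][:] = src


heap = ToySymmetricHeap(n_pes=2)
heap.alloc("x", 4)
heap.alloc("y", 4)

# PE 0 fills its local copy of y, then puts it into x on PE 1.
heap.local(0, "y")[:] = array("f", [1.0] * 4)
heap.put("x", "y", src_pe=0, dst_pe=1)

x_on_pe1 = list(heap.local(1, "x"))  # written by the put
x_on_pe0 = list(heap.local(0, "x"))  # untouched by the put
```

In this model only x on PE 1 changes; x on PE 0 and y on PE 1 keep their initial zeros. This mirrors the Quick Start, where PE 0 writes into PE 1's copy of x without PE 1 issuing a matching receive. The real library additionally involves GPU streams and separate processes, which the toy model leaves out.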

Key Features

Pythonic interface to NVSHMEM functionality
Seamless integration with NumPy, CuPy, and PyTorch
Support for symmetric memory allocation and management
Communication operations (put/get, collectives)
Synchronization primitives

For more detailed information, see the NVSHMEM4Py Overview and API reference sections.