nvmath-python Bindings
Overview
Warning
All Python bindings documented in this section are experimental and subject to future changes. Use them at your own risk.
Low-level Python bindings for C APIs from NVIDIA Math Libraries are exposed under the
corresponding modules in nvmath.. To access the Python bindings, use the
modules for the corresponding libraries. Under the hood, nvmath-python lazily handles the run-time
linking to the libraries for you.
The currently supported libraries along with the corresponding module names are listed as follows:
| Library name | Python access |
|---|---|
| cuBLAS | |
| cuBLASLt | |
| cuFFT | |
| cuRAND | |
| cuSOLVER | |
| cuSOLVERDn | |
| cuSPARSE | |
Support for more libraries will be added in the future.
Naming & Calling Convention
Inside each of the modules, all public APIs of the corresponding NVIDIA Math library are exposed following the PEP 8 style guide, with the following changes:
- All library name prefixes are stripped
- The function names are broken into words and follow snake case
- The first letter of each word in enum names is capitalized
- Each enum's name prefix is stripped from its values' names
- Whenever applicable, the outputs are removed from the function arguments and returned directly as Python objects
- Pointers are passed as Python int
- Exceptions are raised instead of returning the C error code
Below is a non-exhaustive list of examples of such C-to-Python mappings:
- Function: cublasDgemm -> cublas.dgemm()
- Function: curandSetGeneratorOrdering -> curand.set_generator_ordering()
- Enum type: cublasLtMatmulTile_t -> cublasLt.MatmulTile
- Enum type: cufftXtSubFormat -> cufft.XtSubFormat
- Enum value name: CUSOLVER_EIG_MODE_NOVECTOR -> cusolver.EigMode.NOVECTOR
- Enum value name: CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED -> cusparse.Status.MATRIX_TYPE_NOT_SUPPORTED
- Returns: The outputs of cusolverDnXpotrf_bufferSize are the workspace sizes on device and host, which are wrapped as a 2-tuple in the corresponding cusolverDn.xpotrf_buffer_size() Python API.
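The naming rules are mechanical enough to sketch in a few lines. The helpers below (c_func_to_python, c_enum_type_to_python, c_enum_value_to_python) are hypothetical illustrations, not part of nvmath-python; the caller supplies the prefix to strip, and the bindings' actual names, including any documented exceptions, remain authoritative.

```python
import re


def c_func_to_python(c_name: str, prefix: str) -> str:
    """Hypothetical helper mimicking the function-naming rules above."""
    if not c_name.startswith(prefix):
        raise ValueError(f"{c_name!r} does not start with {prefix!r}")
    stem = c_name[len(prefix):]  # strip the library prefix
    # split camelCase words: insert "_" before an uppercase letter
    # that follows a lowercase letter or a digit
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", stem)
    return snake.lower()


def c_enum_type_to_python(c_type: str, prefix: str) -> str:
    """Strip the library prefix and a trailing '_t' from a C enum type name."""
    return c_type.removeprefix(prefix).removesuffix("_t")


def c_enum_value_to_python(c_value: str, enum_prefix: str) -> str:
    """Strip the enum's own name prefix from a C enum value name."""
    return c_value.removeprefix(enum_prefix)


print(c_func_to_python("curandSetGeneratorOrdering", "curand"))
# set_generator_ordering
print(c_func_to_python("cusolverDnXpotrf_bufferSize", "cusolverDn"))
# xpotrf_buffer_size
print(c_enum_type_to_python("cublasLtMatmulTile_t", "cublasLt"))  # MatmulTile
print(c_enum_value_to_python("CUSOLVER_EIG_MODE_NOVECTOR", "CUSOLVER_EIG_MODE_"))  # NOVECTOR
```

The regex only splits at a lowercase-to-uppercase boundary, which covers the examples listed here; unusual acronyms or digit runs may need the hand-written exceptions the bindings document.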
There may be exceptions to the above rules, but they should be self-evident and will be properly documented. In the next section, we discuss pointer passing in Python.
Memory management
Pointer and data lifetime
Unlike C/C++, Python does not provide low-level primitives to allocate or deallocate host memory, let alone device memory. For the C APIs to work from Python, memory must be managed properly through Python proxy objects. In nvmath-python, we ask users to do this with NumPy (for host memory) and CuPy (for device memory).
Note
It is also possible to use array.array (plus memoryview as needed) to
manage host memory. However, it is more laborious than using
numpy.ndarray, especially when it comes to array manipulation and computation.
Note
It is also possible to use CUDA Python to manage device memory, but as of CUDA 11 there is no simple, Pythonic way to modify the contents stored on the GPU; doing so requires custom kernels. CuPy is a lightweight, NumPy-compatible array library that addresses this need.
To pass data from Python to C, pass the pointer addresses (as Python int) of the relevant
objects. We illustrate this using NumPy/CuPy arrays as follows:
```python
import numpy

# my_func is a placeholder for any binding that takes a pointer argument;
# "..." stands for its other arguments

# create a host buffer to hold 5 ints
buf = numpy.empty((5,), dtype=numpy.int32)
# pass buf's pointer to the wrapper;
# buf may be modified in place if the function writes to it
my_func(..., buf.ctypes.data, ...)
# examine/use buf's data
print(buf)
```

```python
import cupy

# create a device buffer to hold 10 doubles
buf = cupy.empty((10,), dtype=cupy.float64)
# pass buf's pointer to the wrapper;
# buf may be modified in place if the function writes to it
my_func(..., buf.data.ptr, ...)
# examine/use buf's data
print(buf)
```

```python
import cupy

# create an untyped device buffer of 128 bytes
buf = cupy.cuda.alloc(128)
# pass buf's pointer to the wrapper;
# buf may be modified in place if the function writes to it
my_func(..., buf.ptr, ...)
# the memory is released once buf is garbage-collected (e.g., goes out of scope)
```
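The host-side pattern can be tried without a GPU or any NVIDIA library by letting ctypes.memset, a C-level function from Python's standard library, stand in for my_func. Like the bindings, it receives the destination address as a plain int and writes through it, so the NumPy array changes in place. This is only an analogy for how pointer arguments are consumed, not an nvmath-python call.

```python
import ctypes

import numpy

# a host buffer of 5 int32 values (contents initially undefined)
buf = numpy.empty((5,), dtype=numpy.int32)
# ctypes.memset(dst, c, count) receives the destination as a plain int,
# the same way the bindings receive pointer arguments
ctypes.memset(buf.ctypes.data, 0xFF, buf.nbytes)
# every byte is now 0xFF, so each int32 reads as -1
print(buf)  # [-1 -1 -1 -1 -1]
```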
The underlying assumption is that the arrays must be contiguous in memory (unless the C interface allows for specifying the array strides).
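For strided data, one option is to make a contiguous copy before taking its address. The NumPy sketch below shows a column view that fails the contiguity check and a copy that passes it; cupy.ascontiguousarray is assumed to play the same role for device arrays, though that is not exercised here. The copy is a separate buffer, so anything C writes into it does not propagate back to the original array.

```python
import numpy

a = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)
col = a[:, 1]  # a strided view: consecutive elements are 4 * 8 bytes apart
print(col.flags.c_contiguous)  # False

# copy into a dense buffer before handing its address to C
col_c = numpy.ascontiguousarray(col)
print(col_c.flags.c_contiguous)  # True
ptr = col_c.ctypes.data  # valid while col_c stays referenced
```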
As a consequence, none of the C structs in NVIDIA Math libraries (including handles and descriptors)
are exposed as Python classes; that is, they do not have their own types and are
simply cast to plain Python int for passing around. Downstream consumers can
create a wrapper class to hold the pointer address if so desired. In other words, users have
full control over (and responsibility for) managing the pointer lifetime.
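A minimal sketch of such a wrapper, assuming only that a library exposes a create function returning the handle as an int and a destroy function taking it back. The class name Handle and its create/destroy parameters are hypothetical, chosen so the sketch does not depend on any particular binding's function names. weakref.finalize ensures the destroy call runs at most once, whether triggered by close(), by leaving a with block, or by garbage collection.

```python
import weakref


class Handle:
    """Hypothetical owner of a library handle stored as a plain Python int."""

    def __init__(self, create, destroy):
        self.ptr = create()  # e.g. a handle-creation binding returning an int
        # destroy(self.ptr) runs at most once: on close(), on exit from a
        # `with` block, or when this object is garbage-collected
        self._finalizer = weakref.finalize(self, destroy, self.ptr)

    def close(self):
        self._finalizer()  # no-op if already called

    @property
    def closed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


# stand-in create/destroy functions, for illustration only
destroyed = []
with Handle(create=lambda: 0x1234, destroy=destroyed.append) as h:
    print(hex(h.ptr))  # 0x1234
```

Passing the callables in keeps the sketch testable without a GPU; with real bindings, the create/destroy pair of the relevant library would be supplied instead.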
However, in certain cases (when read-only host arrays are needed), nvmath-python can convert Python objects for users to ease this burden. For example, in functions that require a sequence or a nested sequence, the following operations are equivalent:
```python
import numpy

# passing a host buffer of int type can be done like this
buf = numpy.array([0, 1, 3, 5, 6], dtype=numpy.int32)
my_func(..., buf.ctypes.data, ...)

# or just this
buf = [0, 1, 3, 5, 6]
my_func(..., buf, ...)  # the underlying data type is determined by the C API
```
which is particularly useful when users need to pass multiple sequences or nested sequences
to C (for example, nvmath.).
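When building such a buffer manually, the array must stay referenced for as long as C uses its address. NumPy's documentation for ndarray.ctypes warns that the address of a temporary array can be invalid, because the temporary may be deallocated before it is used. The hypothetical helper host_buffer below simply returns the array alongside its address so the caller holds on to it.

```python
import numpy


def host_buffer(values, dtype):
    """Hypothetical helper: return (array, address) for a contiguous host copy.

    The caller must keep `array` referenced for as long as C code uses `address`.
    """
    array = numpy.array(values, dtype=dtype, order="C")  # a fresh C-contiguous copy
    return array, array.ctypes.data


# risky: the temporary array can be freed before my_func runs
# my_func(..., numpy.array([0, 1, 3, 5, 6], dtype=numpy.int32).ctypes.data, ...)

# safer: keep `arr` bound to a name for the duration of the call
arr, addr = host_buffer([0, 1, 3, 5, 6], numpy.int32)
# my_func(..., addr, ...)  # my_func is the placeholder used above
```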
Note
Some functions require their arguments to be in device memory. You need to pass
device memory (for example, cupy.ndarray) to such arguments. nvmath-python
neither validates the memory pointers nor implicitly transfers the data.
Passing host memory where device memory is expected (and vice versa) results in
undefined behavior.
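Because nothing is validated for you, a caller may want a cheap guard of their own. The sketch below is one hypothetical approach, not part of nvmath-python: it relies on the standard array-interface protocols, since NumPy arrays expose __array_interface__ and CuPy arrays expose __cuda_array_interface__, each with the buffer address as the first element of its "data" entry. It only inspects the Python object, so it cannot help once an address has already been reduced to a plain int.

```python
import numpy


def device_ptr(arr):
    """Hypothetical guard: return a GPU array's address or raise TypeError."""
    iface = getattr(arr, "__cuda_array_interface__", None)
    if iface is None:
        raise TypeError(f"expected a device array, got {type(arr).__name__}")
    return iface["data"][0]  # "data" is (address, read_only)


def host_ptr(arr):
    """Hypothetical guard: return a host array's address or raise TypeError."""
    iface = getattr(arr, "__array_interface__", None)
    if iface is None or hasattr(arr, "__cuda_array_interface__"):
        raise TypeError(f"expected a host array, got {type(arr).__name__}")
    return iface["data"][0]  # "data" is (address, read_only)


buf = numpy.zeros(4, dtype=numpy.float32)
print(host_ptr(buf) == buf.ctypes.data)  # True
# device_ptr(buf) raises TypeError: a NumPy array lives in host memory
```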
API Reference
This reference describes all of nvmath-python's math primitives.