Distributed Reshape#
Overview#
The distributed reshape module nvmath. in
nvmath-python leverages the NVIDIA cuFFTMp library and provides APIs that can
be directly called from the host to efficiently redistribute local operands
on multiple processes on multi-node multi-GPU systems at scale. Both stateless
function-form APIs and stateful class-form APIs are provided:
- function-form reshape using - nvmath..- distributed. - reshape. - reshape() 
- stateful reshape using - nvmath..- distributed. - reshape. - Reshape 
Reshape is a general-purpose API to change how data is distributed or partitioned across processes, by shuffling data among the processes. Distributed reshape supports arbitrary data distributions in the form of 1D/2D/3D boxes.
Box distribution#
Consider a X*Y*Z global array. 3D boxes can be used to describe a subsection
of this global array by indicating the lower and upper corner of the subsection.
By associating boxes to processes one can then describe a data distribution where
every process owns a contiguous rectangular subsection of the global array.
For instance, consider a 2D case with a global array of size X*Y = 4*4 and
three boxes, described as box = [lower, upper]:
box_0 = [(0,0), (2,2)]  # green
box_1 = [(2,0), (4,2)]  # blue
box_2 = [(0,2), (4,4)]  # purple
 
By associating box 0 to process 0, box 1 to process 1 and box 2 to process 2, this creates a
data distribution of the global 4*4 array across three processes. The same can be
generalized to N-D arrays and any number of processes.
For more information, refer to the cuFFTMp documentation.
Example#
To perform a distributed reshape, each process specifies its own input and output box, which determines the distribution of the input and output, respectively.
As an example, consider a matrix that is distributed column-wise on two processes (each process owns a contiguous chunk of columns). To redistribute the matrix row-wise, we can use distributed reshape:
Tip
Reminder to initialize the distributed context first as per Initializing the distributed runtime.
# The global dimensions of the matrix are 4x4. The matrix is distributed
# column-wise, so each process has 4 rows and 2 columns.
# Get process rank from mpi4py communicator.
rank = communicator.Get_rank()
# Initialize the matrix on each process, as a NumPy ndarray (on the CPU).
A = np.zeros((4, 2)) if rank == 0 else np.ones((4, 2))
# Reshape from column-wise to row-wise.
if rank == 0:
    input_box = [(0, 0), (4, 2)]
    output_box = [(0, 0), (2, 4)]
else:
    input_box = [(0, 2), (4, 4)]
    output_box = [(2, 0), (4, 4)]
# Distributed reshape returns a new operand with its own buffer.
B = nvmath.distributed.reshape.reshape(A, input_box, output_box)
# The result is a NumPy ndarray, distributed row-wise:
# [0] B:
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]]
#
# [1] B:
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]]
print(f"[{rank}] B:\n{B}")
API Reference#
Reshape support (nvmath.)#
| 
 | Perform a distributed reshape on the provided operand to change its distribution across processes. | 
| 
 | Create a stateful object that encapsulates the specified distributed Reshape and required resources. | 
| 
 | A data class for providing options to the  |