Reshape#
class nvmath.distributed.reshape.Reshape(
- operand,
- /,
- input_box: Sequence[Sequence[int]],
- output_box: Sequence[Sequence[int]],
- *,
- options: ReshapeOptions | None = None,
- stream: AnyStream | None = None,
)
Create a stateful object that encapsulates the specified distributed Reshape and required resources. This object ensures the validity of resources during use and releases them when they are no longer needed to prevent misuse.

This object encompasses all functionalities of the function-form API reshape(), which is a convenience wrapper around it. The stateful object also allows for the amortization of preparatory costs when the same Reshape operation is to be performed on multiple operands with the same problem specification (see reset_operand() for more details).

Using the stateful object typically involves the following steps (a minimal lifecycle sketch follows this list):

- Problem Specification: Initialize the object with a defined operation and options.
- Preparation: Use plan() to determine the best distributed algorithmic implementation for this specific Reshape operation.
- Execution: Perform the Reshape with execute().
- Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.
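As a minimal sketch of this lifecycle (illustrative only; it assumes nvmath.distributed has been initialized and that the operand a and the boxes input_box and output_box are defined as in the examples below):

>>> with nvmath.distributed.reshape.Reshape(a, input_box, output_box) as r:
...     # Preparation: plan the distributed Reshape.
...     r.plan()
...     # Execution: redistribute the operand.
...     b = r.execute()

All resources are released when the context manager exits, so no explicit free() call is needed in this form.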
Detailed information on each step described above can be obtained by passing in a logging.Logger object to ReshapeOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )

Parameters:
- operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  Important: GPU operands must be on the symmetric heap (for example, allocated with nvmath.distributed.allocate_symmetric_memory()).
- input_box – The box specifying the distribution of the input operand across processes, where each process specifies which portion of the global array it holds. A box is a pair of coordinates specifying the lower and upper extent for each dimension. 
- output_box – The box specifying the distribution of the result across processes, where each process specifies which portion of the global array it will hold after reshaping. A box is a pair of coordinates specifying the lower and upper extent for each dimension (a concrete two-process sketch follows this parameter list).
- options – Specify options for the Reshape as a ReshapeOptions object. Alternatively, a dict containing the parameters for the ReshapeOptions constructor can also be provided. If not specified, the value will be set to the default-constructed ReshapeOptions object.
- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
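To make the input_box and output_box parameters concrete, here is an illustrative sketch (not taken from the library's examples) for a hypothetical two-process run over a global array of shape (4, 4, 4) that is split along the first axis on input and redistributed along the second axis on output:

>>> comm = nvmath.distributed.get_context().communicator
>>> if comm.Get_rank() == 0:
...     input_box = [(0, 0, 0), (2, 4, 4)]   # this process holds planes 0-1 along axis 0
...     output_box = [(0, 0, 0), (4, 2, 4)]  # ... and will hold planes 0-1 along axis 1
... else:
...     input_box = [(2, 0, 0), (4, 4, 4)]   # planes 2-3 along axis 0
...     output_box = [(0, 2, 0), (4, 4, 4)]  # planes 2-3 along axis 1

Together the two input boxes tile the global array, and so do the two output boxes; each box is simply a [lower, upper] pair of coordinates.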
 
See also

Examples

>>> import cupy as cp
>>> import nvmath.distributed

Get the MPI communicator used to initialize nvmath.distributed (for information on initializing nvmath.distributed, you can refer to the documentation or to the Reshape examples in nvmath/examples/distributed/reshape):

>>> comm = nvmath.distributed.get_context().communicator

Let's create a 3D floating-point ndarray on the GPU, distributed across a certain number of processes, with each holding a portion of the ndarray. As an example, process 0 holds a 3D box of the global 3D array of shape (4, 4, 4).

>>> shape = 4, 4, 4

Reshape uses the NVSHMEM PGAS model, which requires GPU operands to be on the symmetric heap:

>>> a = nvmath.distributed.allocate_symmetric_memory(shape, cp)
>>> if comm.Get_rank() == 0:
...     a[:] = cp.random.rand(*shape)
... else:
...     a[:] = ...  # each process holds a different section of the global array.

With Reshape, we will change how the ndarray is distributed by having each process specify the input and output section of the global array. For process 0, let's assume that it holds the 3D box that goes from the lower corner given by coordinates (0, 0, 0) to the upper corner (4, 4, 4).

NOTE: each process has its own input and output boxes, which differ from those of other processes, as each holds a different section of the global array.

>>> if comm.Get_rank() == 0:
...     input_lower = (0, 0, 0)
...     input_upper = (4, 4, 4)
...     input_box = [input_lower, input_upper]
...     output_box = ...
... else:
...     input_box = ...  # the input box depends on the process.
...     output_box = ...  # the output box depends on the process.

Create a Reshape object encapsulating the problem specification above:

>>> r = nvmath.distributed.reshape.Reshape(a, input_box, output_box)

Options can be provided above to control the behavior of the operation using the options argument (see ReshapeOptions).

Next, plan the Reshape:

>>> r.plan()

Now execute the Reshape, and obtain the result b as a CuPy ndarray. Reshape always performs the distributed operation on the GPU.

>>> b = r.execute()

Finally, free the Reshape object's resources. To avoid this explicit call, it's recommended to use the Reshape object as a context manager as shown below, if possible.

>>> r.free()

Any symmetric memory that is owned by the user must be deleted explicitly (this is a collective call and must be called by all processes):

>>> nvmath.distributed.free_symmetric_memory(a, b)

Note that all Reshape methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.

Let's now look at the same problem with NumPy ndarrays on the CPU.

>>> import numpy as np
>>> a = np.random.rand(*shape)  # each process holds a different section

Create a Reshape object encapsulating the problem specification described earlier and use it as a context manager.

>>> with nvmath.distributed.reshape.Reshape(a, input_box, output_box) as r:
...     r.plan()
...
...     # Execute the Reshape to redistribute the ndarray.
...     b = r.execute()

All the resources used by the object are released at the end of the block.

Reshape always executes on the GPU. In this case, because a resides in host memory, the NumPy array is temporarily copied to device memory (on the symmetric memory heap), re-distributed on the GPU, and the result is copied back to host memory as a NumPy array.

Further examples can be found in the nvmath/examples/distributed/reshape directory.

Methods

__init__(
- operand,
- /,
- input_box: Sequence[Sequence[int]],
- output_box: Sequence[Sequence[int]],
- *,
- options: ReshapeOptions | None = None,
- stream: AnyStream | None = None,
)
 - execute(
- stream: AnyStream | None = None,
- release_workspace: bool = False,
- sync_symmetric_memory: bool = True,
)
Execute the Reshape operation.

Parameters:
- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
- release_workspace – A value of True specifies that the Reshape object should release workspace memory back to the symmetric memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but it incurs an overhead due to obtaining and releasing workspace memory from and to the symmetric memory pool on every call (see the usage sketch below). The default is False. NOTE: All processes must use the same value or the application can deadlock.
- sync_symmetric_memory – Indicates whether to issue a symmetric memory synchronization operation on the execute stream before the reshape operation. Note that before the Reshape starts executing, it is required that the source operand be ready on all processes. A symmetric memory synchronization ensures completion and visibility by all processes of previously issued local stores to symmetric memory. Advanced users who choose to manage the synchronization on their own using the appropriate NVSHMEM API, or who know that GPUs are already synchronized on the source operand, can set this to False. 
 
- Returns:
- The reshaped operand, which remains on the same device and utilizes the same package as the input operand. For GPU operands, the result will be in symmetric memory and the user is responsible for explicitly deallocating it (for example, using nvmath.distributed.free_symmetric_memory(tensor)).
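As a hedged illustration of the release_workspace option described above, the following sketch reshapes several same-specification operands with a single planned Reshape object, returning workspace to the symmetric memory pool after each call (the operands sequence is hypothetical, each entry must be compatible with a, and all processes must pass the same release_workspace value):

>>> r = nvmath.distributed.reshape.Reshape(a, input_box, output_box)
>>> r.plan()
>>> for operand in operands:  # hypothetical sequence of compatible GPU operands
...     r.reset_operand(operand)
...     # Return workspace to the symmetric memory pool after each execution.
...     result = r.execute(release_workspace=True)
>>> r.free()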
 
- free()
Free Reshape resources.

It is recommended that the Reshape object be used within a context, but if that is not possible, then this method must be called explicitly to ensure that the Reshape resources (especially internal library objects) are properly cleaned up.
 - plan(
- stream: AnyStream | None = None,
)
Plan the Reshape.

Parameters:
- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
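For instance, a hedged sketch of planning and executing on an explicitly created CuPy stream instead of the current one (assuming the GPU operand a and the boxes from the earlier examples):

>>> s = cp.cuda.Stream()
>>> with nvmath.distributed.reshape.Reshape(a, input_box, output_box) as r:
...     r.plan(stream=s)         # plan on the user-provided stream
...     b = r.execute(stream=s)  # execute on the same stream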
 
 - reset_operand(
- operand=None,
- *,
- stream: AnyStream | None = None,
)
Reset the operand held by this Reshape instance. This method has two use cases:

- it can be used to provide a new operand for execution
- it can be used to release the internal reference to the previous operand and potentially make its memory available for other use by passing operand=None.
 - Parameters:
- operand – A tensor (ndarray-like object) compatible with the previous one, or None (default). A value of None will release the internal reference to the previous operand, and the user is expected to set a new operand before calling execute() again. The new operand is considered compatible if all of the following properties match those of the previous one:

- The operand distribution, which must be (input_box, output_box), where input_box and output_box are the boxes specified at plan time.
- The package that the new operand belongs to. 
- The dtype of the new operand. 
- The shape and strides of the new operand. 
- The memory space of the new operand (CPU or GPU). 
- The device that the new operand belongs to, if it is on the GPU.
 
- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
 
Examples

>>> import cupy as cp
>>> import nvmath.distributed

Get the MPI communicator used to initialize nvmath.distributed (for information on initializing nvmath.distributed, you can refer to the documentation or to the Reshape examples in nvmath/examples/distributed/reshape):

>>> comm = nvmath.distributed.get_context().communicator
>>> nranks = comm.Get_size()

Create a 3-D complex128 ndarray on GPU symmetric memory, initially partitioned on the X axis (the global shape is (128, 128, 128)):

>>> shape = 128 // nranks, 128, 128
>>> dtype = cp.complex128
>>> a = nvmath.distributed.allocate_symmetric_memory(shape, cp, dtype=dtype)
>>> a[:] = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)

Compute the input and output box for the desired re-distribution:

>>> input_box = ...
>>> output_box = ...

Create a Reshape object as a context manager:

>>> with nvmath.distributed.reshape.Reshape(a, input_box, output_box) as r:
...     # Plan the Reshape.
...     r.plan()
...
...     # Execute the Reshape to get the first result.
...     r1 = r.execute()
...
...     # Reset the operand to a new CuPy ndarray.
...     b = nvmath.distributed.allocate_symmetric_memory(shape, cp, dtype=dtype)
...     b[:] = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
...     r.reset_operand(b)
...
...     # Execute to get the new result corresponding to the updated operand.
...     r2 = r.execute()

With reset_operand(), minimal overhead is achieved as problem specification and planning are only performed once.

For the particular example above, explicitly calling reset_operand() is equivalent to updating the operand in-place, i.e., replacing r.reset_operand(b) with a[:] = b. Note that updating the operand in-place should be adopted with caution, as it can only yield the expected result and incur no additional copies under the additional constraints below:

- The operand's distribution is the same.
- For more details, please refer to the in-place update example.
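A hedged sketch of that in-place pattern, written as if it continued the with-block of the example above (r3 is a hypothetical name for the new result):

...     # In-place alternative (valid under the constraint above): overwrite the
...     # operand that r already holds instead of calling reset_operand().
...     a[:] = b
...     r3 = r.execute()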