CUDA Wrapper

Module: polygraphy.cuda

class MemcpyKind[source]

Bases: object

Enumerates different kinds of copy operations.

HostToHost = c_int(0)

Copies from host memory to host memory

HostToDevice = c_int(1)

Copies from host memory to device memory

DeviceToHost = c_int(2)

Copies from device memory to host memory

DeviceToDevice = c_int(3)

Copies from device memory to device memory

class Cuda[source]

Bases: object

NOTE: Do not construct this class manually. Instead, use the wrapper() function to get the global wrapper.

Wrapper that exposes low-level CUDA functionality.

malloc(nbytes)[source]

Allocates memory on the GPU.

Parameters

nbytes (int) – The number of bytes to allocate.

Returns

The memory address of the allocated region, i.e. a device pointer.

Return type

int

Raises

PolygraphyException – If an error was encountered during the allocation.

free(ptr)[source]

Frees memory allocated on the GPU.

Parameters

ptr (int) – The memory address, i.e. a device pointer.

Raises

PolygraphyException – If an error was encountered during the free.

memcpy(dst, src, nbytes, kind, stream_ptr=None)[source]

Copies data between host and device memory.

Parameters
  • dst (int) – The memory address of the destination, i.e. a pointer.

  • src (int) – The memory address of the source, i.e. a pointer.

  • nbytes (int) – The number of bytes to copy.

  • kind (MemcpyKind) – The kind of copy to perform.

  • stream_ptr (int) – The memory address of a CUDA stream, i.e. a pointer. If this is not provided, a synchronous copy is performed.

Raises

PolygraphyException – If an error was encountered during the copy.

wrapper()[source]

Returns the global Polygraphy CUDA wrapper.

Returns

The global CUDA wrapper.

Return type

Cuda

class Stream[source]

Bases: object

High-level wrapper for a CUDA stream.

ptr

The memory address of the underlying CUDA stream

Type

int

__exit__(exc_type, exc_value, traceback)[source]

Frees the underlying CUDA stream.

free()[source]

Frees the underlying CUDA stream.

You can also use a context manager to manage the stream lifetime. For example:

with Stream() as stream:
    ...

synchronize()[source]

Synchronizes the stream.

class DeviceView(ptr, shape, dtype)[source]

Bases: object

A read-only view of a GPU memory region.

Parameters
  • ptr (int) – A pointer to the region of memory.

  • shape (Tuple[int]) – The shape of the region.

  • dtype (DataType) – The data type of the region.

ptr

The memory address of the underlying GPU memory

Type

int

shape

The shape of the device buffer

Type

Tuple[int]

property dtype

The data type of the device buffer

Type

DataType

property nbytes

The number of bytes in the memory region.

copy_to(host_buffer, stream=None)[source]

Copies from this device buffer to the provided host buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

The host buffer

Return type

Union[numpy.ndarray, torch.Tensor]

numpy()[source]

Create a new NumPy array containing the contents of this device buffer.

Returns

The newly created NumPy array.

Return type

np.ndarray

class DeviceArray(shape=None, dtype=None)[source]

Bases: polygraphy.cuda.cuda.DeviceView

An array on the GPU.

Parameters
  • shape (Tuple[int]) – The initial shape of the buffer.

  • dtype (DataType) – The data type of the buffer.

copy_to(host_buffer, stream=None)

Copies from this device buffer to the provided host buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

The host buffer

Return type

Union[numpy.ndarray, torch.Tensor]

property nbytes

The number of bytes in the memory region.

numpy()

Create a new NumPy array containing the contents of this device buffer.

Returns

The newly created NumPy array.

Return type

np.ndarray

ptr

The memory address of the underlying GPU memory

Type

int

shape

The shape of the device buffer

Type

Tuple[int]

static raw(shape=None)[source]

Creates an untyped device array of the specified shape.

Parameters

shape (Tuple[int]) – The initial shape of the buffer, in units of bytes. For example, a shape of (4, 4) would allocate a 16-byte array.

Returns

The raw device array.

Return type

DeviceArray

resize(shape)[source]

Resizes or reshapes the array to the specified shape.

If the allocated memory region is already large enough, no reallocation is performed.

Parameters

shape (Tuple[int]) – The new shape.

Returns

self

Return type

DeviceArray

__exit__(exc_type, exc_value, traceback)[source]

Frees the underlying memory of this DeviceArray.

free()[source]

Frees the GPU memory associated with this array.

You can also use a context manager to ensure that memory is freed. For example:

with DeviceArray(...) as arr:
    ...

copy_from(host_buffer, stream=None)[source]

Copies from the provided host buffer into this device buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy from. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and not larger than this device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

self

Return type

DeviceArray

view(shape=None, dtype=None)[source]

Creates a read-only DeviceView from this DeviceArray.

Parameters
  • shape (Sequence[int]) – The desired shape of the view. Defaults to the shape of this array or view.

  • dtype (DataType) – The desired data type of the view. Defaults to the data type of this array or view.

Returns

A view of this array's data on the device.

Return type

DeviceView