CUDA Wrapper

Module: polygraphy.cuda

class MemcpyKind

Bases: object

Enumerates different kinds of copy operations.

HostToHost = c_int(0)

Copies from host memory to host memory

HostToDevice = c_int(1)

Copies from host memory to device memory

DeviceToHost = c_int(2)

Copies from device memory to host memory

DeviceToDevice = c_int(3)

Copies from device memory to device memory

class Cuda

Bases: object

NOTE: Do not construct this class manually. Instead, use the wrapper() function to get the global wrapper.

Wrapper that exposes low-level CUDA functionality.

malloc(nbytes)

Allocates memory on the GPU.

Parameters:

nbytes (int) – The number of bytes to allocate.

Returns:

The memory address of the allocated region, i.e. a device pointer.

Return type:

int

Raises:

PolygraphyException – If an error was encountered during the allocation.

free(ptr)

Frees memory allocated on the GPU.

Parameters:

ptr (int) – The memory address, i.e. a device pointer.

Raises:

PolygraphyException – If an error was encountered during the free.

memcpy(dst, src, nbytes, kind, stream_ptr=None)

Copies nbytes bytes from src to dst. The kind argument selects the direction of the copy, e.g. host to device.

Parameters:
  • dst (int) – The memory address of the destination, i.e. a pointer.

  • src (int) – The memory address of the source, i.e. a pointer.

  • nbytes (int) – The number of bytes to copy.

  • kind (MemcpyKind) – The kind of copy to perform.

  • stream_ptr (int) – The memory address of a CUDA stream, i.e. a pointer. If this is not provided, a synchronous copy is performed.

Raises:

PolygraphyException – If an error was encountered during the copy.

wrapper()

Returns the global Polygraphy CUDA wrapper.

Returns:

The global CUDA wrapper.

Return type:

Cuda
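The following is a minimal sketch of how malloc, memcpy, and free might be combined through the global wrapper. The helper name roundtrip_bytes and the use of ctypes buffers as host-side staging memory are assumptions made for illustration; they are not part of Polygraphy.

```python
import ctypes


def roundtrip_bytes(payload: bytes) -> bytes:
    """Copy `payload` to the GPU and back using the low-level wrapper."""
    # Imported lazily so this module still loads where Polygraphy is absent.
    from polygraphy.cuda import MemcpyKind, wrapper

    cuda = wrapper()  # the global wrapper; Cuda is not constructed manually
    nbytes = len(payload)

    # Host staging buffers (an illustrative choice); their addresses are ints.
    src = ctypes.create_string_buffer(payload, nbytes)
    dst = ctypes.create_string_buffer(nbytes)

    dev_ptr = cuda.malloc(nbytes)
    try:
        # No stream_ptr is passed, so both copies are synchronous.
        cuda.memcpy(dev_ptr, ctypes.addressof(src), nbytes, MemcpyKind.HostToDevice)
        cuda.memcpy(ctypes.addressof(dst), dev_ptr, nbytes, MemcpyKind.DeviceToHost)
    finally:
        cuda.free(dev_ptr)  # release the device memory even if a copy fails
    return dst.raw
```

On a machine with a CUDA-capable GPU, roundtrip_bytes(b"abc") should return the same bytes it was given. The try/finally block is one way to make sure the device pointer is freed when a PolygraphyException is raised.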

class Stream

Bases: object

High-level wrapper for a CUDA stream.

ptr

The memory address of the underlying CUDA stream

Type:

int

__exit__(exc_type, exc_value, traceback)

Frees the underlying CUDA stream.

free()

Frees the underlying CUDA stream.

You can also use a context manager to manage the stream lifetime. For example:

with Stream() as stream:
    ...

synchronize()

Synchronizes the stream.
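As a sketch of how the context manager and synchronize() might be used together, the hypothetical helper below runs caller-supplied work on a temporary stream and waits for it before the stream is freed. The names run_on_stream and enqueue are illustrative only.

```python
def run_on_stream(enqueue):
    """Run `enqueue(stream)` on a temporary stream and wait for it to finish."""
    # Imported lazily so this module still loads where Polygraphy is absent.
    from polygraphy.cuda import Stream

    with Stream() as stream:
        result = enqueue(stream)  # queue asynchronous work, e.g. copies
        stream.synchronize()      # block until the queued work has completed
    # Leaving the `with` block frees the underlying CUDA stream.
    return result
```

A caller could pass, for example, lambda s: dev.copy_from(host, stream=s), where dev is a DeviceArray and host is a contiguous host buffer owned by the caller. Synchronizing inside the block is a deliberate choice here: results should not be read while work is still pending on the stream.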

class DeviceView(ptr, shape, dtype)

Bases: object

A read-only view of a GPU memory region.

Parameters:
  • ptr (int) – A pointer to the region of memory.

  • shape (Tuple[int]) – The shape of the region.

  • dtype (DataType) – The data type of the region.

ptr

The memory address of the underlying GPU memory

Type:

int

shape

The shape of the device buffer

Type:

Tuple[int]

property dtype

The data type of the device buffer

Type:

DataType

property nbytes

The number of bytes in the memory region.

copy_to(host_buffer, stream=None)

Copies from this device buffer to the provided host buffer.

Parameters:
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns:

The host buffer

Return type:

np.ndarray
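The two requirements on host_buffer can be checked before the copy is attempted. The sketch below assumes a NumPy host buffer (it relies on the flags and nbytes attributes of numpy.ndarray); download_into is a hypothetical helper, not a Polygraphy function.

```python
def download_into(view, host_buffer):
    """Copy a DeviceView (or DeviceArray) into a preallocated NumPy buffer."""
    # copy_to() expects contiguous memory; NumPy reports this through `flags`.
    if not host_buffer.flags["C_CONTIGUOUS"]:
        raise ValueError("host_buffer must be contiguous; see np.ascontiguousarray")
    # The destination has to hold at least `view.nbytes` bytes.
    if host_buffer.nbytes < view.nbytes:
        raise ValueError(
            f"host_buffer holds {host_buffer.nbytes} bytes, need {view.nbytes}"
        )
    return view.copy_to(host_buffer)  # synchronous, since no stream is passed
```

Checking up front gives a clearer error message than a failed or truncated copy would, at the cost of two attribute lookups.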

numpy()

Create a new NumPy array containing the contents of this device buffer.

Returns:

The newly created NumPy array.

Return type:

np.ndarray

class DeviceArray(shape=None, dtype=None)

Bases: DeviceView

An array on the GPU.

Parameters:
  • shape (Tuple[int]) – The initial shape of the buffer.

  • dtype (DataType) – The data type of the buffer.

copy_to(host_buffer, stream=None)

Copies from this device buffer to the provided host buffer.

Parameters:
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns:

The host buffer

Return type:

np.ndarray

property dtype

The data type of the device buffer

Type:

DataType

property nbytes

The number of bytes in the memory region.

numpy()

Create a new NumPy array containing the contents of this device buffer.

Returns:

The newly created NumPy array.

Return type:

np.ndarray

ptr

The memory address of the underlying GPU memory

Type:

int

shape

The shape of the device buffer

Type:

Tuple[int]

static raw(shape=None)

Creates an untyped device array of the specified shape.

Parameters:

shape (Tuple[int]) – The initial shape of the buffer, in units of bytes. For example, a shape of (4, 4) would allocate a 16 byte array.

Returns:

The raw device array.

Return type:

DeviceArray
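Because a raw shape is measured in bytes, sizing a scratch buffer for typed data means multiplying by the element size first. The helpers below are a sketch with hypothetical names (byte_shape, raw_nbytes, alloc_scratch); only DeviceArray.raw comes from this module.

```python
import math


def byte_shape(num_elements, itemsize):
    """1-D shape, in bytes, for `num_elements` items of `itemsize` bytes each."""
    return (num_elements * itemsize,)


def raw_nbytes(shape):
    """Bytes covered by a raw shape: the product of its dimensions."""
    return math.prod(shape)


def alloc_scratch(num_elements, itemsize):
    """Allocate an untyped device buffer sized for the given element count."""
    # Imported lazily; the allocation itself needs a CUDA-capable machine.
    from polygraphy.cuda import DeviceArray

    # The caller owns the result and should free() it or use it in a `with` block.
    return DeviceArray.raw(byte_shape(num_elements, itemsize))
```

For instance, raw_nbytes((4, 4)) evaluates to 16, matching the 16 byte array described above, and byte_shape(16, 4) gives (64,) for sixteen 4-byte elements.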

resize(shape)

Resizes or reshapes the array to the specified shape.

If the allocated memory region is already large enough, no reallocation is performed.

Parameters:

shape (Tuple[int]) – The new shape.

Returns:

self

Return type:

DeviceArray
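One way to observe the no-reallocation behavior is to compare ptr before and after a resize. This is only a sketch: resize_reuses_allocation is a hypothetical helper, and an unchanged pointer is an indication rather than proof, since an allocator could in principle hand back the same address.

```python
def resize_reuses_allocation(arr, new_shape):
    """Resize `arr` and report whether the device pointer stayed the same."""
    ptr_before = arr.ptr
    arr.resize(new_shape)  # returns `arr` itself, so calls can also be chained
    return arr.ptr == ptr_before
```

Shrinking an array is expected to return True. Growing it beyond the allocated region will typically return False, which matters to any code that cached the old pointer.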

__exit__(exc_type, exc_value, traceback)

Frees the underlying memory of this DeviceArray.

free()

Frees the GPU memory associated with this array.

You can also use a context manager to ensure that memory is freed. For example:

with DeviceArray(...) as arr:
    ...

copy_from(host_buffer, stream=None)

Copies from the provided host buffer into this device buffer.

Parameters:
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy from. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and not larger than this device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns:

self

Return type:

DeviceArray
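A host-to-device-to-host round trip might look like the sketch below. It assumes that the installed Polygraphy version accepts a NumPy dtype where DataType is documented; roundtrip is a hypothetical helper name.

```python
def roundtrip(host_array):
    """Copy a NumPy array to the GPU and return a fresh host copy of it."""
    # Imported lazily so this module still loads where the packages are absent.
    import numpy as np
    from polygraphy.cuda import DeviceArray

    # copy_from() requires contiguous memory, so normalize the input first.
    host_array = np.ascontiguousarray(host_array)

    # Assumption: a NumPy dtype is accepted for the `dtype` argument.
    with DeviceArray(shape=host_array.shape, dtype=host_array.dtype) as dev:
        dev.copy_from(host_array)  # host -> device, synchronous (no stream)
        return dev.numpy()         # device -> newly created host array
```

The device memory is freed when the with block exits, after numpy() has already produced the host copy.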

view(shape=None, dtype=None)

Creates a read-only DeviceView from this DeviceArray.

Parameters:
  • shape (Sequence[int]) – The desired shape of the view. Defaults to the shape of this array or view.

  • dtype (DataType) – The desired data type of the view. Defaults to the data type of this array or view.

Returns:

A view of this array's data on the device.

Return type:

DeviceView
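As a sketch of the shape parameter, the hypothetical helper below requests a flattened, one-dimensional view with the same element count as the original array. The name flat_view is illustrative only.

```python
import math


def flat_view(arr):
    """Return a read-only 1-D DeviceView with as many elements as `arr`."""
    count = math.prod(arr.shape)     # total number of elements in `arr`
    return arr.view(shape=(count,))  # dtype defaults to that of `arr`
```

Calling flat_view(arr).numpy() would then yield a flat host copy without changing the shape of arr itself.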