CUDA Wrapper

Module: polygraphy.cuda

class MemcpyKind[source]

Bases: object

Enumerates different kinds of copy operations.

HostToHost = c_int(0)

Copies from host memory to host memory

HostToDevice = c_int(1)

Copies from host memory to device memory

DeviceToHost = c_int(2)

Copies from device memory to host memory

DeviceToDevice = c_int(3)

Copies from device memory to device memory

class Cuda[source]

Bases: object

NOTE: Do not construct this class manually. Instead, use the wrapper() function to get the global wrapper.

Wrapper that exposes low-level CUDA functionality.

malloc(nbytes)[source]

Allocates memory on the GPU.

Parameters

nbytes (int) – The number of bytes to allocate.

Returns

The memory address of the allocated region, i.e. a device pointer.

Return type

int

Raises

PolygraphyException – If an error was encountered during the allocation.

free(ptr)[source]

Frees memory allocated on the GPU.

Parameters

ptr (int) – The memory address, i.e. a device pointer.

Raises

PolygraphyException – If an error was encountered during the free.

memcpy(dst, src, nbytes, kind, stream_ptr=None)[source]

Copies data between host and device memory.

Parameters
  • dst (int) – The memory address of the destination, i.e. a pointer.

  • src (int) – The memory address of the source, i.e. a pointer.

  • nbytes (int) – The number of bytes to copy.

  • kind (MemcpyKind) – The kind of copy to perform.

  • stream_ptr (int) – The memory address of a CUDA stream, i.e. a pointer. If this is not provided, a synchronous copy is performed.

Raises

PolygraphyException – If an error was encountered during the copy.

wrapper()[source]

Returns the global Polygraphy CUDA wrapper.

Returns

The global CUDA wrapper.

Return type

Cuda

class Stream[source]

Bases: object

High-level wrapper for a CUDA stream.

ptr

The memory address of the underlying CUDA stream

Type

int

__exit__(exc_type, exc_value, traceback)[source]

Frees the underlying CUDA stream.

free()[source]

Frees the underlying CUDA stream.

You can also use a context manager to manage the stream lifetime. For example:

with Stream() as stream:
    ...

synchronize()[source]

Synchronizes the stream.

class DeviceView(ptr, shape, dtype)[source]

Bases: object

A read-only view of a GPU memory region.

Parameters
  • ptr (int) – A pointer to the region of memory.

  • shape (Tuple[int]) – The shape of the region.

  • dtype (DataType) – The data type of the region.

ptr

The memory address of the underlying GPU memory

Type

int

shape

The shape of the device buffer

Type

Tuple[int]

property dtype

The data type of the device buffer

Type

DataType

property nbytes

The number of bytes in the memory region.

copy_to(host_buffer, stream=None)[source]

Copies from this device buffer to the provided host buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

The host buffer

Return type

Union[numpy.ndarray, torch.Tensor]

numpy()[source]

Create a new NumPy array containing the contents of this device buffer.

Returns

The newly created NumPy array.

Return type

np.ndarray

class DeviceArray(shape=None, dtype=None)[source]

Bases: polygraphy.cuda.cuda.DeviceView

An array on the GPU.

Parameters
  • shape (Tuple[int]) – The initial shape of the buffer.

  • dtype (DataType) – The data type of the buffer.

copy_to(host_buffer, stream=None)

Copies from this device buffer to the provided host buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy into. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and large enough to accommodate the device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

The host buffer

Return type

Union[numpy.ndarray, torch.Tensor]

property nbytes

The number of bytes in the memory region.

numpy()

Create a new NumPy array containing the contents of this device buffer.

Returns

The newly created NumPy array.

Return type

np.ndarray

ptr

The memory address of the underlying GPU memory

Type

int

shape

The shape of the device buffer

Type

Tuple[int]

static raw(shape=None)[source]

Creates an untyped device array of the specified shape.

Parameters

shape (Tuple[int]) – The initial shape of the buffer, in units of bytes. For example, a shape of (4, 4) would allocate a 16-byte array.

Returns

The raw device array.

Return type

DeviceArray

resize(shape)[source]

Resizes or reshapes the array to the specified shape.

If the allocated memory region is already large enough, no reallocation is performed.

Parameters

shape (Tuple[int]) – The new shape.

Returns

self

Return type

DeviceArray

__exit__(exc_type, exc_value, traceback)[source]

Frees the underlying memory of this DeviceArray.

free()[source]

Frees the GPU memory associated with this array.

You can also use a context manager to ensure that memory is freed. For example:

with DeviceArray(...) as arr:
    ...

copy_from(host_buffer, stream=None)[source]

Copies from the provided host buffer into this device buffer.

Parameters
  • host_buffer (Union[numpy.ndarray, torch.Tensor]) – The host buffer to copy from. The buffer must be contiguous in memory (see np.ascontiguousarray or torch.Tensor.contiguous) and not larger than this device buffer.

  • stream (Stream) – A Stream instance. Performs a synchronous copy if no stream is provided.

Returns

self

Return type

DeviceArray

view(shape=None, dtype=None)[source]

Creates a read-only DeviceView from this DeviceArray.

Parameters
  • shape (Sequence[int]) – The desired shape of the view. Defaults to the shape of this array or view.

  • dtype (DataType) – The desired data type of the view. Defaults to the data type of this array or view.

Returns

A view of this array's data on the device.

Return type

DeviceView