Python API#

This is the Python API reference for the NVIDIA® nvCOMP library.

Data Type Association#

nvCOMP type            Python array-protocol type string   Type description
NVCOMP_TYPE_BITS       |b1                                 Bit
NVCOMP_TYPE_CHAR       |i1                                 8-bit signed character
NVCOMP_TYPE_UCHAR      |u1                                 8-bit unsigned character
NVCOMP_TYPE_UINT8      |u1                                 Byte
NVCOMP_TYPE_SHORT      <i2                                 Little-endian 2-byte signed integer
NVCOMP_TYPE_USHORT     <u2                                 Little-endian 2-byte unsigned integer
NVCOMP_TYPE_INT        <i4                                 Little-endian 4-byte signed integer
NVCOMP_TYPE_UINT       <u4                                 Little-endian 4-byte unsigned integer
NVCOMP_TYPE_LONGLONG   <i8                                 Little-endian 8-byte signed integer
NVCOMP_TYPE_ULONGLONG  <u8                                 Little-endian 8-byte unsigned integer
NVCOMP_TYPE_FLOAT16    <f2                                 Little-endian 2-byte float
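These type strings follow the NumPy array-protocol convention: a byte-order character (`<` for little-endian, `|` when byte order is not applicable) followed by a kind code and an item size in bytes. As a quick sanity check using only the standard library, the struct module produces the matching little-endian byte layouts:

```python
import struct

# '<u2' is a little-endian 2-byte unsigned integer: the low byte comes first.
assert struct.pack("<H", 0x0102) == b"\x02\x01"

# '<i4' is a little-endian 4-byte signed integer (two's complement).
assert struct.pack("<i", -1) == b"\xff\xff\xff\xff"

# '<f2' (NVCOMP_TYPE_FLOAT16) is a little-endian IEEE 754 half-precision float.
assert struct.pack("<e", 1.0) == b"\x00\x3c"
```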

BitstreamKind#

class nvidia.nvcomp.BitstreamKind#

Defines how a buffer will be compressed in nvCOMP.

Members:

NVCOMP_NATIVE : Each input buffer is chunked according to the manager's settings and compressed in parallel. Allows computation of checksums. Adds a custom header with nvCOMP metadata at the beginning of the compressed data.

RAW : Compresses the input data as is, using only the underlying compression algorithm. Does not add a header with nvCOMP metadata.

WITH_UNCOMPRESSED_SIZE : Similar to RAW, but adds a custom header with just the uncompressed size at the beginning of the compressed data.

CudaStream#

class nvidia.nvcomp.CudaStream#

Wrapper around a CUDA stream. Provides either shared-ownership or view semantics, depending on whether it was constructed through borrow or make_new, respectively.

CudaStream is the type of stream parameters passed to allocation functions that can be used with set_*_allocator. If the deallocation of such memory needs to access the stream passed to the allocation function, the allocation function should return an ExternalMemory instance wrapping the newly constructed memory object and the CudaStream argument. The memory object should, from then on, only be accessed through the ExternalMemory wrapper. This ensures that the stream is still alive when the memory is deallocated.

It is not envisioned that CudaStream will be used outside allocation functions. Nevertheless, borrow and make_new are provided for completeness.

static borrow(
cuda_stream: int,
device_idx: int = -1,
) → nvidia.nvcomp.nvcomp_impl.CudaStream#

Create a stream view.

The device index is primarily intended for special CUDA streams (i.e., the default, legacy, and per-thread streams) whose device cannot be inferred from the stream value itself. By default, it is equal to -1, a special value whose meaning depends on whether stream is special or not. If stream is special, the default value associates the borrowed stream with the current device. Otherwise, the CudaStream will always be associated with the stream’s actual device. In this case, passing a device_idx that is neither the default value nor the stream’s actual device will raise an exception.

Parameters:
  • cuda_stream – The cudaStream_t to wrap, represented as a Python integer.

  • device_idx – Optional index of the device with which to associate the borrowed stream. See function description for details. Equal to -1 by default.

property device#

The device index associated with the stream.

property is_special#

Whether the underlying stream is one of the special streams (default, legacy, or per-thread).

Note that passing a special stream to any CUDA API call will actually pass the current device’s corresponding special stream. It must therefore be ensured that the stream’s associated device, as given by device, is selected before using the stream. This is currently entirely the user’s responsibility.

static make_new(
device_idx: int = -1,
) → nvidia.nvcomp.nvcomp_impl.CudaStream#

Create a new stream with shared ownership.

Parameters:

device_idx – Optional index of the device with which to associate the newly created stream. By default equal to -1, a special value that represents the current device.

property ptr#

The underlying cudaStream_t represented as a Python integer.

The property name follows the convention of cupy.Stream and reflects the fact that a cudaStream_t is internally a pointer.

Codec#

class nvidia.nvcomp.Codec#
__init__(
self: nvidia.nvcomp.nvcomp_impl.Codec,
**kwargs,
) → None#

Initialize codec.

Parameters:
  • algorithm – An optional name of the compression algorithm to use. By default it is empty, and the algorithm can be deduced during decoding.

  • device_id – An optional device ID on which to execute decoding/encoding. If not specified, the default device will be used.

  • cuda_stream – An optional cudaStream_t represented as a Python integer. By default, an internal CUDA stream is created for the given device ID.

  • uncomp_chunk_size – An optional uncompressed data chunk size. By default it is 65536.

  • checksum_policy

    Defines the strategy for computing and verifying checksums. By default, NO_COMPUTE_NO_VERIFY is assumed.

    LZ4 algorithm specific options:

    data_type: An optional array-protocol type string for default data type to use.

    GDeflate algorithm specific options:
    algorithm_type: Compression algorithm type to use. Permitted values are:
    • 0 : highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)

    • 1 : high-throughput, low compression ratio (default)

    • 2 : medium-throughput, medium compression ratio; beats Zlib level 1 on compression ratio

    • 3 : placeholder for future compression-level support; currently falls back to MEDIUM_COMPRESSION

    • 4 : lower-throughput, higher compression ratio; beats Zlib level 6 on compression ratio

    • 5 : lowest-throughput, highest compression ratio

    Deflate algorithm specific options:
    algorithm_type: Compression algorithm type to use. Permitted values are:
    • 0 : highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)

    • 1 : high-throughput, low compression ratio (default)

    • 2 : medium-throughput, medium compression ratio; beats Zlib level 1 on compression ratio

    • 3 : placeholder for future compression-level support; currently falls back to MEDIUM_COMPRESSION

    • 4 : lower-throughput, higher compression ratio; beats Zlib level 6 on compression ratio

    • 5 : lowest-throughput, highest compression ratio

    Bitcomp algorithm specific options:
    algorithm_type: The type of Bitcomp algorithm used.
    • 0 : Default algorithm, usually gives the best compression ratios

    • 1 : “Sparse” algorithm, which works well on sparse data (with lots of zeroes) and is usually faster than the default algorithm.

    data_type: An optional array-protocol type string for default data type to use.

    ANS algorithm specific options:
    data_type: An optional array-protocol type string for default data type to use. Permitted values are:
    • |u1 : For unsigned 8-bit integer

    • <f2 : For 16-bit little-endian float. Requires uncomp_chunk_size to be a multiple of 2

    Cascaded algorithm specific options:

    data_type: An optional array-protocol type string for default data type to use.

    num_rles: The number of Run Length Encodings to perform. By default equal to 2

    num_deltas: The number of Delta Encodings to perform. By default equal to 1

    use_bitpack: Whether or not to bitpack the final layers. By default it is True.
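The Cascaded options above compose simple transforms: num_rles run-length passes, num_deltas delta passes, then optional bitpacking. For intuition, here is a minimal pure-Python sketch of one run-length pass followed by one delta pass (illustrative only; nvCOMP's actual on-GPU layout and bitpacking differ):

```python
# Illustrative CPU sketch of the transforms the Cascaded scheme composes.
# nvCOMP's actual on-GPU format, ordering, and bitpacking stage differ.

def rle_encode(values):
    """One run-length pass: collapse runs into (values, counts)."""
    runs, counts = [], []
    for v in values:
        if runs and runs[-1] == v:
            counts[-1] += 1
        else:
            runs.append(v)
            counts.append(1)
    return runs, counts

def delta_encode(values):
    """One delta pass: keep the first value, then store differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

data = [10, 10, 10, 10, 12, 12, 15]
runs, counts = rle_encode(data)   # RLE shrinks repeated values
deltas = delta_encode(runs)       # delta shrinks slowly-varying values
print(runs, counts, deltas)       # -> [10, 12, 15] [4, 2, 1] [10, 2, 3]
```

Small values like these counts and deltas are exactly what the final bitpacking layer (use_bitpack) exploits, since they fit in far fewer bits than the original integers.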

decode(*args, **kwargs)#

Overloaded function.

  1. decode(self: nvidia.nvcomp.nvcomp_impl.Codec, src: nvidia.nvcomp.nvcomp_impl.Array, data_type: str = '') -> object

    Executes decoding of data from an Array handle.

    Args:

    src: Decode source object.

    data_type: An optional array-protocol type string for output data type. By default it is equal to |u1.

    Returns:

    nvcomp.Array

  2. decode(self: nvidia.nvcomp.nvcomp_impl.Codec, srcs: list[nvidia.nvcomp.nvcomp_impl.Array], data_type: str = '') -> list[object]

    Executes decoding from a batch of Array handles.

    Args:

    srcs: List of Array objects

    data_type: An optional array-protocol type string for output data type.

    Returns:

    List of decoded nvcomp.Array objects

encode(*args, **kwargs)#

Overloaded function.

  1. encode(self: nvidia.nvcomp.nvcomp_impl.Codec, array_s: nvidia.nvcomp.nvcomp_impl.Array) -> object

    Encode array.

    Args:

    array_s: Array to encode

    Returns:

    Encoded nvcomp.Array

  2. encode(self: nvidia.nvcomp.nvcomp_impl.Codec, srcs: list[nvidia.nvcomp.nvcomp_impl.Array]) -> list[object]

    Executes encoding from a batch of Array handles.

    Args:

    srcs: List of Array objects

    Returns:

    List of encoded nvcomp.Array objects
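Taken together, encode and decode form a round trip. The sketch below assumes the nvidia-nvcomp wheel, NumPy, and a CUDA-capable GPU are available; the imports are guarded so the snippet is inert without them. The "LZ4" algorithm name and the use of np.asarray on the host-side result are assumptions about the installed build.

```python
# Hedged sketch of an encode/decode round trip with the nvCOMP Python API.
# Requires the nvidia-nvcomp package, NumPy, and a CUDA-capable GPU;
# the guarded import keeps the snippet inert when they are absent.
try:
    import numpy as np
    from nvidia import nvcomp
except ImportError:
    nvcomp = None

def lz4_roundtrip(data: bytes) -> bytes:
    codec = nvcomp.Codec(algorithm="LZ4")
    # Wrap the host bytes as an Array, then copy to device memory.
    src = nvcomp.as_array(np.frombuffer(data, dtype=np.uint8)).cuda()
    encoded = codec.encode(src)              # compressed nvcomp.Array
    decoded = codec.decode(encoded, "|u1")   # decompressed nvcomp.Array
    return bytes(np.asarray(decoded.cpu()))  # copy back to host memory

if nvcomp is not None:
    payload = b"nvCOMP" * 512
    assert lz4_roundtrip(payload) == payload
```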

ArrayBufferKind#

class nvidia.nvcomp.ArrayBufferKind#

Defines buffer kind in which array data is stored.

Members:

STRIDED_DEVICE : GPU-accessible in pitch-linear layout.

STRIDED_HOST : Host-accessible in pitch-linear layout.

Array#

class nvidia.nvcomp.Array#

Class that wraps an array. It can hold decoded data or data to encode.

property __cuda_array_interface__#

The CUDA array interchange interface compatible with Numba v0.39.0 or later (see CUDA Array Interface for details)

__dlpack__(
self: nvidia.nvcomp.nvcomp_impl.Array,
stream: object = None,
) → capsule#

Export the array as a DLPack tensor

__dlpack_device__(
self: nvidia.nvcomp.nvcomp_impl.Array,
) → tuple#

Get the device associated with the buffer

property buffer_kind#

Buffer kind in which array data is stored.

property buffer_size#

The total number of bytes to store the array.

cpu(self: nvidia.nvcomp.nvcomp_impl.Array) → object#

Returns a copy of this array in CPU memory. If this array is already in CPU memory, then no copy is performed and the original object is returned.

Returns:

Array object with content in CPU memory or None if copy could not be done.

cuda(
self: nvidia.nvcomp.nvcomp_impl.Array,
synchronize: bool = True,
cuda_stream: int = 0,
) → object#

Returns a copy of this array in device memory. If this array is already in device memory, then no copy is performed and the original object is returned.

Parameters:
  • synchronize – If True (the default), the call blocks until the copy from host to device has finished; otherwise no synchronization is performed, and further synchronization must be done using the CUDA stream provided by e.g. __cuda_array_interface__.

  • cuda_stream – An optional cudaStream_t represented as a Python integer on which to perform the host-to-device copy.

Returns:

Array object with content in device memory or None if copy could not be done.

property dtype#
property item_size#

Size of each element in bytes.

property ndim#
property precision#

Maximum number of significant bits in the data type. A value of 0 means that the precision is equal to the data type's bit depth.

property shape#
property size#

Number of elements this array holds.

property strides#

Strides of axes in bytes

to_dlpack(
self: nvidia.nvcomp.nvcomp_impl.Array,
cuda_stream: object = None,
) → capsule#

Export the array with zero-copy conversion to a DLPack tensor.

Parameters:

cuda_stream – An optional cudaStream_t, represented as a Python integer, on which synchronization must take place for the created Array.

Returns:

DLPack tensor which is encapsulated in a PyCapsule object.

as_array#

nvidia.nvcomp.as_array(
source: object,
cuda_stream: int = 0,
) → nvidia.nvcomp.nvcomp_impl.Array#

Wraps an external buffer as an array and ties the buffer's lifetime to the array.

Parameters:
  • source – Input DLPack tensor which is encapsulated in a PyCapsule object or other object with __cuda_array_interface__, __array_interface__ or __dlpack__ and __dlpack_device__ methods.

  • cuda_stream – An optional cudaStream_t, represented as a Python integer, on which synchronization must take place for the created Array.

Returns:

nvcomp.Array

as_arrays#

nvidia.nvcomp.as_arrays(sources: list[object], cuda_stream: int = 0) → list[object]#

Wraps external buffers as arrays and ties each buffer's lifetime to its array.

Parameters:
  • sources – List of input DLPack tensors encapsulated in PyCapsule objects, or other objects with __cuda_array_interface__, __array_interface__, or __dlpack__ and __dlpack_device__ methods.

  • cuda_stream – An optional cudaStream_t, represented as a Python integer, on which synchronization must take place for the created Arrays.

Returns:

List of nvcomp.Array objects

from_dlpack#

nvidia.nvcomp.from_dlpack(
source: object,
cuda_stream: int = 0,
) → nvidia.nvcomp.nvcomp_impl.Array#

Zero-copy conversion from a DLPack tensor to an array.

Parameters:
  • source – Input DLPack tensor which is encapsulated in a PyCapsule object or other (array) object with __dlpack__ and __dlpack_device__ methods.

  • cuda_stream – An optional cudaStream_t, represented as a Python integer, on which synchronization must take place for the created Array.

Returns:

nvcomp.Array

set_device_allocator#

nvidia.nvcomp.set_device_allocator(allocator: object = None) → None#

Sets a new allocator to be used for future device allocations.

The signature of the allocator should be like in the following example:

def my_allocator(nbytes: int, stream: nvcomp.Stream) -> PtrProtocol:
    return MyBuffer(nbytes, stream)

PtrProtocol denotes any object that has a ptr attribute of integral type; this should be the pointer to the allocated buffer (represented as an integer).

In the signature, nbytes is the number of bytes in the requested buffer. stream is the CUDA stream on which to perform the allocation and/or deallocation if the allocator is stream-ordered. Non-stream-ordered allocators may ignore stream or may synchronize with it before deallocation, depending on the desired behavior. Separate allocation and deallocation streams are currently not supported.

The returned object should be such that, when it is deleted, either on account of there being no more valid Python references to it or because it was garbage collected, the memory gets deallocated. In a custom Python class, this may be achieved through the __del__ method. This is considered an advanced usage pattern, so the recommended approach is to compose pre-existing solutions from other libraries, such as cupy’s Memory classes and rmm’s DeviceBuffer.

It is generally allowed to set a new allocator while one or more buffers allocated by the previous allocator are still active. Individual allocator implementations may, however, choose to prohibit this.

If the deallocation requires accessing stream, the allocator should return an ExternalMemory instance wrapping the newly constructed memory object and the CudaStream argument. The memory object should, from then on, only be accessed through the ExternalMemory wrapper. This ensures that the stream is still alive when the memory is deallocated.

A simple but versatile example of a custom allocator is given by rmm_nvcomp_allocator.

The allocated memory must be device-accessible.
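As a sketch of the expected allocator shape, the following pure-Python example implements the PtrProtocol contract with a ctypes-backed buffer whose memory is released when the wrapper object is garbage collected (the role the __del__ paragraph above describes). This allocates plain non-pinned host memory, so in practice it would only be appropriate for set_host_allocator; device and pinned allocators would typically wrap cupy or rmm buffers instead. The names CtypesBuffer and my_host_allocator are hypothetical.

```python
import ctypes

class CtypesBuffer:
    """Minimal PtrProtocol object: exposes an integer `ptr` attribute.

    Illustration only: this is plain (non-pinned) host memory, suitable
    at most for set_host_allocator. The backing storage is released when
    this object is garbage collected, as required of allocator results.
    """

    def __init__(self, nbytes: int):
        self._buf = (ctypes.c_char * max(nbytes, 1))()
        self.ptr = ctypes.addressof(self._buf)  # pointer as a Python integer

def my_host_allocator(nbytes: int, stream) -> CtypesBuffer:
    # A non-stream-ordered allocator may simply ignore `stream`.
    return CtypesBuffer(nbytes)

# Registering it would look like (requires nvidia.nvcomp):
# nvcomp.set_host_allocator(my_host_allocator)
buf = my_host_allocator(64, None)
assert isinstance(buf.ptr, int)
```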

Parameters:

allocator – Callable satisfying the conditions above.

set_pinned_allocator#

nvidia.nvcomp.set_pinned_allocator(allocator: object = None) → None#

Sets a new allocator to be used for future pinned host allocations.

Note that this should allocate pinned host memory. For non-pinned host memory, use set_host_allocator. It is not an error to allocate non-pinned host memory with this allocator, but doing so may lead to performance degradation.

The allocator must allocate host accessible memory. Other than that, the conditions on allocator are the same as in set_device_allocator, including stream semantics.

Parameters:

allocator – Callable satisfying the conditions above.

set_host_allocator#

nvidia.nvcomp.set_host_allocator(allocator: object = None) → None#

Sets a new allocator to be used for future non-pinned host allocations.

This is primarily intended for potentially large allocations, such as those backing CPU Array instances. Moderately-sized internal host allocations may still use System Allocated Memory.

This should allocate non-pinned host memory. For pinned host memory, use set_pinned_allocator. It is not an error to allocate pinned host memory with this allocator, but doing so may lead to performance degradation.

The allocator must allocate host accessible memory. Other than that, the conditions on allocator are the same as in set_device_allocator, including stream semantics.

Parameters:

allocator – Callable satisfying the conditions above.