Python API#
This is the Python API reference for the NVIDIA® nvCOMP library.
Data Type Association#
| nvCOMP type | Python array-protocol type string | Type description |
|---|---|---|
| NVCOMP_TYPE_BITS | | Bit |
| NVCOMP_TYPE_CHAR | `\|i1` | 8-bit signed character |
| NVCOMP_TYPE_UCHAR | `\|u1` | 8-bit unsigned character |
| NVCOMP_TYPE_UINT8 | `\|u1` | Byte |
| NVCOMP_TYPE_SHORT | `<i2` | Little-endian 2-byte signed integer |
| NVCOMP_TYPE_USHORT | `<u2` | Little-endian 2-byte unsigned integer |
| NVCOMP_TYPE_INT | `<i4` | Little-endian 4-byte signed integer |
| NVCOMP_TYPE_UINT | `<u4` | Little-endian 4-byte unsigned integer |
| NVCOMP_TYPE_LONGLONG | `<i8` | Little-endian 8-byte signed integer |
| NVCOMP_TYPE_ULONGLONG | `<u8` | Little-endian 8-byte unsigned integer |
| NVCOMP_TYPE_FLOAT16 | `<f2` | Little-endian 2-byte float |
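The array-protocol type strings follow the standard NumPy typestring convention, which can be sanity-checked with nothing but the standard library. The `struct` format codes below are not part of nvCOMP; they are just the stdlib equivalents of the typestrings in the table.

```python
# Array-protocol typestrings: '|' = byte order irrelevant, '<' = little-endian,
# letter = kind (i: signed, u: unsigned, f: float), digit = item size in bytes.
import struct

# '<i2' -> little-endian 2-byte signed integer ('<h' in struct notation)
assert struct.calcsize('<h') == 2
# '<u4' -> little-endian 4-byte unsigned integer ('<I' in struct notation)
assert struct.calcsize('<I') == 4
# '<f2' -> little-endian 2-byte float ('<e' in struct notation)
assert struct.calcsize('<e') == 2
# Round trip of a little-endian 8-byte signed integer ('<i8' / '<q'):
assert struct.unpack('<q', struct.pack('<q', -42))[0] == -42
```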
BitstreamKind#
- class nvidia.nvcomp.BitstreamKind#
Defines how a buffer will be compressed in nvCOMP.
Members:
NVCOMP_NATIVE : Each input buffer is chunked according to the manager settings and compressed in parallel. Allows computation of checksums. Adds a custom header with nvCOMP metadata at the beginning of the compressed data.
RAW : Compresses the input data as is, using only the underlying compression algorithm. Does not add a header with nvCOMP metadata.
WITH_UNCOMPRESSED_SIZE : Similar to RAW, but adds a custom header containing just the uncompressed size at the beginning of the compressed data.
CudaStream#
- class nvidia.nvcomp.CudaStream#
Wrapper around a CUDA stream. Provides either shared-ownership or view semantics, depending on whether it was constructed through `borrow` or `make_new`, respectively.
`CudaStream` is the type of the stream parameter passed to allocation functions that can be used with `set_*_allocator`. If the deallocation of such memory needs to access the stream passed to the allocation function, the allocation function should return an `ExternalMemory` instance wrapping the newly constructed memory object and the `CudaStream` argument. The memory object should, from then on, only be accessed through the `ExternalMemory` wrapper. This ensures that the stream is still alive when the memory is deallocated.
It is not envisioned that `CudaStream` will be used outside allocation functions. Nevertheless, `borrow` and `make_new` are provided for completeness.
- static borrow(cuda_stream: int, device_idx: int = -1)#
Create a stream view.
The device index is primarily intended for special CUDA streams (i.e., the default, legacy, and per-thread streams), whose device cannot be inferred from the stream value itself. By default, it is equal to -1, a special value whose meaning depends on whether the stream is special or not. If the stream is special, the default value associates the shared stream with the current device. Otherwise, the `CudaStream` will always be associated with the stream's actual device. In this case, passing a `device_idx` that is neither the default value nor the stream's actual device will raise an exception.
- Parameters:
cuda_stream – The `cudaStream_t` to wrap, represented as a Python integer.
device_idx – Optional index of the device with which to associate the borrowed stream. See the function description for details. Equal to -1 by default.
- property device#
The device index associated with the stream.
- property is_special#
Whether the underlying stream is one of the special streams (default, legacy, or per-thread).
Note that passing a special stream to any CUDA API call will actually pass the current device's corresponding special stream. It must therefore be ensured that the stream's associated device, as given by `device`, is selected before using the stream. This is currently entirely the user's responsibility.
- static make_new(device_idx: int = -1)#
Create a new stream with shared ownership.
- Parameters:
device_idx – Optional index of the device with which to associate the newly created stream. By default equal to -1, a special value that represents the current device.
- property ptr#
The underlying `cudaStream_t`, represented as a Python integer.
The property name follows the convention of `cupy.Stream` and reflects the fact that a `cudaStream_t` is internally a pointer.
Codec#
- class nvidia.nvcomp.Codec#
- __init__(self: nvidia.nvcomp.nvcomp_impl.Codec, **kwargs)#
Initialize codec.
- Parameters:
algorithm – An optional name of the compression algorithm to use. By default it is empty, and the algorithm can be deduced during decoding.
device_id – An optional device id on which to execute decoding/encoding. If not specified, the default device will be used.
cuda_stream – An optional `cudaStream_t` represented as a Python integer. By default, an internal CUDA stream is created for the given device id.
uncomp_chunk_size – An optional uncompressed-data chunk size. By default it is 65536.
checksum_policy – Defines the strategy for computing and verifying checksums. By default `NO_COMPUTE_NO_VERIFY` is assumed.
is assumed.- LZ4 algorithm specific options:
data_type: An optional array-protocol type string for default data type to use.
- GDeflate algorithm specific options:
- algorithm_type: Compression algorithm type to use. Permitted values are:
0 : highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)
1 : high-throughput, low compression ratio (default)
2 : medium-throughput, medium compression ratio, beat Zlib level 1 on the compression ratio
3 : placeholder for further compression level support, will fall into
MEDIUM_COMPRESSION
at this point4 : lower-throughput, higher compression ratio, beat Zlib level 6 on the compression ratio
5 : lowest-throughput, highest compression ratio
- Deflate algorithm specific options:
- algorithm_type: Compression algorithm type to use. Permitted values are:
0 : highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)
1 : high-throughput, low compression ratio (default)
2 : medium-throughput, medium compression ratio, beats Zlib level 1 on compression ratio
3 : placeholder for further compression level support, currently falls back to MEDIUM_COMPRESSION
4 : lower-throughput, higher compression ratio, beats Zlib level 6 on compression ratio
5 : lowest-throughput, highest compression ratio
- Bitcomp algorithm specific options:
- algorithm_type: The type of Bitcomp algorithm used.
0 : Default algorithm, usually gives the best compression ratios
1 : "Sparse" algorithm, works well on sparse data (with lots of zeroes) and is usually faster than the default algorithm
data_type: An optional array-protocol type string for the default data type to use.
- ANS algorithm specific options:
- data_type: An optional array-protocol type string for the default data type to use. Permitted values are:
`|u1` : For unsigned 8-bit integers
`<f2` : For 16-bit little-endian floats. Requires uncomp_chunk_size to be a multiple of 2.
- Cascaded algorithm specific options:
data_type: An optional array-protocol type string for the default data type to use.
num_rles: The number of Run-Length Encodings to perform. By default equal to 2.
num_deltas: The number of Delta Encodings to perform. By default equal to 1.
use_bitpack: Whether or not to bitpack the final layers. By default it is True.
- decode(*args, **kwargs)#
Overloaded function.
decode(self: nvidia.nvcomp.nvcomp_impl.Codec, src: nvidia.nvcomp.nvcomp_impl.Array, data_type: str = '') -> object
Executes decoding of data from an Array handle.
- Args:
src: Decode source object.
data_type: An optional array-protocol type string for the output data type. By default it is equal to `|u1`.
- Returns:
nvcomp.Array
decode(self: nvidia.nvcomp.nvcomp_impl.Codec, srcs: list[nvidia.nvcomp.nvcomp_impl.Array], data_type: str = '') -> list[object]
Executes decoding from a batch of Array handles.
- Args:
srcs: List of Array objects.
data_type: An optional array-protocol type string for the output data type.
- Returns:
List of decoded nvcomp.Array objects
- encode(*args, **kwargs)#
Overloaded function.
encode(self: nvidia.nvcomp.nvcomp_impl.Codec, array_s: nvidia.nvcomp.nvcomp_impl.Array) -> object
Encode array.
- Args:
array: Array to encode
- Returns:
Encoded nvcomp.Array
encode(self: nvidia.nvcomp.nvcomp_impl.Codec, srcs: list[nvidia.nvcomp.nvcomp_impl.Array]) -> list[object]
Executes encoding from a batch of Array handles.
- Args:
srcs: List of Array objects
- Returns:
List of encoded nvcomp.Array objects
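A minimal encode/decode round-trip sketch built from the Codec methods above. The algorithm name "LZ4", the use of NumPy for the host buffer, and the presence of the `nvidia-nvcomp` package plus a CUDA-capable GPU are assumptions; when the package is unavailable, the helper degrades to returning its input unchanged.

```python
# Hedged sketch: compress and decompress a byte string with nvCOMP.
try:
    import numpy as np
    from nvidia import nvcomp  # assumed package layout; requires a CUDA GPU
except ImportError:
    nvcomp = None

def roundtrip(data: bytes) -> bytes:
    """Compress then decompress `data`, returning the recovered bytes."""
    if nvcomp is None:
        return data  # graceful fallback when nvCOMP is not installed
    codec = nvcomp.Codec(algorithm="LZ4")
    # Wrap the host bytes as an Array and copy to device memory for encoding.
    device_arr = nvcomp.as_array(np.frombuffer(data, dtype=np.uint8)).cuda()
    compressed = codec.encode(device_arr)
    decompressed = codec.decode(compressed)       # data_type defaults to |u1
    return bytes(np.asarray(decompressed.cpu()))  # copy back to the host

assert roundtrip(b"abc" * 1000) == b"abc" * 1000
```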
ArrayBufferKind#
- class nvidia.nvcomp.ArrayBufferKind#
Defines the kind of buffer in which array data is stored.
Members:
STRIDED_DEVICE : GPU-accessible in pitch-linear layout.
STRIDED_HOST : Host-accessible in pitch-linear layout.
Array#
- class nvidia.nvcomp.Array#
Class which wraps an array. It can hold decoded data or data to encode.
- property __cuda_array_interface__#
The CUDA array interchange interface compatible with Numba v0.39.0 or later (see CUDA Array Interface for details)
- __dlpack__(self: nvidia.nvcomp.nvcomp_impl.Array, stream: object = None)#
Export the array as a DLPack tensor.
- __dlpack_device__() → tuple#
Get the device associated with the buffer.
- property buffer_kind#
Buffer kind in which array data is stored.
- property buffer_size#
The total number of bytes to store the array.
- cpu(self: nvidia.nvcomp.nvcomp_impl.Array) → object#
Returns a copy of this array in CPU memory. If this array is already in CPU memory, then no copy is performed and the original object is returned.
- Returns:
Array object with content in CPU memory, or None if the copy could not be done.
- cuda(self: nvidia.nvcomp.nvcomp_impl.Array, synchronize: bool = True, cuda_stream: int = 0) → object#
Returns a copy of this array in device memory. If this array is already in device memory, then no copy is performed and the original object is returned.
- Parameters:
synchronize – If True (the default), blocks and waits for the copy from host to device to finish; otherwise no synchronization is performed, and further synchronization needs to be done using the CUDA stream provided by e.g. __cuda_array_interface__.
cuda_stream – An optional `cudaStream_t` represented as a Python integer on which to copy the host buffer.
- Returns:
Array object with content in device memory, or None if the copy could not be done.
- property dtype#
- property item_size#
Size of each element in bytes.
- property ndim#
- property precision#
Maximum number of significant bits in the data type. A value of 0 means that the precision is equal to the data type's bit depth.
- property shape#
- property size#
Number of elements this array holds.
- property strides#
Strides of axes in bytes
- to_dlpack(self: nvidia.nvcomp.nvcomp_impl.Array, cuda_stream: object = None)#
Export the array with zero-copy conversion to a DLPack tensor.
- Parameters:
cuda_stream – An optional `cudaStream_t` represented as a Python integer, upon which synchronization must take place in the created Array.
- Returns:
DLPack tensor, encapsulated in a PyCapsule object.
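The Array properties above (shape, strides, item_size, size, buffer_size, ndim) follow the standard array-interface conventions. NumPy, used here purely as a point of comparison and not required by nvCOMP itself, defines the same quantities the same way:

```python
# Illustrating the array-interface properties with a NumPy array.
import numpy as np

a = np.zeros((4, 3), dtype='<i2')   # 4x3 little-endian 2-byte integers
assert a.itemsize == 2              # "item_size": bytes per element
assert a.size == 12                 # "size": number of elements
assert a.strides == (6, 2)          # "strides": per-axis step in bytes
assert a.nbytes == 24               # "buffer_size": total bytes of storage
assert a.ndim == 2 and a.shape == (4, 3)
```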
as_array#
- nvidia.nvcomp.as_array(source: object, cuda_stream: int = 0)#
Wraps an external buffer as an array and ties the buffer's lifetime to the array.
- Parameters:
source – Input DLPack tensor which is encapsulated in a PyCapsule object or other object with __cuda_array_interface__, __array_interface__ or __dlpack__ and __dlpack_device__ methods.
cuda_stream – An optional cudaStream_t represented as a Python integer, upon which synchronization must take place in the created Array.
- Returns:
nvcomp.Array
as_arrays#
- nvidia.nvcomp.as_arrays(sources: list[object], cuda_stream: int = 0) → list[object]#
Wraps external buffers as arrays and ties the buffers' lifetimes to the arrays.
- Parameters:
sources – List of input DLPack tensors encapsulated in PyCapsule objects, or other objects with __cuda_array_interface__, __array_interface__, or __dlpack__ and __dlpack_device__ methods.
cuda_stream – An optional cudaStream_t represented as a Python integer, upon which synchronization must take place in the created Arrays.
- Returns:
List of nvcomp.Array objects
from_dlpack#
- nvidia.nvcomp.from_dlpack(source: object, cuda_stream: int = 0)#
Zero-copy conversion from a DLPack tensor to an array.
- Parameters:
source – Input DLPack tensor which is encapsulated in a PyCapsule object or other (array) object with __dlpack__ and __dlpack_device__ methods.
cuda_stream – An optional cudaStream_t represented as a Python integer, upon which synchronization must take place in created Array.
- Returns:
nvcomp.Array
set_device_allocator#
- nvidia.nvcomp.set_device_allocator(allocator: object = None) None #
Sets a new allocator to be used for future device allocations.
The signature of the allocator should be as in the following example:

```python
def my_allocator(nbytes: int, stream: nvcomp.Stream) -> PtrProtocol:
    return MyBuffer(nbytes, stream)
```

`PtrProtocol` denotes any object that has a `ptr` attribute of integral type. This should be the pointer to the allocated buffer (represented as an integer).
In the signature, `nbytes` is the number of bytes in the requested buffer. `stream` is the CUDA stream on which to perform the allocation and/or deallocation if the allocator is stream-ordered. Non-stream-ordered allocators may ignore `stream` or may synchronize with it before deallocation, depending on the desired behavior. Separate allocation and deallocation streams are currently not supported.
The returned object should be such that, when it is deleted, either on account of there being no more valid Python references to it or because it was garbage collected, the memory gets deallocated. In a custom Python class, this may be achieved through the `__del__` method. This is considered an advanced usage pattern, so the recommended approach is to compose pre-existing solutions from other libraries, such as cupy's `Memory` classes and rmm's `DeviceBuffer`.
It is generally allowed to set a new allocator while one or more buffers allocated by the previous allocator are still active. Individual allocator implementations may, however, choose to prohibit this.
If the deallocation requires accessing `stream`, the allocator should return an `ExternalMemory` instance wrapping the newly constructed memory object and the `CudaStream` argument. The memory object should, from then on, only be accessed through the `ExternalMemory` wrapper. This ensures that the stream is still alive when the memory is deallocated.
A simple but versatile example of a custom allocator is given by `rmm_nvcomp_allocator`.
The allocated memory must be device-accessible.
- Parameters:
allocator – Callable satisfying the conditions above.
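The allocator contract can be sketched in plain Python. `HostBuffer` below is a hypothetical stand-in for a real device-memory owner (such as cupy's `Memory` or rmm's `DeviceBuffer`): it only illustrates the `PtrProtocol` shape (an integer `ptr` attribute) and deallocation-on-deletion, and does not provide the device-accessible memory a real allocator must return.

```python
import ctypes

class HostBuffer:
    """Owns a raw buffer; the memory is released when the object is deleted."""
    def __init__(self, nbytes: int, stream=None):
        self._raw = (ctypes.c_char * nbytes)()  # freed along with the object
        self.ptr = ctypes.addressof(self._raw)  # PtrProtocol: integer pointer
        self.nbytes = nbytes

def my_allocator(nbytes: int, stream) -> HostBuffer:
    # A stream-ordered allocator would allocate on `stream` here;
    # this stand-in ignores it.
    return HostBuffer(nbytes, stream)

buf = my_allocator(1 << 20, None)
assert isinstance(buf.ptr, int) and buf.nbytes == 1 << 20
# With a real device allocator, one would then register it via:
# nvcomp.set_device_allocator(my_allocator)
```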
set_pinned_allocator#
- nvidia.nvcomp.set_pinned_allocator(allocator: object = None) None #
Sets a new allocator to be used for future pinned host allocations.
Note that this should allocate pinned host memory. For non-pinned host memory, use `set_host_allocator`. It is not an error to allocate non-pinned host memory with this allocator, but it may lead to performance degradation.
The allocator must allocate host-accessible memory. Other than that, the conditions on `allocator` are the same as in `set_device_allocator`, including stream semantics.
- Parameters:
allocator – Callable satisfying the conditions above.
set_host_allocator#
- nvidia.nvcomp.set_host_allocator(allocator: object = None) None #
Sets a new allocator to be used for future non-pinned host allocations.
This is primarily intended for potentially large allocations, such as those backing CPU Array instances. Moderately-sized internal host allocations may still use System Allocated Memory.
This should allocate non-pinned host memory. For pinned host memory, use `set_pinned_allocator`. It is not an error to allocate pinned host memory with this allocator, but it may lead to performance degradation.
The allocator must allocate host-accessible memory. Other than that, the conditions on `allocator` are the same as in `set_device_allocator`, including stream semantics.
- Parameters:
allocator – Callable satisfying the conditions above.