C API

Generic

Typedefs

typedef enum nvcompType_t nvcompType_t

Enums

enum nvcompType_t

Values:

enumerator NVCOMP_TYPE_CHAR
enumerator NVCOMP_TYPE_UCHAR
enumerator NVCOMP_TYPE_SHORT
enumerator NVCOMP_TYPE_USHORT
enumerator NVCOMP_TYPE_INT
enumerator NVCOMP_TYPE_UINT
enumerator NVCOMP_TYPE_LONGLONG
enumerator NVCOMP_TYPE_ULONGLONG
enumerator NVCOMP_TYPE_UINT8
enumerator NVCOMP_TYPE_FLOAT16
enumerator NVCOMP_TYPE_BITS

Functions

nvcompStatus_t nvcompGetProperties(nvcompProperties_t *properties)

Provides nvCOMP library properties.

Parameters:

  • properties[out] The nvcompProperties_t structure to be filled with the nvCOMP library properties.

Returns:

nvcompErrorInvalidValue if properties is nullptr, nvcompSuccess otherwise.

nvcompStatus_t nvcompDecompressGetTempSize(const void *metadata_ptr, size_t *temp_bytes)

Computes the size of the temporary workspace required to perform decompression.

Deprecated:

This interface is deprecated and will be removed in a future release. Please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • temp_bytes – The size of the required temporary workspace in bytes (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressGetOutputSize(const void *metadata_ptr, size_t *output_bytes)

Computes the size of the uncompressed data in bytes.

Deprecated:

This interface is deprecated and will be removed in a future release. Please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • output_bytes – The size of the uncompressed data (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressGetType(const void *metadata_ptr, nvcompType_t *type)

Gets the type of the compressed data.

Deprecated:

This interface is deprecated and will be removed in a future release. Please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • type – The data type (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressAsync(const void *in_ptr, size_t in_bytes, void *temp_ptr, size_t temp_bytes, void *metadata_ptr, void *out_ptr, size_t out_bytes, cudaStream_t stream)

Performs the asynchronous decompression.

Deprecated:

This interface is deprecated and will be removed in a future release. Please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h and nvcomp/ans.h.

Parameters:
  • in_ptr – The compressed data on the device to decompress.

  • in_bytes – The size of the compressed data.

  • temp_ptr – The temporary workspace on the device.

  • temp_bytes – The size of the temporary workspace.

  • metadata_ptr – The metadata.

  • out_ptr – The output location on the device.

  • out_bytes – The size of the output location.

  • stream – The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

struct nvcompProperties_t
#include <nvcomp.h>

nvCOMP properties.

Public Members

uint32_t version

nvCOMP library version.

uint32_t cudart_version

Version of CUDA Runtime with which nvCOMP library was built.
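A small sketch of how cudart_version might be turned into a readable major/minor pair, for example when logging a mismatch with the runtime an application was built against. It assumes, without confirmation from this page, that the field uses the same encoding as CUDA's CUDART_VERSION macro (1000 * major + 10 * minor, so 12020 means 12.2). The helper name decode_cudart_version is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper. ASSUMPTION: cudart_version follows the CUDART_VERSION
 * encoding, 1000 * major + 10 * minor (e.g. 12020 -> 12.2). */
static void decode_cudart_version(uint32_t v, uint32_t *major, uint32_t *minor)
{
    *major = v / 1000;         /* thousands give the major version */
    *minor = (v % 1000) / 10;  /* remaining tens give the minor version */
}
```

In practice v would be properties.cudart_version after a successful nvcompGetProperties() call.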

Typedefs

typedef enum nvcompStatus_t nvcompStatus_t

Enums

enum nvcompStatus_t

Values:

enumerator nvcompSuccess
enumerator nvcompErrorInvalidValue
enumerator nvcompErrorNotSupported
enumerator nvcompErrorCannotDecompress
enumerator nvcompErrorBadChecksum
enumerator nvcompErrorCannotVerifyChecksums
enumerator nvcompErrorOutputBufferTooSmall
enumerator nvcompErrorWrongHeaderLength
enumerator nvcompErrorAlignment
enumerator nvcompErrorChunkSizeTooLarge
enumerator nvcompErrorCudaError
enumerator nvcompErrorInternal

Note

The nvcompBatched<compression_method>CompressGetTempSizeEx APIs let the user pass max_total_uncompressed_bytes. The non-Ex variants instead assume that every chunk has size max_uncompressed_chunk_bytes, which can overestimate the temporary memory requirement.
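To see how large the overestimate in the note above can be, the two quantities can be computed on the host from the chunk sizes before calling a GetTempSizeEx function. This is a sketch: the struct and function names are hypothetical, and the "assumed total" is only the num_chunks * max_uncompressed_chunk_bytes bound the note describes, not the actual temporary size nvCOMP returns.

```c
#include <stddef.h>

/* Hypothetical summary of a batch of chunk sizes. */
typedef struct {
    size_t max_chunk_bytes;     /* value for max_uncompressed_chunk_bytes */
    size_t total_bytes;         /* value for max_total_uncompressed_bytes */
    size_t assumed_total_bytes; /* num_chunks * max_chunk_bytes (non-Ex bound) */
} chunk_stats;

/* Scan host-side chunk sizes once to get the largest chunk and the exact total. */
static chunk_stats compute_chunk_stats(const size_t *chunk_bytes, size_t num_chunks)
{
    chunk_stats s = {0, 0, 0};
    for (size_t i = 0; i < num_chunks; ++i) {
        if (chunk_bytes[i] > s.max_chunk_bytes)
            s.max_chunk_bytes = chunk_bytes[i];
        s.total_bytes += chunk_bytes[i];
    }
    s.assumed_total_bytes = num_chunks * s.max_chunk_bytes;
    return s;
}
```

With two full 64 KiB chunks and a 100-byte tail, the exact total is 131172 bytes while the all-chunks-maximal bound is 196608 bytes; the gap grows when chunk sizes vary widely.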

CRC32

Functions

nvcompStatus_t nvcompBatchedCRC32Async(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t num_chunks, uint32_t *device_CRC32_ptr, cudaStream_t stream)

Perform CRC32 checksum calculation asynchronously. All pointers must point to GPU accessible locations.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • num_chunks[in] The number of chunks to compute checksums of.

  • device_CRC32_ptr[out] Array with size num_chunks on the GPU to be filled with the CRC32 checksum of each chunk.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
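When testing nvcompBatchedCRC32Async, it can help to compare the values copied back from device_CRC32_ptr against a host reference. The sketch below is a plain bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF, the zlib variant). This page does not state which CRC-32 variant nvCOMP computes, so treating this as the matching reference is an assumption to verify; crc32_reference is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, zlib/IEEE 802.3 variant. ASSUMPTION: this is the variant
 * nvcompBatchedCRC32Async computes. Slow; intended only as a host cross-check. */
static uint32_t crc32_reference(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the standard check string "123456789" this function returns 0xCBF43926.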

LZ4

Functions

nvcompStatus_t nvcompBatchedLZ4CompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedLZ4Opts_t format_opts, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The LZ4 compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.
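The batched functions take a chunk count and per-chunk sizes rather than one contiguous buffer, so callers typically split their input first. A minimal sketch, assuming a contiguous input and the recommended 65536-byte chunk size: it computes per-chunk offsets and sizes on the host, which the caller would then turn into device pointers (base + offset) and copy to device-accessible memory. Both function names are hypothetical.

```c
#include <stddef.h>

/* Number of chunks needed to cover total_bytes; chunk_bytes must be > 0 and,
 * for LZ4, no larger than 16777216. */
static size_t num_chunks_for(size_t total_bytes, size_t chunk_bytes)
{
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

/* Fill offsets[] and sizes[] (each num_chunks_for() entries long). Every chunk
 * is chunk_bytes long except possibly the last, which holds the remainder. */
static void split_into_chunks(size_t total_bytes, size_t chunk_bytes,
                              size_t *offsets, size_t *sizes)
{
    size_t n = num_chunks_for(total_bytes, chunk_bytes);
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = i * chunk_bytes;
        size_t remaining = total_bytes - offsets[i];
        sizes[i] = remaining < chunk_bytes ? remaining : chunk_bytes;
    }
}
```

For a 150000-byte input this yields three chunks of 65536, 65536 and 18928 bytes, so num_chunks is 3 and max_uncompressed_chunk_bytes is 65536.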

nvcompStatus_t nvcompBatchedLZ4CompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedLZ4Opts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for compression, with an additional max_total_uncompressed_bytes argument.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The LZ4 compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4CompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedLZ4Opts_t format_opts, size_t *max_compressed_chunk_bytes)

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedLZ4CompressAsync() for each chunk.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The LZ4 compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4CompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedLZ4Opts_t format_opts, cudaStream_t stream)

Perform batched asynchronous compression.

The individual chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size MUST be a multiple of the size of the data type specified by format_opts.data_type, else this may crash or produce invalid output.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk. This parameter is currently unused; if it is not set to the maximum chunk size, it should be set to zero. A future version that makes use of it will return an error if it is set to zero.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedLZ4CompressGetMaxOutputChunkSize.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The LZ4 compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
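Because an invalid chunk size may crash the kernel or produce invalid output rather than return a clean error, a host-side pre-check of the two rules stated above (at most 16777216 bytes, and a multiple of the size of format_opts.data_type) can be worthwhile. In this sketch elem_bytes is supplied by the caller; mapping an nvcompType_t value to its byte width (for example 4 for NVCOMP_TYPE_INT on platforms with 32-bit int) is left to the application and is an assumption, not something this page specifies. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical pre-check for nvcompBatchedLZ4CompressAsync inputs.
 * max_chunk_bytes: e.g. nvcompLZ4CompressionMaxAllowedChunkSize (1 << 24).
 * elem_bytes: byte width of format_opts.data_type (caller-supplied, > 0).
 * Returns the index of the first invalid chunk, or num_chunks if all pass. */
static size_t first_invalid_chunk(const size_t *chunk_bytes, size_t num_chunks,
                                  size_t max_chunk_bytes, size_t elem_bytes)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (chunk_bytes[i] > max_chunk_bytes || chunk_bytes[i] % elem_bytes != 0)
            return i;
    }
    return num_chunks;
}
```

A 18930-byte tail chunk passes for a 2-byte type but fails for a 4-byte type, since 18930 is not a multiple of 4.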

nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for decompression, with an additional max_total_uncompressed_bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in LZ4.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4GetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

This is needed when the expected output size is not known. NOTE: If the compressed stream is corrupt, the computed sizes are meaningless.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
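Once the stream has been synchronized and the sizes have been copied to the host, one option is to place all decompressed chunks in a single device allocation. This sketch computes an exclusive prefix sum over the sizes, rounding each start offset up to an alignment such as nvcompLZ4RequiredAlignment (4) so that every output buffer satisfies the alignment rule for void buffers. The single-allocation layout is a design choice, not something the API requires, and both function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Assign each chunk an aligned offset inside one allocation and return the
 * number of bytes to allocate. sizes[] is a host copy of
 * device_uncompressed_chunk_bytes; offsets[] receives n entries. */
static size_t layout_outputs(const size_t *sizes, size_t n, size_t align,
                             size_t *offsets)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = round_up(total, align);
        total = offsets[i] + sizes[i];
    }
    return total;
}
```

The per-chunk output pointers are then base + offsets[i], and sizes[i] can be reused as the corresponding device_uncompressed_buffer_bytes entry.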

nvcompStatus_t nvcompBatchedLZ4DecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)

Perform batched asynchronous decompression.

If a chunk of compressed data is not a valid LZ4 block, 0 is written as the size of that chunk and its status is set to nvcompErrorCannotDecompress.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated, but can be nullptr if desired, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to nvcompSuccess. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to nvcompErrorCannotDecompress. Can be nullptr if desired, in which case error status is not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
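The return value only reports whether the launch succeeded; per-chunk failures appear in device_statuses. A sketch of the host-side scan after copying that array back and synchronizing the stream. To keep it compilable without nvcomp.h, statuses are typed as int and the success value is passed in; in real code the array would be nvcompStatus_t and success_code would be nvcompSuccess. The function name is hypothetical.

```c
#include <stddef.h>

/* Count chunks whose status differs from success_code and record the index of
 * the first failure in *first_bad (num_chunks if every chunk succeeded). */
static size_t count_failed_chunks(const int *statuses, size_t num_chunks,
                                  int success_code, size_t *first_bad)
{
    size_t failed = 0;
    *first_bad = num_chunks;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (statuses[i] != success_code) {
            if (failed == 0)
                *first_bad = i;  /* remember only the first failing chunk */
            ++failed;
        }
    }
    return failed;
}
```

A failing chunk also has 0 written as its size in device_uncompressed_chunk_bytes, so either array can be used to locate it.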

Variables

static const nvcompBatchedLZ4Opts_t nvcompBatchedLZ4DefaultOpts = {NVCOMP_TYPE_CHAR}
const size_t nvcompLZ4CompressionMaxAllowedChunkSize = 1 << 24
const size_t nvcompLZ4RequiredAlignment = 4

This is the minimum alignment required for void type CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to their type’s size, e.g. 8 bytes for size_t.
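Device pointers are plain addresses, so alignment can be checked on the host before a launch. A minimal debug helper, assuming the caller passes nvcompLZ4RequiredAlignment for void buffers and the element size (for example sizeof(size_t)) for typed arrays; is_aligned is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the address held in ptr is a multiple of align bytes. Works for
 * device pointers too, since only the address value is inspected. */
static bool is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr % align) == 0;
}
```

For example, is_aligned(chunk_ptr, nvcompLZ4RequiredAlignment) could guard each entry of device_uncompressed_chunk_ptrs while its host copy is still available.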

struct nvcompLZ4FormatOpts
#include <lz4.h>

Structure for configuring LZ4 compression.

Public Members

size_t chunk_size

The size of each chunk of data to decompress independently with LZ4. Must be within the range of [32768, 16777216]. Larger sizes will result in higher compression, but with decreased parallelism. The recommended size is 65536.

struct nvcompBatchedLZ4Opts_t
#include <lz4.h>

LZ4 compression options for the low-level API.

Public Members

nvcompType_t data_type

Snappy

Functions

nvcompStatus_t nvcompBatchedSnappyCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedSnappyOpts_t format_opts, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Snappy compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedSnappyOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for compression, with an additional max_total_uncompressed_bytes argument.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Snappy compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedSnappyOpts_t format_opts, size_t *max_compressed_chunk_bytes)

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedSnappyCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Snappy compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedSnappyOpts_t format_opts, cudaStream_t stream)

Perform batched asynchronous compression.

The caller is responsible for providing compressed output buffers large enough to hold the compressed data of each chunk.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. May be NULL if no temporary memory is needed.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedSnappyCompressGetMaxOutputChunkSize.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Snappy compression options.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for decompression, with an additional max_total_uncompressed_bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Snappy.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)

Perform batched asynchronous decompression.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated, but can be nullptr if desired, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. May be NULL if no temporary space is needed.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to nvcompSuccess. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to nvcompErrorCannotDecompress. Can be nullptr if desired, in which case error status is not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedSnappyOpts_t nvcompBatchedSnappyDefaultOpts = {0}
const size_t nvcompSnappyCompressionMaxAllowedChunkSize = 1 << 24
const size_t nvcompSnappyRequiredAlignment = 1

This is the minimum alignment required for void type CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to their type’s size, e.g. 8 bytes for size_t.

The Snappy compressor supports unaligned data, so this value is 1.

struct nvcompBatchedSnappyOpts_t
#include <snappy.h>

Snappy compression options for the low-level API.

Public Members

int reserved

Deflate

Functions

nvcompStatus_t nvcompBatchedDeflateCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedDeflateOpts_t format_opts, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Deflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedDeflateOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for compression, with an additional max_total_uncompressed_bytes argument.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Deflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedDeflateOpts_t format_opts, size_t *max_compressed_chunk_bytes)

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedDeflateCompressAsync() for each chunk.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Deflate compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedDeflateOpts_t format_opts, cudaStream_t stream)

Perform batched asynchronous compression.

The individual chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended. The output buffers must be 8-byte aligned.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedDeflateCompressGetMaxOutputChunkSize.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The Deflate compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
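Since the Deflate compressor requires 8-byte-aligned output buffers, a common pattern is one allocation carved into equal slots. This sketch computes the slot stride by rounding the value from nvcompBatchedDeflateCompressGetMaxOutputChunkSize up to a multiple of 8; chunk i then starts at base + i * stride and the allocation needs num_chunks * stride bytes. It assumes the base pointer is itself 8-byte aligned (cudaMalloc allocations are aligned to at least 256 bytes). The function name is hypothetical.

```c
#include <stddef.h>

/* Slot size for chunk i of a single compressed-output allocation: the maximum
 * compressed chunk size rounded up so every slot starts on an 8-byte boundary. */
static size_t deflate_output_stride(size_t max_compressed_chunk_bytes)
{
    const size_t align = 8;  /* required output alignment for Deflate */
    return (max_compressed_chunk_bytes + align - 1) / align * align;
}
```

The padding costs at most 7 bytes per chunk, which is usually negligible next to 64 KiB chunks.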

nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)

Get the amount of temporary memory required on the GPU for decompression, with an additional max_total_uncompressed_bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Deflate.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Use this when the expected output size is not known in advance. Note: if a compressed stream is corrupt, the reported sizes are meaningless.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
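Sizes reported for a corrupt stream are meaningless, so a caller may want to sanity-check them before sizing output buffers. This assumes the sizes have first been copied to the host.

The sketch below assumes the application knows an upper bound on any single chunk's uncompressed size, for example the chunk size it used when compressing. Both that bound and the function name are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: checks each reported uncompressed size against an
   application-chosen bound and sums them. Returns false if any size exceeds
   the bound or the sum would overflow size_t; otherwise stores the total. */
static bool check_and_sum_sizes(const size_t *sizes, size_t num_chunks,
                                size_t max_chunk_bytes, size_t *total_out) {
    assert(total_out != NULL);
    size_t total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (sizes[i] > max_chunk_bytes) {
            return false;  /* implausible size: likely corrupt input */
        }
        if (sizes[i] > SIZE_MAX - total) {
            return false;  /* total would overflow */
        }
        total += sizes[i];
    }
    *total_out = total;
    return true;
}
```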

nvcompStatus_t nvcompBatchedDeflateDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

If a chunk of compressed data is not a valid Deflate stream, 0 is written as the decompressed size of that chunk and its status is set to nvcompErrorCannotDecompress.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. If provided, it must be preallocated in device-accessible memory. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument must be preallocated. For each chunk, if decompression succeeds, the status is set to nvcompSuccess. If decompression fails, for example due to corrupted input or out-of-bounds errors, the status is set to nvcompErrorCannotDecompress. It can be nullptr, in which case error statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
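The return value of nvcompBatchedDeflateDecompressAsync only reflects whether the work was launched. Per-chunk failures must be read from device_statuses after the stream has been synchronized and the array copied to the host. The sketch below scans such a host copy.

To compile without the nvCOMP headers, it models statuses as plain int values and takes the success code as a parameter. In real code the array would hold nvcompStatus_t and the success code would be nvcompSuccess. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts chunks whose status differs from success_code
   and records the index of the first such chunk (num_chunks if none). */
static size_t count_failed_chunks(const int *statuses, size_t num_chunks,
                                  int success_code, size_t *first_failed) {
    assert(first_failed != NULL);
    size_t failed = 0;
    *first_failed = num_chunks;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (statuses[i] != success_code) {
            if (failed == 0) {
                *first_failed = i;
            }
            ++failed;
        }
    }
    return failed;
}
```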

Variables

static const nvcompBatchedDeflateOpts_t nvcompBatchedDeflateDefaultOpts = {1}#
const size_t nvcompDeflateCompressionMaxAllowedChunkSize = 1 << 16#
const size_t nvcompDeflateRequiredAlignment = 8#

This is the minimum alignment required for untyped (void *) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to the size of their type, e.g. 8 bytes for size_t.
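The alignment requirement can be checked on the host before the array of chunk pointers is copied to device memory. A minimal sketch follows.

SKETCH_REQUIRED_ALIGNMENT is a hypothetical stand-in for nvcompDeflateRequiredAlignment, used so the example compiles without nvCOMP headers. The helper names are also hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nvcompDeflateRequiredAlignment (8). */
#define SKETCH_REQUIRED_ALIGNMENT ((size_t)8)

/* True if p is a multiple of alignment (alignment must be a power of two). */
static bool is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* True if every pointer in a host-side array of chunk pointers satisfies
   the alignment; intended to run before the array is copied to the device. */
static bool all_chunks_aligned(const void *const *ptrs, size_t num_chunks,
                               size_t alignment) {
    for (size_t i = 0; i < num_chunks; ++i) {
        if (!is_aligned(ptrs[i], alignment)) {
            return false;
        }
    }
    return true;
}
```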

struct nvcompBatchedDeflateOpts_t#
#include <deflate.h>

Deflate compression options for the low-level API

Public Members

int algo#

Compression algorithm to use. Permitted values are:

  • 1: high-throughput, low compression ratio (default)

  • 2: medium-throughput, medium compression ratio; achieves a better compression ratio than zlib level 1

  • 3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION

  • 4: lower-throughput, higher compression ratio; achieves a better compression ratio than zlib level 6

  • 5: lowest-throughput, highest compression ratio

GDeflate#

Functions

nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedGdeflateOpts_t format_opts, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The GDeflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.
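To stay within the 65536-byte limit, an input buffer is typically split into fixed-size chunks, with a shorter final chunk. The sketch below computes the chunk count and the host-side size array that would later be copied to device memory as device_uncompressed_chunk_bytes. Chunk i would then start at the input's base pointer plus i * 65536.

The helper names and the SKETCH_GDEFLATE_CHUNK_BYTES macro are hypothetical. The macro mirrors nvcompGdeflateCompressionMaxAllowedChunkSize.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors nvcompGdeflateCompressionMaxAllowedChunkSize (1 << 16). */
#define SKETCH_GDEFLATE_CHUNK_BYTES ((size_t)1 << 16)

/* Hypothetical helper: number of chunks needed to cover total_bytes
   (written without total + chunk - 1 to avoid overflow). */
static size_t chunk_count(size_t total_bytes, size_t chunk_bytes) {
    assert(chunk_bytes > 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Fills sizes_out[i] with the size of chunk i. Every chunk is chunk_bytes
   except possibly the last; sizes_out needs chunk_count(...) entries. */
static void fill_chunk_sizes(size_t total_bytes, size_t chunk_bytes,
                             size_t *sizes_out) {
    size_t n = chunk_count(total_bytes, chunk_bytes);
    for (size_t i = 0; i < n; ++i) {
        size_t remaining = total_bytes - i * chunk_bytes;
        sizes_out[i] = remaining < chunk_bytes ? remaining : chunk_bytes;
    }
}
```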

nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedGdeflateOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for compression, with an additional argument giving an upper bound on the total uncompressed size of the batch.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The GDeflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedGdeflateOpts_t format_opts, size_t *max_compressed_chunk_bytes)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedGdeflateCompressAsync() for each chunk.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The GDeflate compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedGdeflateOpts_t format_opts, cudaStream_t stream)#

Perform batched asynchronous compression.

The individual chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended. The output buffers must be 8-byte aligned.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedGdeflateCompressGetMaxOutputChunkSize.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The GDeflate compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
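One way to meet the 8-byte alignment requirement for output buffers is to carve all compressed chunk buffers out of a single device allocation. The buffers are spaced by nvcompBatchedGdeflateCompressGetMaxOutputChunkSize rounded up to a multiple of 8.

The sketch computes only the offsets. It assumes the base allocation is itself suitably aligned; cudaMalloc returns pointers aligned to at least 256 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds x up to the next multiple of align (align must be a power of two). */
static size_t round_up_pow2(size_t x, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: byte offset of each compressed output slot inside one
   allocation, so every slot starts on an align-byte boundary and can hold
   max_compressed_chunk_bytes. Returns the required allocation size. */
static size_t layout_output_slots(size_t max_compressed_chunk_bytes,
                                  size_t num_chunks, size_t align,
                                  size_t *offsets_out) {
    size_t stride = round_up_pow2(max_compressed_chunk_bytes, align);
    for (size_t i = 0; i < num_chunks; ++i) {
        offsets_out[i] = i * stride;
    }
    return num_chunks * stride;
}
```

Adding each offset to the base pointer gives the entries of device_compressed_chunk_ptrs.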

nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an additional argument giving the total uncompressed size of the batch.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in GDeflate.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Use this when the expected output size is not known in advance. Note: if a compressed stream is corrupt, the reported sizes are meaningless.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

If a chunk of compressed data is not a valid GDeflate stream, 0 is written as the decompressed size of that chunk and its status is set to nvcompErrorCannotDecompress.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. If provided, it must be preallocated in device-accessible memory. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument must be preallocated. For each chunk, if decompression succeeds, the status is set to nvcompSuccess. If decompression fails, for example due to corrupted input or out-of-bounds errors, the status is set to nvcompErrorCannotDecompress. It can be nullptr, in which case error statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedGdeflateOpts_t nvcompBatchedGdeflateDefaultOpts = {1}#
const size_t nvcompGdeflateCompressionMaxAllowedChunkSize = 1 << 16#
const size_t nvcompGdeflateRequiredAlignment = 8#

This is the minimum alignment required for untyped (void *) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to the size of their type, e.g. 8 bytes for size_t.

struct nvcompBatchedGdeflateOpts_t#
#include <gdeflate.h>

GDeflate compression options for the low-level API

Public Members

int algo#

Compression algorithm to use. Permitted values are:

  • 0: highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)

  • 1: high-throughput, low compression ratio (default)

  • 2: medium-throughput, medium compression ratio; achieves a better compression ratio than zlib level 1

  • 3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION

  • 4: lower-throughput, higher compression ratio; achieves a better compression ratio than zlib level 6

  • 5: lowest-throughput, highest compression ratio
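A caller that takes the algo value from user input may want to reject out-of-range values before building the options struct. The permitted ranges come from the two lists in this section: Deflate accepts 1 to 5 and GDeflate accepts 0 to 5.

In the sketch, sketch_codec_t and sketch_opts_t are hypothetical stand-ins. sketch_opts_t mirrors the single algo member of nvcompBatchedDeflateOpts_t and nvcompBatchedGdeflateOpts_t.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical codec tag used only by this sketch. */
typedef enum { SKETCH_DEFLATE, SKETCH_GDEFLATE } sketch_codec_t;

/* Hypothetical stand-in mirroring the documented `int algo` member. */
typedef struct { int algo; } sketch_opts_t;

/* True if algo is in the documented range: GDeflate lists 0..5,
   while the Deflate list starts at 1. */
static bool algo_is_permitted(sketch_codec_t codec, int algo) {
    int lowest = (codec == SKETCH_GDEFLATE) ? 0 : 1;
    return algo >= lowest && algo <= 5;
}

/* Builds options after validating algo (asserts on invalid input). */
static sketch_opts_t make_opts(sketch_codec_t codec, int algo) {
    assert(algo_is_permitted(codec, algo));
    sketch_opts_t opts = { algo };
    return opts;
}
```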

ZSTD#

Functions

nvcompStatus_t nvcompBatchedZstdCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedZstdOpts_t format_opts, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The ZSTD compression options to use. Currently empty.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedZstdOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for compression, with an additional argument giving an upper bound on the total uncompressed size of the batch.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

This extended API is useful when chunk sizes in the batch are not uniform. For example, with the non-extended API, if all chunks but one are 64 KB and the remaining chunk is 16 MB, the temporary space is computed based on 16 MB * num_chunks.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The ZSTD compression options to use. Currently empty.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks

Returns:

nvcompSuccess if successful, and an error code otherwise.
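Both bounds the extended API takes can be computed on the host from the chunk sizes. Take the batch from the example above: four 64 KB chunks and one 16 MB chunk. The non-extended API scales with 16 MB * 5 = 80 MB, whereas the total uncompressed size is only 4 * 64 KB + 16 MB = 16.25 MB.

A minimal sketch follows; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: from host-side chunk sizes, compute the largest chunk
   (max_uncompressed_chunk_bytes) and the sum of all chunks
   (max_total_uncompressed_bytes). */
static void batch_bounds(const size_t *chunk_bytes, size_t num_chunks,
                         size_t *max_chunk_out, size_t *total_out) {
    assert(max_chunk_out != NULL && total_out != NULL);
    size_t max_chunk = 0;
    size_t total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (chunk_bytes[i] > max_chunk) {
            max_chunk = chunk_bytes[i];
        }
        total += chunk_bytes[i];
    }
    *max_chunk_out = max_chunk;
    *total_out = total;
}
```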

nvcompStatus_t nvcompBatchedZstdCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedZstdOpts_t format_opts, size_t *max_compressed_chunk_bytes)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedZstdCompressAsync() for each chunk.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Zstd compression options to use. Currently empty.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedZstdOpts_t format_opts, cudaStream_t stream)#

Perform batched asynchronous compression.

The individual chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk. This parameter is currently unused; if it is not set to the maximum chunk size, it should be set to zero. If a future version makes use of it, that version will return an error when it is set to zero.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. It can be NULL if no temporary memory is needed.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedZstdCompressGetMaxOutputChunkSize.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The Zstd compression options to use. Currently empty.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an additional argument giving the total uncompressed size of the batch.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If the size of a chunk cannot be retrieved, its uncompressed size is set to 0. This argument must be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. It can be NULL if no temporary space is needed.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument must be preallocated. For each chunk, if decompression succeeds, the status is set to nvcompSuccess. If decompression fails, for example due to corrupted input or out-of-bounds errors, the status is set to nvcompErrorCannotDecompress.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedZstdOpts_t nvcompBatchedZstdDefaultOpts = {0}#
const size_t nvcompZstdCompressionMaxAllowedChunkSize = (1UL << 31) - 1#
const size_t nvcompZstdRequiredAlignment = 8#

This is the minimum alignment required for untyped (void *) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to the size of their type, e.g. 8 bytes for size_t.

struct nvcompBatchedZstdOpts_t#
#include <zstd.h>

Zstd compression options for the low-level API.

Public Members

int reserved#

GZIP#

Functions

nvcompStatus_t nvcompBatchedGzipDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an additional argument giving the total uncompressed size of the batch.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in gzip.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Use this when the expected output size is not known in advance. Note: if a compressed stream is corrupt, the reported sizes are meaningless.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

If a chunk of compressed data is not a valid gzip stream, 0 is written as the decompressed size of that chunk and its status is set to nvcompErrorCannotDecompress.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. If provided, it must be preallocated in device-accessible memory. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer must be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument must be preallocated. For each chunk, if decompression succeeds, the status is set to nvcompSuccess. If decompression fails, for example due to corrupted input or out-of-bounds errors, the status is set to nvcompErrorCannotDecompress. It can be nullptr, in which case error statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

ANS#

Typedefs

typedef enum nvcompANSType_t nvcompANSType_t
typedef enum nvcompANSDataType_t nvcompANSDataType_t

Enums

enum nvcompANSType_t#

Values:

enumerator nvcomp_rANS#
enum nvcompANSDataType_t#

Values:

enumerator uint8#
enumerator float16#

Functions

nvcompStatus_t nvcompBatchedANSCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedANSOpts_t format_opts, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedANSOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for compression, with an additional argument giving an upper bound on the total uncompressed size of the batch.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedANSOpts_t format_opts, size_t *max_compressed_chunk_bytes)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedANSCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.
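As a host-side illustration of how the value from nvcompBatchedANSCompressGetMaxOutputChunkSize() is typically used, the sketch below carves one contiguous output allocation into num_chunks equally sized slots, rounding each slot up to 8 bytes so that every entry of device_compressed_chunk_ptrs stays 8-byte aligned. plan_output_slots and round_up are hypothetical helpers, not part of nvCOMP; the device allocation and the nvCOMP calls themselves are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Round `bytes` up to the next multiple of `alignment` (must be a power of two). */
static size_t round_up(size_t bytes, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (bytes + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: fills offsets[i] with the byte offset of chunk i's
   output slot inside one contiguous buffer and returns the total buffer size.
   Every slot can hold max_compressed_chunk_bytes and starts on an 8-byte
   boundary. */
static size_t plan_output_slots(size_t num_chunks, size_t max_compressed_chunk_bytes,
                                size_t *offsets) {
    const size_t slot = round_up(max_compressed_chunk_bytes, 8);
    for (size_t i = 0; i < num_chunks; ++i)
        offsets[i] = i * slot;
    return num_chunks * slot;
}
```

The caller would then set device_compressed_chunk_ptrs[i] = base + offsets[i] for a device allocation of the returned size. Memory returned by cudaMalloc is aligned to at least 256 bytes, so the base pointer itself does not break the 8-byte alignment.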

nvcompStatus_t nvcompBatchedANSCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedANSOpts_t format_opts, cudaStream_t stream)#

Perform batched asynchronous compression.

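The caller is responsible for providing compressed output buffers (device_compressed_chunk_ptrs) large enough to hold the compressed data; nvcompBatchedANSCompressGetMaxOutputChunkSize() gives a sufficient size for each chunk.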

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each pointer must be aligned to an 8-byte boundary.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. May be NULL if no temporary memory is needed.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by device_temp_ptr.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedANSCompressGetMaxOutputChunkSize. Each pointer must be aligned to an 8-byte boundary.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Compression options.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
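Batched compression expects the input to be split into chunks already. A minimal host-side sketch of splitting one contiguous buffer, assuming a fixed chunk size that is a multiple of 8 (so every chunk start keeps the 8-byte alignment required of device_uncompressed_chunk_ptrs) and that presumably should not exceed nvcompANSCompressionMaxAllowedChunkSize; split_into_chunks is a hypothetical helper, not an nvCOMP function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits a contiguous input of total_bytes into chunks of
   at most chunk_bytes. Fills offsets[] and sizes[] (host arrays with room for
   ceil(total_bytes / chunk_bytes) entries) and returns the number of chunks.
   Only the last chunk may be shorter than chunk_bytes. */
static size_t split_into_chunks(size_t total_bytes, size_t chunk_bytes,
                                size_t *offsets, size_t *sizes) {
    assert(chunk_bytes > 0);
    size_t n = 0;
    for (size_t off = 0; off < total_bytes; off += chunk_bytes, ++n) {
        const size_t remaining = total_bytes - off;
        offsets[n] = off;
        sizes[n] = remaining < chunk_bytes ? remaining : chunk_bytes;
    }
    return n;
}
```

The offsets would be added to the device base pointer to build device_uncompressed_chunk_ptrs, sizes[] would be copied to device memory as device_uncompressed_chunk_bytes, and chunk_bytes (when the input is at least that large) is the value to pass as max_uncompressed_chunk_bytes.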

nvcompStatus_t nvcompBatchedANSDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument giving the total decompressed size of all chunks.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in ANS.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
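Because a failed size lookup is reported as a size of 0 rather than through the return value, a caller may want to scan the sizes once they are back on the host. The sketch assumes device_uncompressed_chunk_bytes has already been copied to a host array and the stream synchronized; check_uncompressed_sizes is a hypothetical helper. A genuinely empty chunk would also report 0, so a hit is a reason to inspect that chunk rather than proof of corruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side check of the sizes reported by
   nvcompBatchedANSGetDecompressSizeAsync(). Returns the index of the first
   chunk whose size is 0, or num_chunks if every chunk reported a nonzero size.
   *total receives the sum of all reported sizes. */
static size_t check_uncompressed_sizes(const size_t *host_sizes, size_t num_chunks,
                                       size_t *total) {
    assert(total != NULL);
    size_t first_zero = num_chunks;
    *total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_sizes[i] == 0 && first_zero == num_chunks)
            first_zero = i;
        *total += host_sizes[i];
    }
    return first_zero;
}
```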

nvcompStatus_t nvcompBatchedANSDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

NOTE: This function is used to decompress compressed buffers produced by nvcompBatchedANSCompressAsync.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory and start at a location with 8-byte alignment.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the corresponding status in device_statuses to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. May be NULL if no temporary space is needed.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and start at a location with 8-byte alignment.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to nvcompSuccess. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to nvcompErrorCannotDecompress.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
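The sizes reported by nvcompBatchedANSGetDecompressSizeAsync() can be used to lay out the output buffers for this call. A minimal sketch, assuming the sizes have been copied to the host: it places every output buffer at an 8-byte-aligned offset inside one allocation, matching the alignment required of device_uncompressed_chunk_ptrs. plan_decompress_outputs is a hypothetical helper; the same sizes, copied back to the device, could serve as device_uncompressed_buffer_bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lays out one output buffer per chunk inside a single
   allocation. offsets[i] receives the byte offset of chunk i's buffer; each
   buffer is padded so the next one starts on an 8-byte boundary. Returns the
   total allocation size in bytes. */
static size_t plan_decompress_outputs(const size_t *sizes, size_t num_chunks,
                                      size_t *offsets) {
    assert(num_chunks == 0 || (sizes != NULL && offsets != NULL));
    size_t off = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        offsets[i] = off;
        off += (sizes[i] + 7) & ~(size_t)7; /* advance to the next 8-byte boundary */
    }
    return off;
}
```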

Variables

static const nvcompBatchedANSOpts_t nvcompBatchedANSDefaultOpts = {nvcomp_rANS, uint8}#
const size_t nvcompANSCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompANSRequiredAlignment = 8#

This is the minimum alignment required for untyped (void) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to their type’s size, e.g., 8 bytes for size_t on 64-bit platforms.
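A quick host-side check of this requirement before launching a batch, assuming the chunk pointers are still held in a host array (they are plain addresses even when they refer to device memory). all_aligned is a hypothetical helper, not an nvCOMP function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every pointer in ptrs[0..n) is a multiple of `alignment`,
   e.g. nvcompANSRequiredAlignment (8). */
static bool all_aligned(const void *const *ptrs, size_t n, size_t alignment) {
    assert(alignment != 0);
    for (size_t i = 0; i < n; ++i)
        if ((uintptr_t)ptrs[i] % alignment != 0)
            return false;
    return true;
}
```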

struct nvcompBatchedANSOpts_t#
#include <ans.h>

ANS compression options for the low-level API.

Public Members

nvcompANSType_t type#
nvcompANSDataType_t data_type#

Bitcomp#

Functions

nvcompStatus_t nvcompBatchedBitcompCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedBitcompFormatOpts format_opts, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for compression.

NOTE: Bitcomp currently doesn’t use any temp memory.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedBitcompFormatOpts format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for compression, with an extra argument giving an upper bound on the total uncompressed size of all chunks.

NOTE: Bitcomp currently doesn’t use any temp memory.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedBitcompFormatOpts format_opts, size_t *max_compressed_chunk_bytes)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedBitcompCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedBitcompFormatOpts format_opts, cudaStream_t stream)#

Perform batched asynchronous compression.

NOTE: The maximum number of chunks allowed is 2^31.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. The uncompressed data must start at locations aligned to the size of the data type.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size MUST be a multiple of the size of the data type specified by format_opts.data_type; otherwise compression may crash or produce invalid output.

  • max_uncompressed_chunk_bytes[in] This argument is not used.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedBitcompCompressGetMaxOutputChunkSize. Each compressed buffer should start at a location with 8-byte alignment.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Compression options. They must be valid.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
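Since each chunk size must be a whole number of elements of format_opts.data_type, a caller chunking typed data might round its chunk size down and validate the sizes before launching. A minimal sketch; typed_chunk_bytes and sizes_match_type are hypothetical helpers, and type_size is the element size in bytes that the caller associates with the chosen nvcompType_t (for example 4 for a 32-bit integer type).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest chunk size not exceeding max_chunk_bytes that is a multiple of
   type_size. Returns 0 if max_chunk_bytes is smaller than one element. */
static size_t typed_chunk_bytes(size_t max_chunk_bytes, size_t type_size) {
    assert(type_size > 0);
    return max_chunk_bytes - max_chunk_bytes % type_size;
}

/* True if every chunk size is a whole number of type_size-byte elements. */
static bool sizes_match_type(const size_t *sizes, size_t n, size_t type_size) {
    assert(type_size > 0);
    for (size_t i = 0; i < n; ++i)
        if (sizes[i] % type_size != 0)
            return false;
    return true;
}
```

If the base of the input is aligned to the element size and every chunk size is a multiple of it, every chunk start is aligned as well, which also satisfies the alignment rule for device_uncompressed_chunk_ptrs.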

nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

NOTE: Bitcomp currently doesn’t use any temp memory.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument giving the total decompressed size of all chunks.

NOTE: Bitcomp currently doesn’t use any temp memory.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression. Bitcomp currently does not use temporary memory.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] This argument is not used.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

NOTE: This function is used to decompress compressed buffers produced by nvcompBatchedBitcompCompressAsync. It can also decompress buffers compressed with the standalone Bitcomp library.

NOTE: This function is not fully asynchronous, because it must inspect the compressed data to create the appropriate Bitcomp handle. It synchronizes the stream, examines the data, and then launches the asynchronous decompression.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory and start at a location with 8-byte alignment.

  • device_compressed_chunk_bytes[in] This argument is not used.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the corresponding status in device_statuses to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory and have the size specified by the corresponding entry in device_uncompressed_buffer_bytes.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to nvcompSuccess. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to nvcompErrorCannotDecompress.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedBitcompFormatOpts nvcompBatchedBitcompDefaultOpts = {0, NVCOMP_TYPE_UCHAR}#
const size_t nvcompBitcompCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompBitcompRequiredAlignment = 8#

This is the minimum alignment required for untyped (void) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to their type’s size, e.g., 8 bytes for size_t on 64-bit platforms.

struct nvcompBatchedBitcompFormatOpts#
#include <bitcomp.h>

Structure for configuring Bitcomp compression.

Public Members

int algorithm_type#

Bitcomp algorithm options.

  • 0 : Default algorithm, usually gives the best compression ratios.

  • 1 : “Sparse” algorithm, works well on sparse data (data with many zeros) and is usually faster than the default algorithm.

nvcompType_t data_type#

One of nvcomp’s possible data types.
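The documentation only says that the sparse algorithm suits data with many zeros. A caller who wants to pick algorithm_type automatically could sample the input on the host and count zeros, as sketched below. Both helpers are hypothetical, the byte-level count is only a proxy for element-level sparsity, and the threshold is an arbitrary tuning parameter rather than an nvCOMP recommendation; benchmarking both algorithms on representative data is the reliable way to decide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fraction of bytes equal to zero in a host-side sample of the data. */
static double zero_byte_fraction(const uint8_t *data, size_t n) {
    assert(data != NULL || n == 0);
    if (n == 0)
        return 0.0;
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i)
        zeros += (data[i] == 0);
    return (double)zeros / (double)n;
}

/* Hypothetical policy: returns 1 ("sparse") when at least `threshold` of the
   sampled bytes are zero, otherwise 0 (default algorithm). */
static int choose_bitcomp_algorithm(const uint8_t *data, size_t n, double threshold) {
    return zero_byte_fraction(data, n) >= threshold ? 1 : 0;
}
```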

Cascaded#

Functions

nvcompStatus_t nvcompBatchedCascadedCompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedCascadedOpts_t format_opts, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for compression.

Note

Batched Cascaded compression does not require temp space, so this will set *temp_bytes=0, unless an error is found with the format_opts.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Cascaded compression options and datatype to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, nvcompBatchedCascadedOpts_t format_opts, size_t *temp_bytes, const size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for compression, with an extra argument giving an upper bound on the total uncompressed size of all chunks.

Note

Batched Cascaded compression does not require temp space, so this will set *temp_bytes=0, unless an error is found with the format_opts.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Cascaded compression options and datatype to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressGetMaxOutputChunkSize(size_t max_uncompressed_chunk_bytes, nvcompBatchedCascadedOpts_t format_opts, size_t *max_compressed_chunk_bytes)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedCascadedCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Cascaded compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressAsync(const void *const *device_uncompressed_chunk_ptrs, const size_t *device_uncompressed_chunk_bytes, size_t max_uncompressed_chunk_bytes, size_t num_chunks, void *device_temp_ptr, size_t temp_bytes, void *const *device_compressed_chunk_ptrs, size_t *device_compressed_chunk_bytes, nvcompBatchedCascadedOpts_t format_opts, cudaStream_t stream)#

Perform batched asynchronous compression.

Note

The current implementation does not support uncompressed size larger than 4,294,967,295 bytes (max uint32_t).

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. The uncompressed data must start at locations aligned to the size of the data type.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size MUST be a multiple of the size of the data type specified by format_opts.type; otherwise compression may crash or produce invalid output.

  • max_uncompressed_chunk_bytes[in] This argument is not used.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by nvcompBatchedCascadedCompressGetMaxOutputChunkSize. Each compressed buffer should start at a location aligned to both 4 bytes and the size of the data type.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The cascaded format options. The format must be valid.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSize(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSizeEx(size_t num_chunks, size_t max_uncompressed_chunk_bytes, size_t *temp_bytes, size_t max_total_uncompressed_bytes)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument giving the total decompressed size of all chunks.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Cascaded.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedGetDecompressSizeAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, cudaStream_t stream)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedDecompressAsync(const void *const *device_compressed_chunk_ptrs, const size_t *device_compressed_chunk_bytes, const size_t *device_uncompressed_buffer_bytes, size_t *device_uncompressed_chunk_bytes, size_t num_chunks, void *const device_temp_ptr, size_t temp_bytes, void *const *device_uncompressed_chunk_ptrs, nvcompStatus_t *device_statuses, cudaStream_t stream)#

Perform batched asynchronous decompression.

Note

This function is used to decompress compressed buffers produced by nvcompBatchedCascadedCompressAsync.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each compressed buffer should reside in device-accessible memory and start at a location aligned to both 4 bytes and the size of the data type.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the corresponding status in device_statuses to nvcompErrorCannotDecompress.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and start at a location aligned to the size of the data type.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to nvcompSuccess. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to nvcompErrorCannotDecompress.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

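static const nvcompBatchedCascadedOpts_t nvcompBatchedCascadedDefaultOpts = {4096, NVCOMP_TYPE_INT, 2, 1, 1}#

Read in the member order of nvcompBatchedCascadedOpts_t, the defaults are internal_chunk_bytes = 4096, type = NVCOMP_TYPE_INT, num_RLEs = 2, num_deltas = 1 and use_bp = 1 (bitpacking enabled).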
const size_t nvcompCascadedCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompCascadedRequiredAlignment = 8#

This is the minimum alignment required for untyped (void) CUDA memory buffers passed to compression or decompression functions. Typed memory buffers must still be aligned to their type’s size, e.g., 8 bytes for size_t on 64-bit platforms.

struct nvcompCascadedFormatOpts#
#include <cascaded.h>

Structure that stores the compression configuration.

Public Members

int num_RLEs#

The number of Run Length Encodings to perform.

int num_deltas#

The number of Delta Encodings to perform.

int use_bp#

Whether or not to bitpack the final layers.

struct nvcompBatchedCascadedOpts_t#
#include <cascaded.h>

Structure that stores the compression configuration.

Public Members

size_t internal_chunk_bytes#

The size of each internal chunk of data that is decompressed independently with Cascaded compression. The value should be in the range of [512, 16384] depending on the datatype of the input and the shared memory size of the GPU being used. This is not the size of chunks passed into the API. Recommended size is 4096.

Note

Not currently used; a default of 4096 is always used.

nvcompType_t type#

The datatype used to define the bit-width for compression.

int num_RLEs#

The number of Run Length Encodings to perform.

int num_deltas#

The number of Delta Encodings to perform.

int use_bp#

Whether or not to bitpack the final layers.
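To make num_RLEs, num_deltas and use_bp more concrete, the toy below applies one delta pass and one run-length pass, then computes the bit width a bitpacking stage would need. It runs on the host and only illustrates the three kinds of layers: the order of the passes, the run encoding (parallel value and count arrays) and the min-offset bit width are choices made for this sketch, not nvCOMP's actual cascade order or on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta encoding: out[0] = in[0], out[i] = in[i] - in[i - 1]. */
static void delta_encode(const int32_t *in, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (i == 0) ? in[0] : in[i] - in[i - 1];
}

/* Run-length encoding: stores each run's value in values[] and its length in
   counts[] (both with room for n entries). Returns the number of runs. */
static size_t rle_encode(const int32_t *in, size_t n, int32_t *values, uint32_t *counts) {
    assert(n == 0 || (in != NULL && values != NULL && counts != NULL));
    size_t runs = 0;
    for (size_t i = 0; i < n; ++i) {
        if (runs > 0 && values[runs - 1] == in[i]) {
            counts[runs - 1]++;
        } else {
            values[runs] = in[i];
            counts[runs] = 1;
            runs++;
        }
    }
    return runs;
}

/* Bits needed to store every value as an unsigned offset from the minimum. */
static unsigned bitpack_width(const int32_t *in, size_t n) {
    if (n == 0)
        return 0;
    int32_t lo = in[0], hi = in[0];
    for (size_t i = 1; i < n; ++i) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    uint32_t range = (uint32_t)((int64_t)hi - lo);
    unsigned bits = 0;
    while (range != 0) {
        bits++;
        range >>= 1;
    }
    return bits;
}
```

For the ramp 100, 101, ..., 107, the delta pass yields 100 followed by seven 1s, the run-length pass reduces that to two runs (100 once, 1 seven times), and the two run values span a range of 99, which fits in 7 bits instead of the 32 bits of the original integers.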