C API

This is the C API reference for the NVIDIA® nvCOMP library.

Generic

Defines

NVCOMP_HOST_DEVICE_FUNCTION

Enums

enum nvcompType_t

Values:

enumerator NVCOMP_TYPE_CHAR
enumerator NVCOMP_TYPE_UCHAR
enumerator NVCOMP_TYPE_SHORT
enumerator NVCOMP_TYPE_USHORT
enumerator NVCOMP_TYPE_INT
enumerator NVCOMP_TYPE_UINT
enumerator NVCOMP_TYPE_LONGLONG
enumerator NVCOMP_TYPE_ULONGLONG
enumerator NVCOMP_TYPE_UINT8
enumerator NVCOMP_TYPE_FLOAT16
enumerator NVCOMP_TYPE_BITS

Functions

nvcompStatus_t nvcompGetProperties(nvcompProperties_t *properties)

Provides nvCOMP library properties.

Parameters:

properties[out] Pointer to an nvcompProperties_t struct that receives the nvCOMP library properties.

Returns:

nvcompErrorInvalidValue if properties is nullptr, nvcompSuccess otherwise.

nvcompStatus_t nvcompDecompressGetTempSize(
const void *metadata_ptr,
size_t *temp_bytes
)

Computes the size of the temporary workspace required to perform decompression.

Deprecated:

This interface is deprecated and will be removed in a future release; please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h, and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • temp_bytes – The size of the required temporary workspace in bytes (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressGetOutputSize(
const void *metadata_ptr,
size_t *output_bytes
)

Computes the size of the uncompressed data in bytes.

Deprecated:

This interface is deprecated and will be removed in a future release; please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h, and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • output_bytes – The size of the uncompressed data (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressGetType(
const void *metadata_ptr,
nvcompType_t *type
)

Get the type of the compressed data.

Deprecated:

This interface is deprecated and will be removed in a future release; please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h, and nvcomp/ans.h.

Parameters:
  • metadata_ptr – The metadata.

  • type – The data type (output).

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompDecompressAsync(
const void *in_ptr,
size_t in_bytes,
void *temp_ptr,
size_t temp_bytes,
void *metadata_ptr,
void *out_ptr,
size_t out_bytes,
cudaStream_t stream
)

Perform asynchronous decompression.

Deprecated:

This interface is deprecated and will be removed in a future release; please switch to the compression-scheme-specific interfaces in nvcomp/cascaded.h, nvcomp/lz4.h, nvcomp/snappy.h, nvcomp/bitcomp.h, nvcomp/gdeflate.h, nvcomp/zstd.h, nvcomp/deflate.h, and nvcomp/ans.h.

Parameters:
  • in_ptr – The compressed data on the device to decompress.

  • in_bytes – The size of the compressed data.

  • temp_ptr – The temporary workspace on the device.

  • temp_bytes – The size of the temporary workspace.

  • metadata_ptr – The metadata.

  • out_ptr – The output location on the device.

  • out_bytes – The size of the output location.

  • stream – The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

struct nvcompProperties_t
#include <nvcomp.h>

nvCOMP properties.

Public Members

uint32_t version

nvCOMP library version.

uint32_t cudart_version

Version of CUDA Runtime with which nvCOMP library was built.

Enums

enum nvcompStatus_t

Values:

enumerator nvcompSuccess
enumerator nvcompErrorInvalidValue
enumerator nvcompErrorNotSupported
enumerator nvcompErrorCannotDecompress
enumerator nvcompErrorBadChecksum
enumerator nvcompErrorCannotVerifyChecksums
enumerator nvcompErrorOutputBufferTooSmall
enumerator nvcompErrorWrongHeaderLength
enumerator nvcompErrorAlignment
enumerator nvcompErrorChunkSizeTooLarge
enumerator nvcompErrorCudaError
enumerator nvcompErrorInternal

struct nvcompAlignmentRequirements_t
#include <shared_types.h>

Per-algorithm buffer alignment requirements.

Public Members

size_t input

Minimum alignment requirement of each input buffer.

size_t output

Minimum alignment requirement of each output buffer.

size_t temp

Minimum alignment requirement of the temporary-storage buffer, if any. For algorithms that do not use temporary storage, this field is always equal to 1.
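The alignment rules in this struct are easy to violate when chunk pointers are computed by hand. The sketch below checks host-side copies of the pointer arrays before they are copied to the device. It is an illustration, not part of nvCOMP. alignment_reqs_t is a local stand-in that mirrors the three documented fields of nvcompAlignmentRequirements_t; real code would use the library type. is_aligned and check_compress_alignment are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the documented fields of
   nvcompAlignmentRequirements_t (assumption: real code uses the nvCOMP type). */
typedef struct {
    size_t input;  /* minimum alignment of each input buffer */
    size_t output; /* minimum alignment of each output buffer */
    size_t temp;   /* minimum alignment of the temp buffer (1 if unused) */
} alignment_reqs_t;

/* True if the address of ptr is a multiple of alignment (must be nonzero). */
static bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical pre-flight check: every input/output chunk pointer and the
   temp pointer must satisfy the corresponding requirement. Operates on
   host copies of the pointer arrays. */
static bool check_compress_alignment(const alignment_reqs_t *req,
                                     const void *const *in_ptrs,
                                     void *const *out_ptrs,
                                     size_t num_chunks,
                                     const void *temp_ptr)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (!is_aligned(in_ptrs[i], req->input) ||
            !is_aligned(out_ptrs[i], req->output)) {
            return false;
        }
    }
    return is_aligned(temp_ptr, req->temp);
}
```

The requirements would come from the algorithm's ...CompressGetRequiredAlignments function, called with the same format_opts that the compression call uses.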

Note

The nvcompBatched<compression_method>CompressGetTempSizeEx APIs let the user supply max_total_uncompressed_bytes. Without it, all chunks are assumed to be max_uncompressed_chunk_bytes in size, which can overestimate the temporary memory requirement.
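The size of the overestimate described in the note can be seen by comparing two totals. The first is the total assumed without the Ex variant (num_chunks * max_uncompressed_chunk_bytes). The second is the actual sum of chunk sizes. This sketch computes only those two inputs; the temporary size itself always comes from the library. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Total implicitly assumed by a GetTempSize call without the Ex suffix:
   every chunk is treated as being the maximum size. */
static size_t assumed_total_bytes(size_t num_chunks, size_t max_chunk_bytes)
{
    return num_chunks * max_chunk_bytes;
}

/* Actual total of all chunk sizes: a valid value for
   max_total_uncompressed_bytes in an ...Ex call. */
static size_t actual_total_bytes(const size_t *chunk_bytes, size_t num_chunks)
{
    size_t total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        assert(total + chunk_bytes[i] >= total); /* unsigned overflow guard */
        total += chunk_bytes[i];
    }
    return total;
}
```

Consider one 65536-byte chunk plus 999 chunks of 100 bytes. The assumed total is 65,536,000 bytes, while the actual total is 165,436 bytes.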

CRC32

Functions

nvcompStatus_t nvcompBatchedCRC32Async(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
uint32_t *device_CRC32_ptr,
cudaStream_t stream
)

Perform CRC32 checksum calculation asynchronously. All pointers must point to GPU-accessible locations.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • num_chunks[in] The number of chunks to compute checksums of.

  • device_CRC32_ptr[out] Array with size num_chunks on the GPU to be filled with the CRC32 checksum of each chunk.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
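A CPU reference implementation can be useful for spot-checking the per-chunk results after copying device_CRC32_ptr back to the host. This page does not state which CRC-32 variant nvcompBatchedCRC32Async computes. The sketch below assumes the common reflected CRC-32 used by zlib (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). Verify that assumption against a known chunk before relying on it. crc32_reference is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (zlib/IEEE 802.3 variant). Slow but simple;
   assumption: this matches nvCOMP's CRC32 output. */
static uint32_t crc32_reference(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint32_t crc = 0xFFFFFFFFu;
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With this variant, the standard check string "123456789" yields 0xCBF43926.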

LZ4

Functions

nvcompStatus_t nvcompBatchedLZ4CompressGetRequiredAlignments(
nvcompBatchedLZ4Opts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements
)

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4CompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedLZ4Opts_t format_opts,
size_t *temp_bytes
)

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The LZ4 compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.
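The chunk-size limits above determine how a contiguous input is split into a batch. The sketch below computes num_chunks and the per-chunk offsets and sizes for a fixed chunk size. Only the last chunk may be shorter. The 16777216-byte limit and the 65536-byte recommendation come from this page. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LZ4_MAX_CHUNK_BYTES ((size_t)1 << 24)      /* 16777216, documented limit */
#define LZ4_RECOMMENDED_CHUNK_BYTES ((size_t)65536)

/* Number of chunks needed to cover total_bytes (ceiling division,
   written so it cannot overflow). */
static size_t num_chunks_for(size_t total_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes > 0 && chunk_bytes <= LZ4_MAX_CHUNK_BYTES);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Fills offsets[] and sizes[] for each chunk; returns the chunk count. */
static size_t split_into_chunks(size_t total_bytes, size_t chunk_bytes,
                                size_t *offsets, size_t *sizes,
                                size_t max_chunks)
{
    size_t n = num_chunks_for(total_bytes, chunk_bytes);
    assert(n <= max_chunks);
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = i * chunk_bytes;
        size_t remaining = total_bytes - offsets[i];
        sizes[i] = remaining < chunk_bytes ? remaining : chunk_bytes;
    }
    return n;
}
```

Splitting a 1,000,000-byte buffer with the recommended chunk size gives 16 chunks, the last holding 16960 bytes. max_uncompressed_chunk_bytes for such a batch is the smaller of chunk_bytes and total_bytes.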

nvcompStatus_t nvcompBatchedLZ4CompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedLZ4Opts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes
)

Get the amount of temporary memory required on the GPU for compression, taking an additional argument for the total uncompressed size.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The LZ4 compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4CompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedLZ4Opts_t format_opts,
size_t *max_compressed_chunk_bytes
)

Get the maximum size to which a chunk of at most max_uncompressed_chunk_bytes bytes could compress, that is, the minimum amount of output memory that must be given to nvcompBatchedLZ4CompressAsync() for each chunk.

Chunk size must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The LZ4 compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4CompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedLZ4Opts_t format_opts,
cudaStream_t stream
)

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by format_opts.data_type. Chunk sizes must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedLZ4CompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The LZ4 compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
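Because violating the size conditions above is undefined behaviour, a host-side pre-flight check on the chunk sizes can be worthwhile. The sketch below verifies two things for each size: it is a multiple of elem_size (the byte size of format_opts.data_type, for example 1 for the default NVCOMP_TYPE_CHAR), and it does not exceed 1 << 24 bytes. It also reports the largest size, which is the value to pass as max_uncompressed_chunk_bytes. The function name and the caller-supplied elem_size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check for nvcompBatchedLZ4CompressAsync inputs,
   run on a host copy of device_uncompressed_chunk_bytes. */
static bool validate_lz4_chunk_sizes(const size_t *chunk_bytes,
                                     size_t num_chunks, size_t elem_size,
                                     size_t *max_chunk_bytes)
{
    const size_t max_allowed = (size_t)1 << 24; /* documented limit */
    size_t largest = 0;
    assert(elem_size > 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (chunk_bytes[i] % elem_size != 0 || chunk_bytes[i] > max_allowed) {
            return false; /* would violate a documented condition */
        }
        if (chunk_bytes[i] > largest) {
            largest = chunk_bytes[i];
        }
    }
    *max_chunk_bytes = largest;
    return true;
}
```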

nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes
)

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes
)

Get the amount of temporary memory required on the GPU for decompression, taking an additional argument for the total uncompressed size.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in LZ4.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedLZ4GetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream
)

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

This is needed when the expected output size is not known.

Note

If the stream is corrupt, the calculated sizes will be invalid.

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedLZ4DecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
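Once the sizes from nvcompBatchedLZ4GetDecompressSizeAsync have been copied back to the host, one common pattern is to make a single output allocation and carve per-chunk pointers out of it. The sketch below computes those offsets and keeps each one aligned. For LZ4, nvcompBatchedLZ4DecompressRequiredAlignments.output is 1. A larger alignment may still be wanted if the output will be read as a typed array. The base pointer must be aligned too; cudaMalloc allocations are at least 256-byte aligned. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to a multiple of alignment (a power of two). */
static size_t align_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Computes each chunk's offset inside one contiguous output buffer and
   returns the total number of bytes to allocate. */
static size_t layout_output_buffer(const size_t *uncompressed_bytes,
                                   size_t num_chunks, size_t alignment,
                                   size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        cursor = align_up(cursor, alignment); /* pad to the next boundary */
        offsets[i] = cursor;
        cursor += uncompressed_bytes[i];
    }
    return cursor;
}
```

Each device_uncompressed_chunk_ptrs entry would then be the base pointer plus offsets[i]. The matching device_uncompressed_buffer_bytes entry is uncompressed_bytes[i].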

nvcompStatus_t nvcompBatchedLZ4DecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream
)

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

In the case where a chunk of compressed data is not a valid LZ4 block, 0 will be written for the size of the invalid chunk and nvcompErrorCannotDecompress will be flagged for that chunk.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedLZ4DecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflowing chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated, but can be nullptr if desired, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Must be aligned to the value in `nvcompBatchedLZ4DecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedLZ4DecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to `nvcompErrorCannotDecompress`. Can be nullptr if desired, in which case error status is not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
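Because the call is asynchronous, device_statuses and device_uncompressed_chunk_bytes are only meaningful after the stream has been synchronized and the arrays copied back to the host. The statuses are the primary error signal. As a complementary check, the actual sizes can be compared against the expected ones. Per the note above, an invalid LZ4 block reports a size of 0. The sketch below collects the indices of mismatching chunks. find_bad_chunks is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Compares host copies of the actual decompressed sizes with the expected
   sizes; writes mismatching chunk indices to bad_indices and returns the
   number of mismatches. */
static size_t find_bad_chunks(const size_t *actual_bytes,
                              const size_t *expected_bytes,
                              size_t num_chunks, size_t *bad_indices)
{
    size_t count = 0;
    assert(actual_bytes != NULL && expected_bytes != NULL);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (actual_bytes[i] != expected_bytes[i]) {
            bad_indices[count++] = i; /* e.g. 0 written for an invalid block */
        }
    }
    return count;
}
```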

Variables

static const nvcompBatchedLZ4Opts_t nvcompBatchedLZ4DefaultOpts = {NVCOMP_TYPE_CHAR}
const size_t nvcompLZ4CompressionMaxAllowedChunkSize = 1 << 24
const size_t nvcompLZ4RequiredAlignment = 4

The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedLZ4DecompressRequiredAlignments{1, 1, 1}

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedLZ4Opts_t
#include <lz4.h>

LZ4 compression options for the low-level API.

Public Members

nvcompType_t data_type

Snappy

Functions

nvcompStatus_t nvcompBatchedSnappyCompressGetRequiredAlignments(
nvcompBatchedSnappyOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements
)

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedSnappyOpts_t format_opts,
size_t *temp_bytes
)

Get the amount of temporary memory required on the GPU for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] Snappy compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedSnappyOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes
)

Get the amount of temporary memory required on the GPU for compression, taking an additional argument for the total uncompressed size.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] Snappy compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedSnappyOpts_t format_opts,
size_t *max_compressed_chunk_bytes
)

Get the maximum size to which a chunk of at most max_uncompressed_chunk_bytes bytes could compress, that is, the minimum amount of output memory that must be given to nvcompBatchedSnappyCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Snappy compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedSnappyOpts_t format_opts,
cudaStream_t stream
)

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk. This parameter is currently unused. Set it to either the actual value or zero.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace; may be NULL if no temporary memory is needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedSnappyCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Snappy compression options.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes
)

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes
)

Get the amount of temporary memory required on the GPU for decompression, taking an additional argument for the total uncompressed size.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Snappy.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream
)

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedSnappyDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedSnappyDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream
)

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedSnappyDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflowing chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated, but can be nullptr if desired, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space; may be NULL if no temporary space is needed. Must be aligned to the value in `nvcompBatchedSnappyDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedSnappyDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to `nvcompErrorCannotDecompress`. Can be nullptr if desired, in which case error status is not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedSnappyOpts_t nvcompBatchedSnappyDefaultOpts = {0}
const size_t nvcompSnappyCompressionMaxAllowedChunkSize = 1 << 24
const size_t nvcompSnappyRequiredAlignment = 1

The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedSnappyDecompressRequiredAlignments{1, 1, 1}

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedSnappyOpts_t
#include <snappy.h>

Snappy compression options for the low-level API.

Public Members

int reserved

Deflate

Functions

nvcompStatus_t nvcompBatchedDeflateCompressGetRequiredAlignments(
nvcompBatchedDeflateOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements
)

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedDeflateOpts_t format_opts,
size_t *temp_bytes
)

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Deflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.
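The maximum chunk size differs per algorithm. Deflate chunks must not exceed 65536 bytes, while nvcompLZ4CompressionMaxAllowedChunkSize and nvcompSnappyCompressionMaxAllowedChunkSize are both 1 << 24. Code that switches between algorithms therefore has to clamp its chunk size. The sketch below encodes only the limits stated on this page. algo_t, max_chunk_bytes, and choose_chunk_bytes are hypothetical names for illustration, not nvCOMP types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum for this sketch only; not an nvCOMP type. */
typedef enum { ALGO_LZ4, ALGO_SNAPPY, ALGO_DEFLATE } algo_t;

/* Maximum uncompressed chunk size per algorithm, as documented here. */
static size_t max_chunk_bytes(algo_t algo)
{
    switch (algo) {
    case ALGO_LZ4:
    case ALGO_SNAPPY:
        return (size_t)1 << 24;
    case ALGO_DEFLATE:
        return 65536;
    }
    assert(!"unknown algorithm");
    return 0;
}

/* Clamps a requested chunk size to what the algorithm accepts. */
static size_t choose_chunk_bytes(algo_t algo, size_t requested)
{
    size_t limit = max_chunk_bytes(algo);
    assert(requested > 0);
    return requested < limit ? requested : limit;
}
```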

nvcompStatus_t nvcompBatchedDeflateCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedDeflateOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes
)

Get the amount of temporary memory required on the GPU for compression, taking an additional argument for the total uncompressed size.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Deflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedDeflateOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedDeflateCompressAsync() for each chunk.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Deflate compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.
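A common way to use this bound is to carve all output slots out of a single allocation. The sketch below uses a hypothetical helper, `layout_output_slots`. Its `output_alignment` argument stands for the `output` member returned by nvcompBatchedDeflateCompressGetRequiredAlignments. The helper pads each slot so that every slot start stays aligned. This holds only if the base allocation is itself aligned; cudaMalloc allocations are aligned to at least 256 bytes.

```c
#include <stddef.h>

/* Round value up to the next multiple of alignment (alignment > 0). */
static size_t round_up(size_t value, size_t alignment)
{
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Hypothetical layout helper (not part of nvCOMP): places num_chunks
 * output slots of max_compressed_chunk_bytes each in one allocation,
 * padding each slot so every slot start is a multiple of output_alignment
 * relative to the base. Fills offsets[i] and returns the total number of
 * bytes to allocate. */
size_t layout_output_slots(size_t num_chunks, size_t max_compressed_chunk_bytes,
                           size_t output_alignment, size_t *offsets)
{
    size_t stride = round_up(max_compressed_chunk_bytes, output_alignment);
    for (size_t i = 0; i < num_chunks; ++i) {
        offsets[i] = i * stride;
    }
    return stride * num_chunks;
}
```

With an illustrative bound of 65613 bytes and 8-byte alignment, each slot is padded to 65616 bytes. The device_compressed_chunk_ptrs entries would then be base + offsets[i].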

nvcompStatus_t nvcompBatchedDeflateCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedDeflateOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedDeflateCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The Deflate compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
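Misaligned buffers lead to undefined behaviour rather than an error code. It can therefore help to check the pointer arrays on the host before copying them to the device. This is a sketch only. `alignment_reqs_t` is a local stand-in that mirrors the `input`, `output`, and `temp` members of nvcompAlignmentRequirements_t, so the example compiles without nvCOMP. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the input/output/temp members documented for
 * nvcompAlignmentRequirements_t (assumption: not the real nvCOMP type). */
typedef struct {
    size_t input;
    size_t output;
    size_t temp;
} alignment_reqs_t;

/* True if address p is a multiple of alignment (alignment > 0). */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical check: index of the first pointer that violates the
 * alignment, or n if every pointer is aligned. */
size_t first_misaligned(const void *const *ptrs, size_t n, size_t alignment)
{
    for (size_t i = 0; i < n; ++i) {
        if (!is_aligned(ptrs[i], alignment)) {
            return i;
        }
    }
    return n;
}

/* Checks host-side copies of the input and output chunk pointer arrays,
 * and the temp workspace pointer, against the requirements. */
bool batch_is_aligned(const void *const *in_ptrs, const void *const *out_ptrs,
                      size_t n, const void *temp, const alignment_reqs_t *reqs)
{
    return first_misaligned(in_ptrs, n, reqs->input) == n &&
           first_misaligned(out_ptrs, n, reqs->output) == n &&
           is_aligned(temp, reqs->temp);
}
```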

nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression with extra total bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Deflate.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedDeflateGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

This is needed when the expected output size is not known.

Note

If the stream is corrupt, the calculated sizes will be invalid.

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedDeflateDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
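The sizes are produced asynchronously in device memory. They must be copied back to the host, and the stream synchronized, before they can drive allocation. Assuming that has been done, the hypothetical helper below (not part of nvCOMP) packs the decompressed chunks into one contiguous output buffer by computing an aligned start offset for each chunk. A corrupt stream yields invalid sizes, so real code would also sanity-check the values before allocating.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP): given per-chunk uncompressed
 * sizes already copied to the host, compute where each chunk would start
 * in one contiguous output buffer. Each start is rounded up to
 * `alignment` (> 0). Returns the total number of bytes to allocate. */
size_t pack_output_offsets(const size_t *sizes, size_t n, size_t alignment,
                           size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        cursor = ((cursor + alignment - 1) / alignment) * alignment;
        offsets[i] = cursor;
        cursor += sizes[i];
    }
    return cursor;
}
```

The decompression output alignment listed in nvcompBatchedDeflateDecompressRequiredAlignments is 1, so an alignment of 1 packs the chunks back to back.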

nvcompStatus_t nvcompBatchedDeflateDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

In the case where a chunk of compressed data is not a valid Deflate stream, 0 is written as the size of the invalid chunk and its status is set to `nvcompErrorCannotDecompress`.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedDeflateDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflowing chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for each chunk. If non-null, it must be preallocated in device-accessible memory. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Must be aligned to the value in `nvcompBatchedDeflateDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedDeflateDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. If non-null, it must be preallocated. For each chunk, the status is set to `nvcompSuccess` if decompression succeeds. It is set to `nvcompErrorCannotDecompress` if decompression fails, for example due to corrupted input or out-of-bounds errors. It can be nullptr, in which case statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
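Per-chunk failures are reported only through device_statuses. A typical follow-up is therefore to copy that array to the host after synchronizing the stream, then scan it. In this sketch, `demo_status_t` is a local stand-in for nvcompStatus_t; real code would compare against `nvcompSuccess`. `collect_failed_chunks` is a hypothetical helper.

```c
#include <stddef.h>

/* Local stand-in for the two outcomes described above (assumption: real
 * code uses nvcompStatus_t and compares against nvcompSuccess). */
typedef enum { DEMO_SUCCESS, DEMO_CANNOT_DECOMPRESS } demo_status_t;

/* Hypothetical post-processing step: given host copies of the statuses,
 * record the indices of chunks that did not decompress successfully in
 * failed_idx (capacity n) and return how many there were. */
size_t collect_failed_chunks(const demo_status_t *statuses, size_t n,
                             size_t *failed_idx)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; ++i) {
        if (statuses[i] != DEMO_SUCCESS) {
            failed_idx[failures++] = i;
        }
    }
    return failures;
}
```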

Variables

static const nvcompBatchedDeflateOpts_t nvcompBatchedDeflateDefaultOpts = {1}#
const size_t nvcompDeflateCompressionMaxAllowedChunkSize = 1u << 31#

Although chunk sizes up to 2 GB are theoretically possible, compression with large chunks may be very slow or use large amounts of temporary memory, so caution is advised when using chunk sizes above 64 KB.

const size_t nvcompDeflateRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
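This sketch assumes that the alignment values are powers of two, as the documented values 8, 4 and 1 are. Under that assumption, a buffer aligned to the largest requirement also satisfies the smaller ones. That is why a single most-restrictive value can serve every buffer. The example uses a local stand-in struct and assumes the members are ordered input, output, temp, as in nvcompAlignmentRequirements_t.

```c
#include <stddef.h>

/* Local stand-in for nvcompAlignmentRequirements_t (assumed member order:
 * input, output, temp). */
typedef struct {
    size_t input;
    size_t output;
    size_t temp;
} alignment_reqs_t;

/* Largest of the three requirements. If all values are powers of two, an
 * address aligned to this value is aligned to the other two as well. */
size_t most_restrictive(const alignment_reqs_t *r)
{
    size_t m = r->input;
    if (r->output > m) {
        m = r->output;
    }
    if (r->temp > m) {
        m = r->temp;
    }
    return m;
}
```

For the decompression requirements {4, 1, 1} the helper returns 4. The 8-byte nvcompDeflateRequiredAlignment is a multiple of that value.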

const nvcompAlignmentRequirements_t nvcompBatchedDeflateDecompressRequiredAlignments{4, 1, 1}#

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedDeflateOpts_t#
#include <deflate.h>

Deflate compression options for the low-level API

Public Members

int algo#

Compression algorithm to use. Permitted values are:

  • 0: highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)

  • 1: high-throughput, low compression ratio (default)

  • 2: medium-throughput, medium compression ratio; beats zlib level 1 on compression ratio

  • 3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION

  • 4: lower-throughput, higher compression ratio; beats zlib level 6 on compression ratio

  • 5: lowest-throughput, highest compression ratio
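When the level comes from user configuration, validating it before building the options struct avoids passing an undocumented value. The sketch uses a local stand-in, `deflate_opts_t`, with the single `algo` member documented here; the real type is nvcompBatchedDeflateOpts_t. The helper names and labels are illustrative and not part of nvCOMP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for nvcompBatchedDeflateOpts_t (assumption: only the
 * documented `algo` member is modelled). */
typedef struct {
    int algo;
} deflate_opts_t;

/* Hypothetical check against the documented range 0..5. */
bool deflate_algo_is_valid(deflate_opts_t opts)
{
    return opts.algo >= 0 && opts.algo <= 5;
}

/* Short description of each documented level, or NULL if out of range. */
const char *deflate_algo_label(deflate_opts_t opts)
{
    static const char *const labels[] = {
        "entropy-only, highest throughput",
        "high throughput, low ratio (default)",
        "medium throughput, medium ratio",
        "placeholder, currently falls back to medium compression",
        "lower throughput, higher ratio",
        "lowest throughput, highest ratio",
    };
    return deflate_algo_is_valid(opts) ? labels[opts.algo] : NULL;
}
```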

GDeflate#

Functions

nvcompStatus_t nvcompBatchedGdeflateCompressGetRequiredAlignments(
nvcompBatchedGdeflateOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements,
)#

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGdeflateOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The GDeflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGdeflateOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression with extra total bytes argument.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The GDeflate compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGdeflateOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedGdeflateCompressAsync() for each chunk.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The GDeflate compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.
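When all output slots share one allocation, the total size is num_chunks times this bound. For very large batches, that product should be checked for size_t overflow before it is passed to an allocator. This is a minimal sketch with a hypothetical helper (not part of nvCOMP); the value 70000 in the test is illustrative, not a real bound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes num_chunks * max_compressed_chunk_bytes.
 * It reports failure instead of silently wrapping around when the product
 * does not fit in size_t. */
bool total_output_bytes(size_t num_chunks, size_t max_compressed_chunk_bytes,
                        size_t *total)
{
    if (max_compressed_chunk_bytes != 0 &&
        num_chunks > SIZE_MAX / max_compressed_chunk_bytes) {
        return false; /* product would overflow size_t */
    }
    *total = num_chunks * max_compressed_chunk_bytes;
    return true;
}
```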

nvcompStatus_t nvcompBatchedGdeflateCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedGdeflateOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedGdeflateCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The GDeflate compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression with extra total bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused by GDeflate.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

This is needed when the expected output size is not known.

Note

If the stream is corrupt, the calculated sizes will be invalid.

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedGdeflateDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGdeflateDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

In the case where a chunk of compressed data is not a valid GDeflate stream, 0 is written as the size of the invalid chunk and its status is set to `nvcompErrorCannotDecompress`.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedGdeflateDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflowing chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for each chunk. If non-null, it must be preallocated in device-accessible memory. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Must be aligned to the value in `nvcompBatchedGdeflateDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedGdeflateDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. If non-null, it must be preallocated. For each chunk, the status is set to `nvcompSuccess` if decompression succeeds. It is set to `nvcompErrorCannotDecompress` if decompression fails, for example due to corrupted input or out-of-bounds errors. It can be nullptr, in which case statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedGdeflateOpts_t nvcompBatchedGdeflateDefaultOpts = {1}#
const size_t nvcompGdeflateCompressionMaxAllowedChunkSize = 1u << 31#

Although chunk sizes up to 2 GB are theoretically possible, compression with large chunks may be very slow or use large amounts of temporary memory, so caution is advised when using chunk sizes above 64 KB.

const size_t nvcompGdeflateRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedGdeflateDecompressRequiredAlignments{4, 1, 1}#

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedGdeflateOpts_t#
#include <gdeflate.h>

GDeflate compression options for the low-level API

Public Members

int algo#

Compression algorithm to use. Permitted values are:

  • 0: highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)

  • 1: high-throughput, low compression ratio (default)

  • 2: medium-throughput, medium compression ratio; beats zlib level 1 on compression ratio

  • 3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION

  • 4: lower-throughput, higher compression ratio; beats zlib level 6 on compression ratio

  • 5: lowest-throughput, highest compression ratio

ZSTD#

Functions

nvcompStatus_t nvcompBatchedZstdCompressGetRequiredAlignments(
nvcompBatchedZstdOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements,
)#

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedZstdOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The ZSTD compression options to use. Currently empty.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedZstdOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression with extra total bytes argument.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

This extended API is useful when chunk sizes within a batch are not uniform. For example, suppose every chunk but one is 64 KB and the remaining chunk is 16 MB. With the non-extended API, the temporary space is then computed based on 16 MB * num_chunks.
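The arithmetic in that example can be made concrete. The hypothetical helper below computes two sizing bases for a batch. The first is the largest chunk times the number of chunks, which is what the non-extended API works from. The second is the plain sum of the chunk sizes, which is a suitable value for max_total_uncompressed_bytes. The example treats 64 KB and 16 MB as 65536 and 16777216 bytes, which is an assumption about the units.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP): for a batch of n chunk sizes,
 * report the largest chunk times n (the non-extended basis) and the sum of
 * all sizes (a value usable as max_total_uncompressed_bytes). */
void sizing_bases(const size_t *sizes, size_t n,
                  size_t *naive_basis, size_t *total_basis)
{
    size_t max_chunk = 0;
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (sizes[i] > max_chunk) {
            max_chunk = sizes[i];
        }
        total += sizes[i];
    }
    *naive_basis = max_chunk * n;
    *total_basis = total;
}
```

Consider three 64 KB chunks and one 16 MB chunk. The naive basis is 67108864 bytes, while the total is 16973824 bytes, roughly a quarter of it. How nvCOMP turns either basis into temp_bytes is not specified here.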

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The ZSTD compression options to use. Currently empty.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedZstdOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to. That is, the minimum amount of output memory that must be provided to nvcompBatchedZstdCompressAsync() for each chunk.

Chunk size must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Zstd compression options to use. Currently empty.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedZstdOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. It can be NULL if no temporary memory is needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedZstdCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The Zstd compression options to use. Currently empty.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression with extra total bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedZstdGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedZstdDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
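A chunk whose size cannot be retrieved is reported as 0, so it is worth checking for zeros before sizing the output buffers. This sketch assumes the sizes have already been copied to the host after synchronizing the stream. `prepare_buffer_sizes` is hypothetical. A 0 might also correspond to a chunk that was genuinely empty, so flagged chunks need inspection rather than automatic rejection.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP): copies the queried sizes into
 * the host array that will later be uploaded as
 * device_uncompressed_buffer_bytes. It records in zero_idx (capacity n)
 * the indices of chunks reported as 0 bytes. Returns the number of
 * zero-sized chunks. */
size_t prepare_buffer_sizes(const size_t *queried, size_t n,
                            size_t *buffer_bytes, size_t *zero_idx)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        buffer_bytes[i] = queried[i];
        if (queried[i] == 0) {
            zero_idx[zeros++] = i; /* possible size-retrieval error */
        }
    }
    return zeros;
}
```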

nvcompStatus_t nvcompBatchedZstdDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedZstdDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Can be NULL if no temporary space is needed. Must be aligned to the value in `nvcompBatchedZstdDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedZstdDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to `nvcompErrorCannotDecompress`.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
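The alignment conditions above can be checked on the host before the pointer arrays are copied to device-accessible memory. The following minimal sketch mirrors the three fields referenced in this section (input, output, temp) in a local struct. That the members appear in this order in nvcompAlignmentRequirements_t is an assumption, and in real code the nvCOMP header type should be used instead. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the input/output/temp alignment fields
 * (assumed layout; prefer nvcompAlignmentRequirements_t in real code). */
typedef struct {
    size_t input;
    size_t output;
    size_t temp;
} alignment_reqs;

/* True if the address of ptr is a multiple of alignment (alignment > 0). */
static bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Check the host-side copies of every pointer passed to a batched
 * decompression call against the given requirements. */
static bool decompress_buffers_aligned(const void *const *in_ptrs,
                                       void *const *out_ptrs,
                                       size_t num_chunks,
                                       const void *temp_ptr,
                                       alignment_reqs req)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (!is_aligned(in_ptrs[i], req.input) ||
            !is_aligned(out_ptrs[i], req.output)) {
            return false;
        }
    }
    /* A NULL temp pointer has address 0 and therefore passes. */
    return is_aligned(temp_ptr, req.temp);
}
```

For Zstd decompression, the requirements listed under Variables below would correspond to `alignment_reqs zstd = {1, 1, 8};` under the assumed field order.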

Variables

static const nvcompBatchedZstdOpts_t nvcompBatchedZstdDefaultOpts = {0}#
const size_t nvcompZstdCompressionMaxAllowedChunkSize = (1UL << 31) - 1#
const size_t nvcompZstdRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedZstdDecompressRequiredAlignments{1, 1, 8}#

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedZstdOpts_t#
#include <zstd.h>

Zstd compression options for the low-level API.

Public Members

int reserved#

GZIP#

Functions

nvcompStatus_t nvcompBatchedGzipCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGzipOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Gzip compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGzipOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression, with an extra argument for the total uncompressed size.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] The Gzip compression options to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedGzipOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to, that is, the minimum amount of output memory that must be provided to nvcompBatchedGzipCompressAsync() for each chunk.

Chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Gzip compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.
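A common pattern is to carve all compressed output buffers out of a single allocation. Because nvcompBatchedGzipCompressAsync() requires 8-byte-aligned output buffers, each slot can be padded to a multiple of 8. The following minimal sketch computes the stride and total size under that layout. It assumes the base allocation is itself 8-byte aligned, which is true of cudaMalloc allocations. The helper gzip_output_layout is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the per-chunk bound reported by
 * nvcompBatchedGzipCompressGetMaxOutputChunkSize(), compute a slot size
 * padded to a multiple of 8 and the total bytes for num_chunks slots.
 * Slot i then starts at base + i * slot_bytes.
 * Returns 0 on success, or -1 on size_t overflow. */
static int gzip_output_layout(size_t num_chunks, size_t max_compressed_chunk_bytes,
                              size_t *slot_bytes, size_t *total_bytes)
{
    const size_t align = 8;
    size_t slot = max_compressed_chunk_bytes;
    size_t rem = slot % align;
    if (rem != 0) {
        if (slot > SIZE_MAX - (align - rem)) {
            return -1;
        }
        slot += align - rem; /* pad so the next slot stays 8-byte aligned */
    }
    if (slot != 0 && num_chunks > SIZE_MAX / slot) {
        return -1; /* num_chunks * slot would overflow */
    }
    *slot_bytes = slot;
    *total_bytes = num_chunks * slot;
    return 0;
}
```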

nvcompStatus_t nvcompBatchedGzipCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedGzipOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

The individual chunk size must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended. The output buffers must be 8-byte aligned.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedGzipCompressGetMaxOutputChunkSize`.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The Gzip compression options to use.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
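Because each Gzip chunk must not exceed 65536 bytes, a contiguous input is usually split on the host before the call. The following minimal sketch computes per-chunk offsets and sizes. The caller would then build the device pointers as base + offsets[i] and copy the pointer and size arrays to device-accessible memory. The helpers chunk_count and split_into_chunks are hypothetical.

```c
#include <stddef.h>

/* Chunk size limit (and recommended size) stated above for Gzip. */
#define GZIP_CHUNK_BYTES ((size_t)65536)

/* Number of chunks of at most chunk_bytes needed to cover total_bytes. */
static size_t chunk_count(size_t total_bytes, size_t chunk_bytes)
{
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Fill offsets[] and sizes[] (each with chunk_count() entries). Every chunk
 * is chunk_bytes long except possibly the last one. */
static void split_into_chunks(size_t total_bytes, size_t chunk_bytes,
                              size_t *offsets, size_t *sizes)
{
    size_t n = chunk_count(total_bytes, chunk_bytes);
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = i * chunk_bytes;
        size_t remaining = total_bytes - offsets[i];
        sizes[i] = remaining < chunk_bytes ? remaining : chunk_bytes;
    }
}
```

With this layout, max_uncompressed_chunk_bytes is GZIP_CHUNK_BYTES whenever the input is at least that large, and the input size otherwise.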

nvcompStatus_t nvcompBatchedGzipDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument for the total uncompressed size.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in gzip.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedGzipGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Use this function when the uncompressed sizes are not known in advance.

Note

If the stream is corrupt, the calculated sizes will be invalid.

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedGzipDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
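Once the sizes have been copied back to the host, they can be turned into offsets inside one contiguous output allocation using an exclusive prefix sum. Because a corrupt stream yields invalid sizes, callers may want to bound each size before allocating. The following minimal sketch pads each offset to a power-of-two alignment, such as the `output` requirement. The helper layout_outputs is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the offset of each chunk in a single output
 * buffer, rounding every offset up to align (a nonzero power of two).
 * Returns the total buffer size, or 0 on overflow (or if all sizes are 0). */
static size_t layout_outputs(const size_t *sizes, size_t num_chunks,
                             size_t align, size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (cursor > SIZE_MAX - (align - 1)) {
            return 0;
        }
        cursor = (cursor + align - 1) & ~(align - 1); /* round up */
        offsets[i] = cursor;
        if (sizes[i] > SIZE_MAX - cursor) {
            return 0;
        }
        cursor += sizes[i];
    }
    return cursor;
}
```

For Gzip, the `output` alignment listed under Variables below is 1, so no padding is inserted in that case.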

nvcompStatus_t nvcompBatchedGzipDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

If a chunk of compressed data is not a valid Deflate stream, 0 is written as the size of that chunk and its status is set to `nvcompErrorCannotDecompress`.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedGzipDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. If non-null, this array must be preallocated. It can be nullptr, in which case the actual sizes are not reported.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Must be aligned to the value in `nvcompBatchedGzipDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedGzipDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. If non-null, this array must be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to `nvcompErrorCannotDecompress`. It can be nullptr, in which case per-chunk statuses are not reported.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
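When the expected uncompressed sizes are known, a simple host-side post-check is to compare them with the reported sizes after copying device_uncompressed_chunk_bytes back and synchronizing the stream. Per the note above, a chunk rejected as an invalid stream reports 0, so it shows up as a mismatch. The per-chunk statuses remain the authoritative error signal. The following minimal sketch uses a hypothetical helper, first_size_mismatch.

```c
#include <stddef.h>

/* Hypothetical helper: return the index of the first chunk whose reported
 * decompressed size differs from the expected size, or num_chunks if every
 * chunk matches. */
static size_t first_size_mismatch(const size_t *actual, const size_t *expected,
                                  size_t num_chunks)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (actual[i] != expected[i]) {
            return i;
        }
    }
    return num_chunks;
}
```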

Variables

const nvcompAlignmentRequirements_t nvcompBatchedGzipDecompressRequiredAlignments{1, 1, 1}#

Minimum buffer alignment requirements for decompression.

static const nvcompBatchedGzipOpts_t nvcompBatchedGzipDefaultOpts = {0}#
struct nvcompBatchedGzipOpts_t#
#include <gzip.h>

Gzip compression options for the low-level API.

Public Members

int reserved#

ANS#

Enums

enum nvcompANSType_t#

Values:

enumerator nvcomp_rANS#
enum nvcompANSDataType_t#

Values:

enumerator uint8#
enumerator float16#

Functions

nvcompStatus_t nvcompBatchedANSCompressGetRequiredAlignments(
nvcompBatchedANSOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements,
)#

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedANSOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedANSOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression, with an extra argument for the total uncompressed size.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedANSOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to, that is, the minimum amount of output memory that must be provided to nvcompBatchedANSCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedANSOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] The temporary GPU workspace. Can be NULL if no temporary memory is needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same format_opts.

  • temp_bytes[in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedANSCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Compression options.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument for the total uncompressed size.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in ANS.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedANSDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedANSDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

This function is used to decompress compressed buffers produced by `nvcompBatchedANSCompressAsync`.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedANSDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] The temporary GPU space. Can be NULL if no temporary space is needed. Must be aligned to the value in `nvcompBatchedANSDecompressRequiredAlignments.temp`.

  • temp_bytes[in] The size of the temporary GPU space.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedANSDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bounds errors, the status will be set to `nvcompErrorCannotDecompress`.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.

Variables

static const nvcompBatchedANSOpts_t nvcompBatchedANSDefaultOpts = {nvcomp_rANS, uint8}#
const size_t nvcompANSCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompANSRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedANSDecompressRequiredAlignments{8, 1, 4}#

Minimum buffer alignment requirements for decompression.
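Chunks passed to nvcompBatchedANSCompressAsync() should fit within nvcompANSCompressionMaxAllowedChunkSize (1 << 24 bytes, or 16 MiB). The following minimal host-side pre-check copies that value into a local macro. In real code, the constant from the nvCOMP header would be used instead. The helper ans_chunk_sizes_valid is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of nvcompANSCompressionMaxAllowedChunkSize (1 << 24). */
#define ANS_MAX_CHUNK_BYTES ((size_t)1 << 24)

/* Hypothetical helper: true if every chunk size is within the ANS limit. */
static bool ans_chunk_sizes_valid(const size_t *sizes, size_t num_chunks)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (sizes[i] > ANS_MAX_CHUNK_BYTES) {
            return false;
        }
    }
    return true;
}
```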

struct nvcompBatchedANSOpts_t#
#include <ans.h>

ANS compression options for the low-level API.

Public Members

nvcompANSType_t type#
nvcompANSDataType_t data_type#

Bitcomp#

Typedefs

typedef nvcompBatchedBitcompOpts_t nvcompBatchedBitcompFormatOpts#

Legacy alias for nvcompBatchedBitcompOpts_t.

Functions

nvcompStatus_t nvcompBatchedBitcompCompressGetRequiredAlignments(
nvcompBatchedBitcompOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements,
)#

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedBitcompOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

NOTE: Bitcomp currently does not use any temporary memory for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedBitcompOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression, with an extra argument for the total uncompressed size.

NOTE: Bitcomp currently does not use any temporary memory for compression.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] Compression options.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedBitcompOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of at most max_uncompressed_chunk_bytes bytes could compress to, that is, the minimum amount of output memory that must be provided to nvcompBatchedBitcompCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] Compression options.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedBitcompOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by format_opts.data_type.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedBitcompCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] Compression options. They must be valid.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
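Because each Bitcomp chunk size must be a multiple of the size of format_opts.data_type, a caller can validate sizes, or pick a compliant chunk size, on the host. The following minimal sketch takes the element size in bytes as an explicit parameter (for example, 4 for a 32-bit integer type) instead of mapping nvCOMP enum values, which are not listed here. Both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every chunk size is a whole number of elements. */
static bool chunk_sizes_match_type(const size_t *sizes, size_t num_chunks,
                                   size_t element_bytes)
{
    for (size_t i = 0; i < num_chunks; ++i) {
        if (sizes[i] % element_bytes != 0) {
            return false;
        }
    }
    return true;
}

/* Largest chunk size not exceeding limit_bytes that holds whole elements. */
static size_t typed_chunk_bytes(size_t limit_bytes, size_t element_bytes)
{
    return limit_bytes - limit_bytes % element_bytes;
}
```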

nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

NOTE: Starting with version 4.2, Bitcomp uses temporary memory for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression, with an extra argument for the total uncompressed size.

NOTE: Starting with version 4.2, Bitcomp uses temporary memory for decompression to avoid the device-wide synchronizations that occurred in earlier versions.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression. Unused in Bitcomp.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedBitcompGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedBitcompDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] This argument is not used.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.
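A chunk whose size cannot be determined is reported as 0. Callers therefore typically copy device_uncompressed_chunk_bytes back to the host (for example with cudaMemcpyAsync on the same stream, followed by cudaStreamSynchronize) and scan it before allocating output buffers. The sketch below shows such a scan. `count_failed_size_queries` is a hypothetical helper, not part of nvCOMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not part of nvCOMP).
 * Input: a host copy of device_uncompressed_chunk_bytes.
 * Counts the chunks reported as 0 bytes, which the size query uses to signal
 * an error. Writes the index of the first such chunk to *first_bad, or
 * num_chunks if there is none. */
size_t count_failed_size_queries(const size_t *host_uncompressed_chunk_bytes,
                                 size_t num_chunks, size_t *first_bad)
{
    assert(first_bad != NULL);
    size_t failures = 0;
    *first_bad = num_chunks;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_uncompressed_chunk_bytes[i] == 0) {
            if (failures == 0) {
                *first_bad = i; /* remember only the first failing chunk */
            }
            ++failures;
        }
    }
    return failures;
}
```

A chunk that was legitimately empty before compression would also report 0. This check is therefore only conclusive if empty chunks are never compressed.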

nvcompStatus_t nvcompBatchedBitcompDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

This function is used to decompress compressed buffers produced by `nvcompBatchedBitcompCompressAsync`. It can also decompress buffers compressed with the standalone Bitcomp library.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

The function is not completely asynchronous: it needs to inspect the compressed data in order to create the proper Bitcomp handle. The stream is synchronized, the data is examined, and then the asynchronous decompression is launched.

A faster, fully asynchronous version of batched Bitcomp decompression is available and can be launched via the HLIF manager.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedBitcompDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] This argument is not used.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] Temporary scratch memory.

  • temp_bytes[in] Size of temporary scratch memory.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedBitcompDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to `nvcompErrorCannotDecompress`.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
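A successful launch does not mean every chunk decompressed successfully. The per-chunk outcome is in device_statuses, which is written asynchronously. The host must therefore copy it back and synchronize the stream before reading it.

The sketch below collects the indices of failed chunks from that host copy. It makes three assumptions:
  • Statuses are represented as `int`, standing in for nvcompStatus_t.
  • nvcompSuccess equals 0.
  • `collect_failed_chunks` is a hypothetical helper, not an nvCOMP function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stands in for nvcompSuccess (0). Real code should include the
 * nvCOMP header and compare against nvcompSuccess directly. */
enum { SKETCH_NVCOMP_SUCCESS = 0 };

/* Hypothetical helper (not part of nvCOMP).
 * Input: device_statuses after it has been copied to the host, for example
 * with cudaMemcpyAsync followed by cudaStreamSynchronize.
 * Stores the indices of chunks whose status is not success and returns how
 * many there are. */
size_t collect_failed_chunks(const int *host_statuses, size_t num_chunks,
                             size_t *failed_indices)
{
    assert(host_statuses != NULL && failed_indices != NULL);
    size_t n_failed = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_statuses[i] != SKETCH_NVCOMP_SUCCESS) {
            failed_indices[n_failed++] = i;
        }
    }
    return n_failed;
}
```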

Variables

static const nvcompBatchedBitcompOpts_t nvcompBatchedBitcompDefaultOpts = {0, NVCOMP_TYPE_UCHAR}#
const size_t nvcompBitcompCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompBitcompRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
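Some callers carve input, output, and temporary buffers out of a single device allocation. In that case, each offset must be rounded up so that every sub-buffer meets this 8-byte requirement. The base pointer itself is already suitably aligned, because CUDA allocation routines return addresses aligned to at least 256 bytes.

The sketch below shows the two helpers such code usually needs. `align_up` and `is_aligned` are illustrative names, not nvCOMP functions, and `SKETCH_BITCOMP_REQUIRED_ALIGNMENT` simply mirrors the documented constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors nvcompBitcompRequiredAlignment (8 bytes). */
#define SKETCH_BITCOMP_REQUIRED_ALIGNMENT ((size_t)8)

/* Round offset up to the next multiple of alignment (a power of two). */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Return nonzero if the address of ptr is a multiple of alignment. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}
```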

const nvcompAlignmentRequirements_t nvcompBatchedBitcompDecompressRequiredAlignments{4, 8, 4}#

Minimum buffer alignment requirements for decompression.
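The sketch below checks a batch of buffers against these requirements before the pointer arrays are uploaded to the device. It reads the `{4, 8, 4}` initializer as input = 4, output = 8, temp = 4, assuming that nvcompAlignmentRequirements_t has the members input, output, and temp in that order.

Two names are illustrative:
  • `sketch_alignment_requirements` is a local mirror of nvcompAlignmentRequirements_t.
  • `decompress_buffers_aligned` is a hypothetical helper, not part of nvCOMP.

The check runs on the host-side copies of the pointer arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of nvcompAlignmentRequirements_t (assumed member order). */
typedef struct {
    size_t input;
    size_t output;
    size_t temp;
} sketch_alignment_requirements;

/* Same values as nvcompBatchedBitcompDecompressRequiredAlignments. */
static const sketch_alignment_requirements bitcomp_decompress_alignments = {4, 8, 4};

/* Hypothetical check (not an nvCOMP function).
 * Returns nonzero if:
 *   - every compressed input pointer meets req->input,
 *   - every output pointer meets req->output, and
 *   - the temp pointer is NULL or meets req->temp. */
int decompress_buffers_aligned(const sketch_alignment_requirements *req,
                               const void *const *inputs,
                               void *const *outputs,
                               size_t num_chunks,
                               const void *temp)
{
    assert(req != NULL && req->input && req->output && req->temp);
    for (size_t i = 0; i < num_chunks; ++i) {
        if ((uintptr_t)inputs[i] % req->input != 0) return 0;
        if ((uintptr_t)outputs[i] % req->output != 0) return 0;
    }
    return temp == NULL || (uintptr_t)temp % req->temp == 0;
}
```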

struct nvcompBatchedBitcompOpts_t#
#include <bitcomp.h>

Structure for configuring Bitcomp compression.

Public Members

int algorithm_type#

Bitcomp algorithm options.

  • 0: Default algorithm; usually gives the best compression ratios.

  • 1 : “Sparse” algorithm, works well on sparse data (with lots of zeroes) and is usually faster than the default algorithm.

nvcompType_t data_type#

One of nvCOMP’s possible data types.
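The algorithm_type descriptions above suggest choosing the "sparse" algorithm (1) for data with many zeroes. The sketch below makes that choice from a sample of the input.

It rests on several assumptions:
  • The zero-fraction heuristic and its threshold are hypothetical, not nvCOMP behaviour. Benchmarking both algorithms on representative data is the reliable way to decide.
  • `sketch_bitcomp_opts` and `sketch_bitcomp_type` are local stand-ins for nvcompBatchedBitcompOpts_t and nvcompType_t. Their members follow the order of the documented default `{0, NVCOMP_TYPE_UCHAR}`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins (assumptions). Real code should include the nvCOMP
 * Bitcomp header instead. */
typedef enum { SKETCH_BITCOMP_TYPE_UCHAR, SKETCH_BITCOMP_TYPE_INT } sketch_bitcomp_type;
typedef struct {
    int algorithm_type;            /* 0 = default, 1 = sparse */
    sketch_bitcomp_type data_type;
} sketch_bitcomp_opts;

/* Hypothetical heuristic (not part of nvCOMP).
 * Starts from the default options and switches to the sparse algorithm when
 * at least zero_fraction_threshold of the sampled bytes are zero. */
sketch_bitcomp_opts choose_bitcomp_opts(const unsigned char *sample, size_t n,
                                        double zero_fraction_threshold)
{
    assert(sample != NULL || n == 0);
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        zeros += (sample[i] == 0);
    }
    sketch_bitcomp_opts opts = {0, SKETCH_BITCOMP_TYPE_UCHAR}; /* like the default opts */
    if (n > 0 && (double)zeros / (double)n >= zero_fraction_threshold) {
        opts.algorithm_type = 1; /* "sparse" algorithm */
    }
    return opts;
}
```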

Cascaded#

Functions

nvcompStatus_t nvcompBatchedCascadedCompressGetRequiredAlignments(
nvcompBatchedCascadedOpts_t format_opts,
nvcompAlignmentRequirements_t *alignment_requirements,
)#

Get the minimum buffer alignment requirements for compression.

Parameters:
  • format_opts[in] Compression options.

  • alignment_requirements[out] The minimum buffer alignment requirements for compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedCascadedOpts_t format_opts,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for compression.

Note

Batched Cascaded compression does not require temp space, so this will set *temp_bytes=0, unless an error is found with the format_opts.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] The Cascaded compression options and datatype to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
nvcompBatchedCascadedOpts_t format_opts,
size_t *temp_bytes,
const size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for compression with extra total bytes argument.

Note

Batched Cascaded compression does not require temp space, so this will set *temp_bytes=0, unless an error is found with the format_opts.

Parameters:
  • num_chunks[in] The number of chunks of memory in the batch.

  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.

  • format_opts[in] The Cascaded compression options and datatype to use.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during compression.

  • max_total_uncompressed_bytes[in] Upper bound on the total uncompressed size of all chunks.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedCompressGetMaxOutputChunkSize(
size_t max_uncompressed_chunk_bytes,
nvcompBatchedCascadedOpts_t format_opts,
size_t *max_compressed_chunk_bytes,
)#

Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to, that is, the minimum amount of output memory that must be provided to nvcompBatchedCascadedCompressAsync() for each chunk.

Parameters:
  • max_uncompressed_chunk_bytes[in] The maximum size of a chunk before compression.

  • format_opts[in] The Cascaded compression options to use.

  • max_compressed_chunk_bytes[out] The maximum possible compressed size of the chunk.

Returns:

nvcompSuccess if successful, and an error code otherwise.
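The value returned here is typically used to size one contiguous device allocation holding every compressed output buffer. The per-chunk stride is rounded up to the `output` alignment reported by nvcompBatchedCascadedCompressGetRequiredAlignments.

The sketch below computes the offsets and the total allocation size. `layout_compressed_buffers` is a hypothetical helper, not an nvCOMP function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP).
 * Lays out num_chunks compressed output buffers, each max_compressed_chunk_bytes
 * long, inside one allocation. The stride is rounded up to output_alignment,
 * which must be a power of two. Writes one offset per chunk and returns the
 * total number of bytes to allocate. */
size_t layout_compressed_buffers(size_t num_chunks,
                                 size_t max_compressed_chunk_bytes,
                                 size_t output_alignment,
                                 size_t *offsets)
{
    assert(output_alignment != 0 && (output_alignment & (output_alignment - 1)) == 0);
    size_t stride = (max_compressed_chunk_bytes + output_alignment - 1)
                    & ~(output_alignment - 1);
    for (size_t i = 0; i < num_chunks; ++i) {
        offsets[i] = i * stride;
    }
    return num_chunks * stride;
}
```

The host then builds the pointer array as `base + offsets[i]` and copies it to device-accessible memory for use as device_compressed_chunk_ptrs.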

nvcompStatus_t nvcompBatchedCascadedCompressAsync(
const void *const *device_uncompressed_chunk_ptrs,
const size_t *device_uncompressed_chunk_bytes,
size_t max_uncompressed_chunk_bytes,
size_t num_chunks,
void *device_temp_ptr,
size_t temp_bytes,
void *const *device_compressed_chunk_ptrs,
size_t *device_compressed_chunk_bytes,
nvcompBatchedCascadedOpts_t format_opts,
cudaStream_t stream,
)#

Perform batched asynchronous compression.

Note

The current implementation does not support uncompressed sizes larger than 4,294,967,295 bytes (max uint32_t).

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_uncompressed_chunk_ptrs[in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedCompressGetRequiredAlignments` when called with the same format_opts.

  • device_uncompressed_chunk_bytes[in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by format_opts.type, else this may crash or produce invalid output.

  • max_uncompressed_chunk_bytes[in] The size of the largest uncompressed chunk. This parameter is currently unused. Set it to either the actual value or zero.

  • num_chunks[in] Number of chunks of data to compress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_compressed_chunk_ptrs[out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedCascadedCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedCompressGetRequiredAlignments` when called with the same format_opts.

  • device_compressed_chunk_bytes[out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.

  • format_opts[in] The cascaded format options. The format must be valid.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
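Before calling this function, the input is usually split into chunks. Each chunk size must be a multiple of the size of format_opts.type and must not exceed nvcompCascadedCompressionMaxAllowedChunkSize (1 << 24).

The sketch below performs that split and rejects parameters that would break those rules. `split_into_chunks` and `SKETCH_CASCADED_MAX_CHUNK` are illustrative names, not part of nvCOMP.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors nvcompCascadedCompressionMaxAllowedChunkSize (1 << 24). */
#define SKETCH_CASCADED_MAX_CHUNK ((size_t)1 << 24)

/* Hypothetical helper (not part of nvCOMP).
 * Splits total_bytes of input into chunks of at most chunk_bytes. Writes the
 * chunk sizes into chunk_sizes when it is non-NULL and returns the number of
 * chunks. Returns 0 (no chunks) when total_bytes is 0, or when any of these
 * rules is broken:
 *   - chunk_bytes must be a nonzero multiple of type_size;
 *   - chunk_bytes must not exceed the maximum allowed chunk size;
 *   - total_bytes must be a multiple of type_size, so that the last, shorter
 *     chunk is also a whole number of elements. */
size_t split_into_chunks(size_t total_bytes, size_t chunk_bytes, size_t type_size,
                         size_t *chunk_sizes)
{
    assert(type_size != 0);
    if (chunk_bytes == 0 || chunk_bytes % type_size != 0) return 0;
    if (chunk_bytes > SKETCH_CASCADED_MAX_CHUNK) return 0;
    if (total_bytes % type_size != 0) return 0;
    size_t n = 0;
    for (size_t offset = 0; offset < total_bytes; offset += chunk_bytes) {
        size_t remaining = total_bytes - offset;
        if (chunk_sizes != NULL) {
            chunk_sizes[n] = remaining < chunk_bytes ? remaining : chunk_bytes;
        }
        ++n;
    }
    return n;
}
```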

nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSize(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSizeEx(
size_t num_chunks,
size_t max_uncompressed_chunk_bytes,
size_t *temp_bytes,
size_t max_total_uncompressed_bytes,
)#

Get the amount of temporary memory required on the GPU for decompression with extra total bytes argument.

Parameters:
  • num_chunks[in] Number of chunks of data to be decompressed.

  • max_uncompressed_chunk_bytes[in] The size of the largest chunk in bytes when uncompressed.

  • temp_bytes[out] The amount of GPU memory that will be temporarily required during decompression.

  • max_total_uncompressed_bytes[in] The total decompressed size of all the chunks. Unused in Cascaded.

Returns:

nvcompSuccess if successful, and an error code otherwise.
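The Ex variant takes both the largest chunk size and the total uncompressed size. When the per-chunk uncompressed sizes are known on the host, both values can be derived in a single pass. For Cascaded the total is documented as unused, but passing the correct value is harmless.

The sketch below computes both values. `summarize_chunk_sizes` is a hypothetical helper, not an nvCOMP function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP).
 * From a host array of per-chunk uncompressed sizes, computes:
 *   - *max_chunk_bytes: the value to pass as max_uncompressed_chunk_bytes;
 *   - *total_bytes: the value to pass as max_total_uncompressed_bytes. */
void summarize_chunk_sizes(const size_t *sizes, size_t num_chunks,
                           size_t *max_chunk_bytes, size_t *total_bytes)
{
    assert(max_chunk_bytes != NULL && total_bytes != NULL);
    size_t max = 0, total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (sizes[i] > max) {
            max = sizes[i];
        }
        total += sizes[i];
    }
    *max_chunk_bytes = max;
    *total_bytes = total;
}
```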

nvcompStatus_t nvcompBatchedCascadedGetDecompressSizeAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
cudaStream_t stream,
)#

Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedCascadedDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.

  • num_chunks[in] Number of data chunks to compute sizes of.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successful, and an error code otherwise.

nvcompStatus_t nvcompBatchedCascadedDecompressAsync(
const void *const *device_compressed_chunk_ptrs,
const size_t *device_compressed_chunk_bytes,
const size_t *device_uncompressed_buffer_bytes,
size_t *device_uncompressed_chunk_bytes,
size_t num_chunks,
void *const device_temp_ptr,
size_t temp_bytes,
void *const *device_uncompressed_chunk_ptrs,
nvcompStatus_t *device_statuses,
cudaStream_t stream,
)#

Perform batched asynchronous decompression.

This function is used to decompress compressed buffers produced by `nvcompBatchedCascadedCompressAsync`.

Note

Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.

Parameters:
  • device_compressed_chunk_ptrs[in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each buffer must be aligned to the value in `nvcompBatchedCascadedDecompressRequiredAlignments.input`.

  • device_compressed_chunk_bytes[in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.

  • device_uncompressed_buffer_bytes[in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.

  • device_uncompressed_chunk_bytes[out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.

  • num_chunks[in] Number of chunks of data to decompress.

  • device_temp_ptr[in] This argument is not used.

  • temp_bytes[in] This argument is not used.

  • device_uncompressed_chunk_ptrs[out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in `nvcompBatchedCascadedDecompressRequiredAlignments.output`.

  • device_statuses[out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to `nvcompErrorCannotDecompress`.

  • stream[in] The CUDA stream to operate on.

Returns:

nvcompSuccess if successfully launched, and an error code otherwise.
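After a compress/decompress round trip, a useful host-side check compares two things per chunk. The status must be success, and device_uncompressed_chunk_bytes must equal the size that was originally compressed. Both arrays must first be copied back to the host and the stream synchronized.

The sketch below runs that check, under two assumptions: statuses are held as `int` with 0 meaning nvcompSuccess, and `first_mismatched_chunk` is a hypothetical helper, not part of nvCOMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP).
 * Inputs: host copies of device_statuses and device_uncompressed_chunk_bytes,
 * plus the original chunk sizes. Returns the index of the first chunk that
 * either failed (status != 0, assumed to be nvcompSuccess) or produced a
 * different number of bytes than expected. Returns num_chunks if every chunk
 * matches. */
size_t first_mismatched_chunk(const int *statuses,
                              const size_t *actual_bytes,
                              const size_t *expected_bytes,
                              size_t num_chunks)
{
    assert(num_chunks == 0 || (statuses && actual_bytes && expected_bytes));
    for (size_t i = 0; i < num_chunks; ++i) {
        if (statuses[i] != 0 || actual_bytes[i] != expected_bytes[i]) {
            return i;
        }
    }
    return num_chunks;
}
```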

Variables

static const nvcompBatchedCascadedOpts_t nvcompBatchedCascadedDefaultOpts = {4096, NVCOMP_TYPE_INT, 2, 1, 1}#
const size_t nvcompCascadedCompressionMaxAllowedChunkSize = 1 << 24#
const size_t nvcompCascadedRequiredAlignment = 8#

The most restrictive of minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression or decompression functions. In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.

const nvcompAlignmentRequirements_t nvcompBatchedCascadedDecompressRequiredAlignments{4, 8, 1}#

Minimum buffer alignment requirements for decompression.

struct nvcompBatchedCascadedOpts_t#
#include <cascaded.h>

Structure that stores the compression configuration.

Public Members

size_t internal_chunk_bytes#

The size of each internal chunk of data to decompress independently with Cascaded compression. The value should be in the range of [512, 16384] depending on the datatype of the input and the shared memory size of the GPU being used. This is not the size of chunks passed into the API. Recommended size is 4096.

Note

Not currently used; a default of 4096 is always applied.

nvcompType_t type#

The datatype used to define the bit-width for compression.

int num_RLEs#

The number of Run Length Encodings to perform.

int num_deltas#

The number of Delta Encodings to perform.

int use_bp#

Whether or not to bitpack the final layers.
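The default `nvcompBatchedCascadedDefaultOpts = {4096, NVCOMP_TYPE_INT, 2, 1, 1}` maps onto the members above in order:
  • internal_chunk_bytes = 4096
  • type = INT
  • two RLE passes (num_RLEs = 2)
  • one delta pass (num_deltas = 1)
  • bitpacking enabled (use_bp = 1)

The sketch below writes this mapping out and adds a sanity check based on the documented [512, 16384] range. Since internal_chunk_bytes is currently ignored, the range check is advisory only.

It rests on several assumptions:
  • `sketch_cascaded_opts` and `sketch_cascaded_type` are local stand-ins for nvcompBatchedCascadedOpts_t and nvcompType_t.
  • `cascaded_opts_look_valid` is a hypothetical helper, not part of nvCOMP.
  • Requiring use_bp to be exactly 0 or 1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins (assumptions), members in the documented order. Real code
 * should include the nvCOMP Cascaded header instead. */
typedef enum { SKETCH_CASCADED_TYPE_CHAR, SKETCH_CASCADED_TYPE_INT } sketch_cascaded_type;
typedef struct {
    size_t internal_chunk_bytes;
    sketch_cascaded_type type;
    int num_RLEs;
    int num_deltas;
    int use_bp;
} sketch_cascaded_opts;

/* Same values as nvcompBatchedCascadedDefaultOpts. */
static const sketch_cascaded_opts sketch_cascaded_defaults = {
    4096, SKETCH_CASCADED_TYPE_INT, 2, 1, 1};

/* Hypothetical sanity check (not an nvCOMP function). Returns nonzero if:
 *   - internal_chunk_bytes is in [512, 16384];
 *   - the encoding counts are non-negative;
 *   - use_bp is 0 or 1 (assumed boolean flag). */
int cascaded_opts_look_valid(const sketch_cascaded_opts *o)
{
    assert(o != NULL);
    return o->internal_chunk_bytes >= 512 && o->internal_chunk_bytes <= 16384
        && o->num_RLEs >= 0 && o->num_deltas >= 0
        && (o->use_bp == 0 || o->use_bp == 1);
}
```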