C API
This is the C API reference for the NVIDIA® nvCOMP library.
Generic
Functions
- nvcompStatus_t nvcompGetProperties(nvcompProperties_t *properties)
Retrieve the nvCOMP library properties.
- Parameters:
properties – [out] Retrieved nvCOMP properties in an nvcompProperties_t struct.
- Returns:
nvcompErrorInvalidValue if properties is nullptr, nvcompSuccess otherwise.
Enums
- enum nvcompStatus_t
nvCOMP return statuses.
Values:
- enumerator nvcompSuccess
- enumerator nvcompErrorInvalidValue
- enumerator nvcompErrorNotSupported
- enumerator nvcompErrorCannotDecompress
- enumerator nvcompErrorBadChecksum
- enumerator nvcompErrorCannotVerifyChecksums
- enumerator nvcompErrorOutputBufferTooSmall
- enumerator nvcompErrorWrongHeaderLength
- enumerator nvcompErrorAlignment
- enumerator nvcompErrorChunkSizeTooLarge
- enumerator nvcompErrorCannotCompress
- enumerator nvcompErrorWrongInputLength
- enumerator nvcompErrorCudaError
- enumerator nvcompErrorInternal
- enum nvcompType_t
Supported data types.
Values:
- enumerator NVCOMP_TYPE_CHAR
- enumerator NVCOMP_TYPE_UCHAR
- enumerator NVCOMP_TYPE_SHORT
- enumerator NVCOMP_TYPE_USHORT
- enumerator NVCOMP_TYPE_INT
- enumerator NVCOMP_TYPE_UINT
- enumerator NVCOMP_TYPE_LONGLONG
- enumerator NVCOMP_TYPE_ULONGLONG
- enumerator NVCOMP_TYPE_FLOAT16
- enumerator NVCOMP_TYPE_BITS
- enum nvcompDecompressBackend_t
Available decompression backend options.
Values:
- enumerator NVCOMP_DECOMPRESS_BACKEND_DEFAULT
Let nvCOMP decide the best decompression backend internally, either hardware decompression or one of the CUDA implementations.
- enumerator NVCOMP_DECOMPRESS_BACKEND_HARDWARE
Decompress using the dedicated hardware decompression engine.
- enumerator NVCOMP_DECOMPRESS_BACKEND_CUDA
Decompress using the CUDA implementation.
- struct nvcompProperties_t
- #include <shared_types.h>
nvCOMP properties.
- struct nvcompAlignmentRequirements_t
- #include <shared_types.h>
Per-algorithm buffer alignment requirements.
Public Members
- size_t input
Minimum alignment requirement of each input buffer.
- size_t output
Minimum alignment requirement of each output buffer.
- size_t temp
Minimum alignment requirement of the temporary-storage buffer, if any. For algorithms that do not use temporary storage, this field is always equal to 1.
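As a rough illustration of how the three members might be used, the sketch below checks candidate buffers against a set of requirements on the host. `alignment_reqs_t` is a local stand-in that only mirrors the documented members so that the snippet builds without the nvCOMP headers, and `is_aligned` and `buffers_meet_requirements` are hypothetical helper names, not nvCOMP functions. Treating a null temporary buffer as acceptable is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the documented members of nvcompAlignmentRequirements_t.
 * Assumption: used only so the sketch compiles without nvCOMP headers. */
typedef struct {
  size_t input;
  size_t output;
  size_t temp;
} alignment_reqs_t;

/* Returns 1 if the address of ptr is a multiple of alignment, 0 otherwise.
 * Assumes a flat address space where uintptr_t holds the numeric address. */
int is_aligned(const void *ptr, size_t alignment)
{
  assert(alignment != 0);
  return ((uintptr_t)ptr % alignment) == 0;
}

/* Returns 1 if all three buffers satisfy their minimum alignment.
 * A NULL temp pointer is accepted (assumption: no temporary storage in use). */
int buffers_meet_requirements(const alignment_reqs_t *reqs,
                              const void *input,
                              const void *output,
                              const void *temp)
{
  assert(reqs != NULL);
  return is_aligned(input, reqs->input) &&
         is_aligned(output, reqs->output) &&
         (temp == NULL || is_aligned(temp, reqs->temp));
}
```

With the real headers, the values would come from the per-algorithm GetRequiredAlignments function rather than being written by hand.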
Note
The nvcompBatched<compression_method>CompressGetTempSizeEx APIs let the user supply max_total_uncompressed_bytes. Without it, all chunks are assumed to be of size max_uncompressed_chunk_bytes, which can lead to an overestimate of the temporary memory requirements.
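The gap between the two bounds is plain arithmetic. The sketch below compares the total implied by assuming every chunk has the maximum size with the exact total of a host-side copy of the chunk sizes. Both function names are hypothetical, and overflow of size_t is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Total assumed when every chunk is taken to be
 * max_uncompressed_chunk_bytes long. */
size_t worst_case_total_bytes(size_t num_chunks,
                              size_t max_uncompressed_chunk_bytes)
{
  return num_chunks * max_uncompressed_chunk_bytes;
}

/* Exact total of a host-side array of chunk sizes, a tighter candidate
 * value for max_total_uncompressed_bytes. */
size_t exact_total_bytes(const size_t *host_chunk_bytes, size_t num_chunks)
{
  assert(host_chunk_bytes != NULL || num_chunks == 0);
  size_t total = 0;
  for (size_t i = 0; i < num_chunks; ++i) {
    total += host_chunk_bytes[i];
  }
  return total;
}
```

For four chunks of 65536, 65536, 65536 and 1000 bytes, the exact total is 197608 bytes, while the worst-case assumption gives 262144 bytes.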
CRC32
Enums
- enum nvcompCRC32KernelKind_t
Enumeration of kernel kinds for CRC32 computation.
Values:
- enumerator nvcompCRC32WarpKernel
Let each warp process its own chunk of input data.
- enumerator nvcompCRC32BlockKernel
Let one or more blocks process each chunk of input data.
- enum nvcompCRC32SegmentKind_t
Enumeration specifying segment types for streaming CRC32 computation.
Values:
- enumerator nvcompCRC32OnlySegment
Single segment (complete message).
- enumerator nvcompCRC32FirstSegment
First segment of a message that may be followed by further segments.
- enumerator nvcompCRC32MidSegment
Non-first segment of a message that may be followed by further segments.
- enumerator nvcompCRC32LastSegment
Last segment of a message.
If the segment is also the first segment, nvcompCRC32OnlySegment should be used instead.
This enumerator can also be used to retroactively mark the last processed segment as the last segment of a message. For details, see nvcompBatchedCRC32Async.
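The segment kinds can be pictured with a small host-side reference for the standard CRC-32 (PKZIP) model. This is only a sketch of the semantics described above, not the nvCOMP implementation: `segment_kind_t` is a local stand-in for the four documented enumerators, `crc32_segment` is a hypothetical name, and keeping the running register in a caller-owned variable is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the four documented segment kinds. */
typedef enum { SEG_ONLY, SEG_FIRST, SEG_MID, SEG_LAST } segment_kind_t;

/* Feeds one segment into a running CRC-32/PKZIP register.
 * *state holds the raw register between calls. The return value is the
 * finished checksum only for SEG_ONLY and SEG_LAST; otherwise it is the
 * intermediate register. */
uint32_t crc32_segment(uint32_t *state, const void *data, size_t bytes,
                       segment_kind_t kind)
{
  assert(state != NULL);
  assert(data != NULL || bytes == 0);
  const uint8_t *p = (const uint8_t *)data;

  /* A message starts with the initial register value. */
  if (kind == SEG_ONLY || kind == SEG_FIRST) {
    *state = 0xFFFFFFFFu;
  }

  /* Bitwise, reflected update (0xEDB88320 is 0x04C11DB7 bit-reversed). */
  for (size_t i = 0; i < bytes; ++i) {
    *state ^= p[i];
    for (int b = 0; b < 8; ++b) {
      *state = (*state >> 1) ^ (0xEDB88320u & (0u - (*state & 1u)));
    }
  }

  /* Only the end of a message applies the final XOR. */
  if (kind == SEG_ONLY || kind == SEG_LAST) {
    return *state ^ 0xFFFFFFFFu;
  }
  return *state;
}
```

Processing "1234" as first, "5" as mid, and "6789" as last segment gives the same checksum as processing "123456789" as the only segment. Calling the function with a null pointer, zero bytes, and `SEG_LAST` finalizes a message retroactively, in the spirit of the retroactive use of nvcompCRC32LastSegment described above.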
Functions
- nvcompStatus_t nvcompBatchedCRC32Async(
    const void *const *device_input_chunk_ptrs,
    const size_t *device_input_chunk_bytes,
    size_t num_chunks,
    uint32_t *device_crc32_ptr,
    nvcompBatchedCRC32Opts_t opts,
    nvcompCRC32SegmentKind_t segment_kind,
    nvcompStatus_t *device_statuses,
    cudaStream_t stream)
Perform CRC32 checksum calculation asynchronously.
All pointers must point to device-accessible locations.
This function supports streaming CRC32 computation, where the input data might not be visible all at once but only in individual segments. This is controlled by the segment_kind parameter. See nvcompCRC32SegmentKind_t for details. If the input data nevertheless is visible all at once, nvcompCRC32OnlySegment should be passed as segment_kind. If a segment is processed as if it may be followed by further segments, but it subsequently turns out to have been the last segment, the CRC32 calculation can be finalized by passing a null pointer as device_input_chunk_ptrs and nvcompCRC32LastSegment as segment_kind.
Note
The length of a chunk is allowed to be zero. Length-zero chunks may be useful in situations where the number of segments is message-dependent. Rather than having to perform potentially complicated input and output permutations, the missing chunks can be represented as length-zero chunks.
- Parameters:
device_input_chunk_ptrs – [in] Array with size num_chunks of pointers to the input data chunks. Both the pointers and the input data should reside in device-accessible memory. The data chunks do not have any alignment requirements.
device_input_chunk_bytes – [in] Array with size num_chunks of sizes of the input chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks to compute checksums of.
device_crc32_ptr – [out] Array with size num_chunks on the GPU to be filled with the CRC32 checksum of each chunk.
opts – [in] The CRC32 options.
segment_kind – [in] The nvcompCRC32SegmentKind_t to use.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. For each chunk the status will be set to `nvcompSuccess` if the CRC32 calculation is successful, or an error code otherwise. Can be NULL if desired, in which case error status is not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCRC32GetHeuristicConf(
    const size_t *device_input_chunk_bytes,
    size_t num_chunks,
    nvcompCRC32KernelConf_t *kernel_conf,
    size_t max_input_chunk_bytes,
    cudaStream_t stream)
Heuristically determine a performant kernel configuration for CRC32 computation based on input data characteristics.
This function is particularly useful when all chunks are of a similar size, both within and across nvcompBatchedCRC32Async calls. If, in addition, the number of chunks is the same or similar across nvcompBatchedCRC32Async calls, reusing the configuration obtained from this function for all nvcompBatchedCRC32Async calls should work well.
The result depends on the GPU model, the number of chunks, and the maximum input chunk size. The latter can be passed directly in max_input_chunk_bytes or can be deduced from device_input_chunk_bytes. When directly specifying max_input_chunk_bytes, device_input_chunk_bytes should be passed as nvcompCRC32IgnoredInputChunkBytes or a null pointer. When deducing max_input_chunk_bytes from device_input_chunk_bytes, max_input_chunk_bytes should be set to nvcompCRC32DeducedMaxInputChunkBytes or 0.
This function is always synchronous with respect to the host. When directly passing the maximum input chunk size in max_input_chunk_bytes, no synchronization with the device happens and stream is ignored. When deducing max_input_chunk_bytes from device_input_chunk_bytes, the function synchronizes with stream. On devices that do not support stream-ordered memory allocation, the function synchronizes with the entire device in this case.
- Parameters:
device_input_chunk_bytes – [in] Array with size num_chunks of sizes of the input chunks in bytes, residing in device-accessible memory, or nvcompCRC32IgnoredInputChunkBytes if max_input_chunk_bytes is directly specified. In the former case, the data chunks do not have any alignment requirements.
num_chunks – [in] The number of chunks to compute checksums of.
kernel_conf – [out] Pointer to the kernel configuration to be filled.
max_input_chunk_bytes – [in] Maximum input chunk size in bytes, or nvcompCRC32DeducedMaxInputChunkBytes to deduce from device_input_chunk_bytes.
stream – [in] The CUDA stream to operate on. Ignored if max_input_chunk_bytes is directly specified.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCRC32SearchConf(
    const void *const *device_input_chunk_ptrs,
    const size_t *device_input_chunk_bytes,
    size_t num_chunks,
    uint32_t *device_crc32_ptr,
    nvcompCRC32Spec_t spec,
    nvcompCRC32KernelConf_t *kernel_conf,
    cudaStream_t stream)
Explicitly search for the optimal CRC32 kernel configuration by benchmarking.
In most cases, nvcompBatchedCRC32GetHeuristicConf should provide a sufficiently performant kernel configuration using much less time and fewer resources. When performance is of paramount importance, this function can be used to explicitly search for the optimal kernel configuration. Note that this only makes sense when processing a large number of batches and the number and length of chunks are very similar across batches so that the same kernel configuration can be used.
This function is always synchronous with respect to the host and synchronizes with stream. On devices that do not support stream-ordered memory allocation, the function synchronizes with the entire device.
- Parameters:
device_input_chunk_ptrs – [in] Array with size num_chunks of pointers to the input data chunks in device-accessible memory. The data chunks do not have any alignment requirements.
device_input_chunk_bytes – [in] Array with size num_chunks of sizes of the input chunks in bytes, residing in device-accessible memory.
num_chunks – [in] The number of chunks to use for benchmarking.
device_crc32_ptr – [out] Array with size num_chunks on the GPU to be used for benchmark outputs.
spec – [in] The CRC32 specification to use for benchmarking.
kernel_conf – [out] Pointer to the kernel configuration to be filled with optimal settings.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
Variables
- static const nvcompCRC32Spec_t nvcompCRC32 = {0x04C11DB7, 0xFFFFFFFF, true, true, 0xFFFFFFFF, {0}}
Standard CRC32 (aka CRC-32/PKZIP) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_C = {0x1EDC6F41, 0xFFFFFFFF, true, true, 0xFFFFFFFF, {0}}
CRC32-C (aka CRC-32/ISCSI) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_D = {0xA833982B, 0xFFFFFFFF, true, true, 0xFFFFFFFF, {0}}
CRC32-D (aka CRC-32/BASE91-D) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_Q = {0x814141AB, 0x00000000, false, false, 0x00000000, {0}}
CRC32-Q (aka CRC-32/AIXM) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_MEF = {0x741B8CD7, 0xFFFFFFFF, true, true, 0x00000000, {0}}
CRC-32/MEF model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_XFER = {0x000000AF, 0x00000000, false, false, 0x00000000, {0}}
CRC-32/XFER model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_BZIP2 = {0x04C11DB7, 0xFFFFFFFF, false, false, 0xFFFFFFFF, {0}}
CRC-32/BZIP2 (aka CRC-32/AAL-5) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_POSIX = {0x04C11DB7, 0x00000000, false, false, 0xFFFFFFFF, {0}}
CRC-32/POSIX (aka CRC-32/CKSUM) model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_JAMCRC = {0x04C11DB7, 0xFFFFFFFF, true, true, 0x00000000, {0}}
CRC-32/JAMCRC model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_MPEG_2 = {0x04C11DB7, 0xFFFFFFFF, false, false, 0x00000000, {0}}
CRC-32/MPEG-2 model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_AUTOSAR = {0xF4ACFB13, 0xFFFFFFFF, true, true, 0xFFFFFFFF, {0}}
CRC-32/AUTOSAR model preset.
- static const nvcompCRC32Spec_t nvcompCRC32_CD_ROM_EDC = {0x8001801B, 0x00000000, true, true, 0x00000000, {0}}
CRC-32/CD-ROM-EDC model preset.
- static const size_t *const nvcompCRC32IgnoredInputChunkBytes = NULL
Value to pass as device_input_chunk_bytes to nvcompBatchedCRC32GetHeuristicConf when specifying the maximum input chunk size in max_input_chunk_bytes.
Equal to a null pointer.
- static const size_t nvcompCRC32DeducedMaxInputChunkBytes = 0
Value to pass as max_input_chunk_bytes to nvcompBatchedCRC32GetHeuristicConf to indicate that max input chunk bytes should be deduced from device_input_chunk_bytes.
Equal to 0.
- struct nvcompCRC32Spec_t
- #include <crc32.h>
CRC32 model specification.
Public Members
- uint32_t poly
Polynomial used for CRC calculation.
- uint32_t init
Initial value for CRC shift register.
- bool ref_in
Flag indicating whether input bytes should be reflected.
- bool ref_out
Flag indicating whether the final CRC value should be reflected.
The reflection is done before XOR-ing with xorout.
- uint32_t xorout
Value with which to XOR the final CRC result.
If ref_out is true, the XOR operation is applied after the final CRC value is reflected.
- char reserved[16]
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
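To make the roles of the five model parameters concrete, here is a minimal bit-by-bit host reference, assuming the usual parametrised CRC model (initialize, optionally reflect each input byte, shift through the polynomial, optionally reflect the result, then XOR). `crc32_model_t` is a local stand-in that mirrors the documented fields without the reserved bytes, and `reflect32`, `reflect8`, and `crc32_reference` are hypothetical names. It is meant for checking results on small inputs, not as a description of how nvCOMP computes checksums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the documented fields of nvcompCRC32Spec_t
 * (reserved bytes omitted; assumption for illustration only). */
typedef struct {
  uint32_t poly;
  uint32_t init;
  bool ref_in;
  bool ref_out;
  uint32_t xorout;
} crc32_model_t;

/* Reverses the order of the 32 bits in v. */
uint32_t reflect32(uint32_t v)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

/* Reverses the order of the 8 bits in v. */
uint8_t reflect8(uint8_t v)
{
  return (uint8_t)(reflect32(v) >> 24);
}

/* Bitwise, MSB-first CRC over bytes of data using model m. */
uint32_t crc32_reference(const crc32_model_t *m, const void *data, size_t bytes)
{
  assert(m != NULL);
  assert(data != NULL || bytes == 0);
  const uint8_t *p = (const uint8_t *)data;
  uint32_t crc = m->init;
  for (size_t i = 0; i < bytes; ++i) {
    uint8_t byte = m->ref_in ? reflect8(p[i]) : p[i];
    crc ^= (uint32_t)byte << 24;
    for (int b = 0; b < 8; ++b) {
      crc = (crc & 0x80000000u) ? ((crc << 1) ^ m->poly) : (crc << 1);
    }
  }
  if (m->ref_out) {
    crc = reflect32(crc);  /* reflection happens before the final XOR */
  }
  return crc ^ m->xorout;
}
```

With the parameters of the nvcompCRC32 preset, the ASCII string "123456789" yields 0xCBF43926, the check value commonly quoted for CRC-32/PKZIP. With the nvcompCRC32_C parameters it yields 0xE3069283.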
- struct nvcompCRC32KernelConf_t
- #include <crc32.h>
Configuration for CRC32 kernel execution.
Public Members
- nvcompCRC32KernelKind_t kernel_kind
Type of kernel to use for CRC32 computation.
- int32_t bytes_per_read
Number of bytes each thread reads in each processing step.
- int32_t blocks_per_msg
Number of thread blocks to use per message.
Only relevant if kernel_kind is nvcompCRC32BlockKernel. Ignored if kernel_kind is nvcompCRC32WarpKernel.
- char reserved[20]
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
- struct nvcompBatchedCRC32Opts_t
- #include <crc32.h>
Options for batched CRC32 computation.
Public Members
- nvcompCRC32Spec_t spec
The CRC32 specification to use.
- nvcompCRC32KernelConf_t kernel_conf
The kernel configuration to use.
- char reserved[64]
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
LZ4
Functions
- nvcompStatus_t nvcompBatchedLZ4CompressGetRequiredAlignments(
    nvcompBatchedLZ4CompressOpts_t compress_opts,
    nvcompAlignmentRequirements_t *alignment_requirements)
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4CompressGetTempSizeAsync(
    size_t num_chunks,
    size_t max_uncompressed_chunk_bytes,
    nvcompBatchedLZ4CompressOpts_t compress_opts,
    size_t *temp_bytes,
    size_t max_total_uncompressed_bytes)
Get the amount of temporary memory required on the GPU for compression asynchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
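The num_chunks and max_uncompressed_chunk_bytes arguments follow from how the input is split. The sketch below shows one way to derive them on the host when a contiguous buffer is cut into fixed-size chunks, for example the recommended 65536 bytes. `chunk_count` and `chunk_size_at` are hypothetical helpers, not part of nvCOMP.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to cover total_bytes with chunks of at most
 * chunk_bytes each. */
size_t chunk_count(size_t total_bytes, size_t chunk_bytes)
{
  assert(chunk_bytes != 0);
  return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Size in bytes of the chunk with the given 0-based index. Only the last
 * chunk may be shorter than chunk_bytes. */
size_t chunk_size_at(size_t total_bytes, size_t chunk_bytes, size_t index)
{
  assert(index < chunk_count(total_bytes, chunk_bytes));
  size_t offset = index * chunk_bytes;
  size_t remaining = total_bytes - offset;
  return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

A 200000-byte buffer split at 65536 bytes gives 4 chunks, the last of which is 3392 bytes long. In that case max_total_uncompressed_bytes can be set to 200000 rather than 4 × 65536.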
- nvcompStatus_t nvcompBatchedLZ4CompressGetTempSizeSync(
    const void *const *const device_uncompressed_chunk_ptrs,
    const size_t *const device_uncompressed_chunk_bytes,
    size_t num_chunks,
    size_t max_uncompressed_chunk_bytes,
    nvcompBatchedLZ4CompressOpts_t compress_opts,
    size_t *temp_bytes,
    size_t max_total_uncompressed_bytes,
    cudaStream_t stream)
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4CompressGetMaxOutputChunkSize(
    size_t max_uncompressed_chunk_bytes,
    nvcompBatchedLZ4CompressOpts_t compress_opts,
    size_t *max_compressed_chunk_bytes)
Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedLZ4CompressAsync for each chunk.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The LZ4 compression options to use.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4CompressAsync(
    const void *const *device_uncompressed_chunk_ptrs,
    const size_t *device_uncompressed_chunk_bytes,
    size_t max_uncompressed_chunk_bytes,
    size_t num_chunks,
    void *device_temp_ptr,
    size_t temp_bytes,
    void *const *device_compressed_chunk_ptrs,
    size_t *device_compressed_chunk_bytes,
    nvcompBatchedLZ4CompressOpts_t compress_opts,
    nvcompStatus_t *device_statuses,
    cudaStream_t stream)
Perform batched asynchronous compression.
Note
For best performance, a chunk size of 65536 bytes is recommended.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by compress_opts.data_type. Chunk sizes must not exceed 16777216 bytes. For best performance, a chunk size of 65536 bytes is recommended.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same compress_opts.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedLZ4CompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4CompressGetRequiredAlignments` when called with the same compress_opts.
device_compressed_chunk_bytes – [out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The LZ4 compression options to use.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
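Because violating the chunk-size conditions may be undefined behaviour, it can be worth validating a host-side copy of the sizes before launching. The sketch below checks the two documented size conditions: at most 16777216 bytes, and a multiple of the element size. `first_invalid_chunk` is a hypothetical helper, and `element_bytes` is assumed to be the byte width of the type selected by compress_opts.data_type, which the caller has to supply because this reference does not spell out that mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Documented LZ4 limit: 1 << 24 = 16777216 bytes per uncompressed chunk. */
#define MAX_LZ4_CHUNK_BYTES ((size_t)1 << 24)

/* Returns the index of the first chunk that violates a size condition,
 * or num_chunks if every chunk is acceptable. */
size_t first_invalid_chunk(const size_t *host_chunk_bytes, size_t num_chunks,
                           size_t element_bytes)
{
  assert(element_bytes != 0);
  assert(host_chunk_bytes != NULL || num_chunks == 0);
  for (size_t i = 0; i < num_chunks; ++i) {
    if (host_chunk_bytes[i] > MAX_LZ4_CHUNK_BYTES) {
      return i;  /* chunk too large */
    }
    if (host_chunk_bytes[i] % element_bytes != 0) {
      return i;  /* not a whole number of elements */
    }
  }
  return num_chunks;
}
```

For example, sizes {65536, 65534, 4096} pass with 2-byte elements but fail at index 1 with 4-byte elements.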
- nvcompStatus_t nvcompBatchedLZ4DecompressGetRequiredAlignments(
    nvcompBatchedLZ4DecompressOpts_t decompress_opts,
    nvcompAlignmentRequirements_t *alignment_requirements)
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSizeAsync(
    size_t num_chunks,
    size_t max_uncompressed_chunk_bytes,
    nvcompBatchedLZ4DecompressOpts_t decompress_opts,
    size_t *temp_bytes,
    size_t max_total_uncompressed_bytes)
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4DecompressGetTempSizeSync(
    const void *const *const device_compressed_chunk_ptrs,
    const size_t *const device_compressed_chunk_bytes,
    size_t num_chunks,
    size_t max_uncompressed_chunk_bytes,
    size_t *temp_bytes,
    size_t max_total_uncompressed_bytes,
    nvcompBatchedLZ4DecompressOpts_t decompress_opts,
    nvcompStatus_t *device_statuses,
    cudaStream_t stream)
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4DecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4GetDecompressSizeAsync(
    const void *const *device_compressed_chunk_ptrs,
    const size_t *device_compressed_chunk_bytes,
    size_t *device_uncompressed_chunk_bytes,
    size_t num_chunks,
    cudaStream_t stream)
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
This is needed when we do not know the expected output size.
Warning
If the stream is corrupt, the calculated sizes will be invalid.
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4DecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedLZ4DecompressAsync(
    const void *const *device_compressed_chunk_ptrs,
    const size_t *device_compressed_chunk_bytes,
    const size_t *device_uncompressed_buffer_bytes,
    size_t *device_uncompressed_chunk_bytes,
    size_t num_chunks,
    void *const device_temp_ptr,
    size_t temp_bytes,
    void *const *device_uncompressed_chunk_ptrs,
    nvcompBatchedLZ4DecompressOpts_t decompress_opts,
    nvcompStatus_t *device_statuses,
    cudaStream_t stream)
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt compressed buffer for decompression on the hardware decompress engine will result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4DecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated. When `NVCOMP_DECOMPRESS_BACKEND_HARDWARE` is specified in decompress_opts.backend, this parameter is required. For `NVCOMP_DECOMPRESS_BACKEND_CUDA`, it is optional and may be set to NULL if reporting the actual sizes is not necessary.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4DecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedLZ4DecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If decompression using the CUDA backend is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to `nvcompErrorCannotDecompress`. If using the hardware backend, any corrupted input leads to undefined behavior and the statuses are always set to `nvcompSuccess`. Can be NULL if desired, in which case error status is not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
- static const nvcompBatchedLZ4CompressOpts_t nvcompBatchedLZ4CompressDefaultOpts = {NVCOMP_TYPE_CHAR, {0}}
Default LZ4 compression options.
- static const nvcompBatchedLZ4DecompressOpts_t nvcompBatchedLZ4DecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, 0, {0}}
Default LZ4 decompression options.
- static const size_t nvcompLZ4CompressionMaxAllowedChunkSize = 1 << 24
The maximum supported uncompressed chunk size in bytes for the LZ4 compressor.
- static const size_t nvcompLZ4RequiredCompressionAlignment = 4
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
- static const size_t nvcompLZ4RequiredDecompressionAlignment = 1
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
- struct nvcompBatchedLZ4CompressOpts_t
- #include <lz4.h>
LZ4 compression options for the low-level API.
Public Members
- nvcompType_t data_type
LZ4 data type to use.
- char reserved[60]
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
- struct nvcompBatchedLZ4DecompressOpts_t
- #include <lz4.h>
LZ4 decompression options for the low-level API.
Public Members
- nvcompDecompressBackend_t backend
Decompression backend to use.
- int sort_before_hw_decompress
Whether to sort chunks before hardware decompression for better load balancing. Only used when the backend is the hardware decompression engine.
- char reserved[56]
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
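The requirement that reserved bytes be zeroed is easy to miss when an options struct is filled field by field. One way to honour it is to clear the whole object before setting the documented members, as sketched below. `lz4_decompress_opts_t` is a local stand-in that mirrors the layout listed above (with `int` in place of nvcompDecompressBackend_t) so the snippet builds without nvCOMP headers. `make_decompress_opts` and `reserved_is_zeroed` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the documented members of
 * nvcompBatchedLZ4DecompressOpts_t (assumption for illustration). */
typedef struct {
  int backend;                   /* stands in for nvcompDecompressBackend_t */
  int sort_before_hw_decompress;
  char reserved[56];
} lz4_decompress_opts_t;

/* Builds options with every byte cleared first, so that the reserved
 * bytes are guaranteed to be zero, then sets the two documented fields. */
lz4_decompress_opts_t make_decompress_opts(int backend,
                                           int sort_before_hw_decompress)
{
  lz4_decompress_opts_t opts;
  memset(&opts, 0, sizeof opts);
  opts.backend = backend;
  opts.sort_before_hw_decompress = sort_before_hw_decompress;
  return opts;
}

/* Returns 1 if all reserved bytes are zero, 0 otherwise. */
int reserved_is_zeroed(const lz4_decompress_opts_t *opts)
{
  assert(opts != NULL);
  for (size_t i = 0; i < sizeof opts->reserved; ++i) {
    if (opts->reserved[i] != 0) {
      return 0;
    }
  }
  return 1;
}
```

With the real headers, copying nvcompBatchedLZ4DecompressDefaultOpts and then overriding individual members should have the same effect, since the documented default initializes the reserved bytes with {0}.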
Snappy
Functions
- nvcompStatus_t nvcompBatchedSnappyCompressGetRequiredAlignments(
    nvcompBatchedSnappyCompressOpts_t compress_opts,
    nvcompAlignmentRequirements_t *alignment_requirements)
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedSnappyCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedSnappyCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedSnappyCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of at most `max_uncompressed_chunk_bytes` bytes could compress to. This is the minimum amount of output memory that must be provided to `nvcompBatchedSnappyCompressAsync` for each chunk.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] Snappy compression options.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
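The sizing rule above can be turned into a layout for a single pooled output allocation. The following is a minimal host-side sketch, not part of nvCOMP: `round_up` and `layout_output_pool` are hypothetical helpers, `max_compressed_chunk_bytes` is assumed to be the value returned by the function above, and `output_alignment` is assumed to come from the `output` member of the alignment requirements. It also assumes the base address of the pool already satisfies that alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the next multiple of alignment (alignment must be > 0). */
size_t round_up(size_t value, size_t alignment)
{
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Hypothetical helper: fills offsets[i] with the byte offset of the output
   buffer for chunk i inside one pooled allocation, and returns the total
   number of bytes the pool must hold. Every chunk gets the same stride:
   the maximum compressed size, padded to the output alignment. */
size_t layout_output_pool(size_t num_chunks,
                          size_t max_compressed_chunk_bytes,
                          size_t output_alignment,
                          size_t *offsets)
{
    size_t stride = round_up(max_compressed_chunk_bytes, output_alignment);
    for (size_t i = 0; i < num_chunks; ++i) {
        offsets[i] = i * stride;
    }
    return num_chunks * stride;
}
```

Adding each offset to the pool's base pointer gives the entries for `device_compressed_chunk_ptrs`, which then have to be copied to device-accessible memory.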
- nvcompStatus_t nvcompBatchedSnappyCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedSnappyCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk. This parameter is currently unused. Set it to either the actual value or zero.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace, could be NULL in case temporary memory is not needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same `compress_opts`.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedSnappyCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] Snappy compression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
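Since the return value only reports whether the launch succeeded, per-chunk results have to be read from `device_statuses` once the work on the stream has finished. The following is a minimal host-side sketch, assuming the statuses have already been copied into a host array and the stream has been synchronized. `chunk_status_t` and `CHUNK_OK` are local stand-ins for `nvcompStatus_t` and `nvcompSuccess` so that the snippet builds without the nvCOMP headers, and `first_failed_chunk` is a hypothetical helper rather than a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for nvcompStatus_t / nvcompSuccess (assumption: real code
   would use the nvCOMP types and enumerators instead). */
typedef int chunk_status_t;
#define CHUNK_OK 0

/* Returns the index of the first chunk whose status is not CHUNK_OK,
   or num_chunks if every chunk reports success. */
size_t first_failed_chunk(const chunk_status_t *host_statuses,
                          size_t num_chunks)
{
    assert(host_statuses != NULL || num_chunks == 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_statuses[i] != CHUNK_OK) {
            return i;
        }
    }
    return num_chunks;
}
```

A result equal to `num_chunks` means the whole batch succeeded; any smaller value identifies the first chunk to inspect.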
- nvcompStatus_t nvcompBatchedSnappyDecompressGetRequiredAlignments(
- nvcompBatchedSnappyDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedSnappyDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedSnappyDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedSnappyDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedSnappyDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt buffer for decompression will result in undefined behavior irrespective of the decompression backend used.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated. When `NVCOMP_DECOMPRESS_BACKEND_HARDWARE` is specified in `decompress_opts.backend`, this parameter is required. For `NVCOMP_DECOMPRESS_BACKEND_CUDA`, it is optional and may be set to NULL if reporting the actual sizes is not necessary.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space, could be NULL in case temporary space is not needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedSnappyDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bound errors. Error reporting cannot be guaranteed in this scenario as only a limited validation is performed to maintain performance. Can be NULL if desired, in which case error status is not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
-
static const nvcompBatchedSnappyCompressOpts_t nvcompBatchedSnappyCompressDefaultOpts = {{0}}#
Default Snappy compression options.
-
static const nvcompBatchedSnappyDecompressOpts_t nvcompBatchedSnappyDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, 0, {0}}#
Default Snappy decompression options.
-
static const size_t nvcompSnappyCompressionMaxAllowedChunkSize = 1 << 24#
The maximum supported uncompressed chunk size in bytes for the Snappy compressor.
-
static const size_t nvcompSnappyDecompressionMaxAllowedChunkSize = (1ull << 31) - 1#
The maximum supported compressed chunk size in bytes for the Snappy decompressor.
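The two limits above can be checked on the host before a batch is submitted. The sketch below is illustrative only: the macros are local copies of the documented values (real code would use the `nvcompSnappy...MaxAllowedChunkSize` constants from the header), `chunks_within_limit` is a hypothetical helper, and the limit is treated as inclusive, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the documented Snappy limits (assumption: use the
   library constants in real code). */
#define SNAPPY_MAX_COMPRESS_CHUNK   ((size_t)1 << 24)
#define SNAPPY_MAX_DECOMPRESS_CHUNK ((((size_t)1) << 31) - 1)

/* Returns true if every entry of host_chunk_bytes is <= limit. */
bool chunks_within_limit(const size_t *host_chunk_bytes,
                         size_t num_chunks,
                         size_t limit)
{
    assert(host_chunk_bytes != NULL || num_chunks == 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_chunk_bytes[i] > limit) {
            return false;
        }
    }
    return true;
}
```

The check runs on a host copy of the size array, before that array is transferred to device-accessible memory for the batched call.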
-
static const size_t nvcompSnappyRequiredCompressionAlignment = 1#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompSnappyRequiredDecompressionAlignment = 1#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
struct nvcompBatchedSnappyCompressOpts_t#
- #include <snappy.h>
Snappy compression options for the low-level API.
Public Members
-
char reserved[64]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedSnappyDecompressOpts_t#
- #include <snappy.h>
Snappy decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
int sort_before_hw_decompress#
Whether to sort chunks before hardware decompression for better load balancing. Only used when the backend is the hardware decompression engine.
-
char reserved[56]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
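The requirement that the reserved bytes be zeroed is easiest to meet by clearing the whole structure before setting individual fields. The sketch below uses `decompress_opts_sketch_t`, a local stand-in that mirrors the documented members of `nvcompBatchedSnappyDecompressOpts_t` (with `int` in place of `nvcompDecompressBackend_t`) so that it builds without the nvCOMP headers. Both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the documented member list; not the real nvCOMP type. */
typedef struct {
    int backend;                    /* stands in for nvcompDecompressBackend_t */
    int sort_before_hw_decompress;
    char reserved[56];
} decompress_opts_sketch_t;

/* Returns an options value with every byte, including reserved, set to 0. */
decompress_opts_sketch_t make_zeroed_opts(void)
{
    decompress_opts_sketch_t opts;
    memset(&opts, 0, sizeof opts);
    return opts;
}

/* Returns true if all reserved bytes are zero. */
bool reserved_is_zeroed(const decompress_opts_sketch_t *opts)
{
    assert(opts != NULL);
    for (size_t i = 0; i < sizeof opts->reserved; ++i) {
        if (opts->reserved[i] != 0) {
            return false;
        }
    }
    return true;
}
```

With the real type, copying `nvcompBatchedSnappyDecompressDefaultOpts` and then overriding `backend` or `sort_before_hw_decompress` should have the same effect, since the documented default initializes the reserved bytes with `{0}`.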
Deflate#
Functions
- nvcompStatus_t nvcompBatchedDeflateCompressGetRequiredAlignments(
- nvcompBatchedDeflateCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedDeflateCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedDeflateCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedDeflateCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of at most `max_uncompressed_chunk_bytes` bytes could compress to. This is the minimum amount of output memory that must be provided to `nvcompBatchedDeflateCompressAsync` for each chunk.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The Deflate compression options to use.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
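The recommended chunk size of 65536 bytes translates directly into the batch dimensions passed to the compression functions. The following is a minimal host-side sketch of splitting one contiguous buffer into such chunks. `chunk_count` and `chunk_size_at` are hypothetical helpers, not nvCOMP functions, and `RECOMMENDED_CHUNK_BYTES` is a local name for the documented value.

```c
#include <assert.h>
#include <stddef.h>

#define RECOMMENDED_CHUNK_BYTES ((size_t)65536)

/* Number of chunks needed to cover total_bytes (ceiling division,
   written so it cannot overflow). */
size_t chunk_count(size_t total_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Size of chunk `index`: every chunk is full except possibly the last. */
size_t chunk_size_at(size_t total_bytes, size_t chunk_bytes, size_t index)
{
    assert(index < chunk_count(total_bytes, chunk_bytes));
    size_t remaining = total_bytes - index * chunk_bytes;
    return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

For a 200000-byte input this gives four chunks, three of 65536 bytes and a final one of 3392 bytes. The count is the `num_chunks` argument and the per-chunk sizes populate `device_uncompressed_chunk_bytes`.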
- nvcompStatus_t nvcompBatchedDeflateCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedDeflateCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Note
For best performance, a chunk size of 65536 bytes is recommended.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedDeflateCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The Deflate compression options to use.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateDecompressGetRequiredAlignments(
- nvcompBatchedDeflateDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedDeflateDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedDeflateDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
This is needed when we do not know the expected output size.
Warning
If the stream is corrupt, the calculated sizes will be invalid.
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedDeflateDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedDeflateDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt buffer for decompression will result in undefined behavior irrespective of the decompression backend used.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated. When `NVCOMP_DECOMPRESS_BACKEND_HARDWARE` is specified in `decompress_opts.backend`, this parameter is required. For `NVCOMP_DECOMPRESS_BACKEND_CUDA`, it is optional and may be set to NULL if reporting the actual sizes is not necessary.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedDeflateDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bound errors. Error reporting cannot be guaranteed in this scenario as only a limited validation is performed to maintain performance. Can be NULL if desired, in which case error status is not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
-
static const nvcompBatchedDeflateCompressOpts_t nvcompBatchedDeflateCompressDefaultOpts = {1, {0}}#
Default Deflate compression options.
-
static const nvcompBatchedDeflateDecompressOpts_t nvcompBatchedDeflateDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, 0, {0}}#
Default Deflate decompression options.
-
static const size_t nvcompDeflateCompressionMaxAllowedChunkSize = 1u << 31#
The maximum supported uncompressed chunk size in bytes for the Deflate compressor.
Note
Although chunk sizes up to 2GB are theoretically possible, compression with large chunks may be very slow or use large amounts of temporary memory, so caution is advised when using chunk sizes above 64KB.
-
static const size_t nvcompDeflateRequiredCompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompDeflateRequiredDecompressionAlignment = 4#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
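The alignment rules above can be checked on the host before a buffer is handed to a compression or decompression call. The following is a minimal sketch, not part of nvCOMP: `example_is_aligned` is a hypothetical helper name, and it only inspects the numeric address, so it assumes the pointer value is visible to the host (as is the case for pointers returned by the CUDA runtime allocators).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an nvCOMP function).
   Returns 1 if ptr is a multiple of alignment, 0 otherwise.
   alignment must be a nonzero power of two, e.g. 4 or 8. */
static int example_is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```

A caller would pass the value of `nvcompDeflateRequiredCompressionAlignment` or `nvcompDeflateRequiredDecompressionAlignment` as `alignment`, depending on which function the buffer is destined for.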
-
struct nvcompBatchedDeflateCompressOpts_t#
- #include <deflate.h>
Deflate compression options for the low-level API.
Public Members
-
int algorithm#
Deflate algorithm options.
0: highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)
1: high-throughput, low compression ratio (default)
2: medium-throughput, medium compression ratio; beats zlib level 1 on compression ratio
3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION
4: lower-throughput, higher compression ratio; beats zlib level 6 on compression ratio
5: lowest-throughput, highest compression ratio
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
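The two rules for this struct (an `algorithm` level from the list above, and reserved bytes that must be zero) can be illustrated with a small sketch. The struct below is a hypothetical stand-in that mirrors the documented members so the example compiles without the nvCOMP headers; real code would use `nvcompBatchedDeflateCompressOpts_t` from `<deflate.h>`. The helper names are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in mirroring the documented members; not the nvCOMP type. */
typedef struct {
    int algorithm;
    char reserved[60];
} example_deflate_compress_opts_t;

/* Fills *opts for the given algorithm level (0..5 per the list above).
   Returns 0 on success, -1 if the level is outside the documented range. */
static int example_make_opts(int algorithm, example_deflate_compress_opts_t *opts)
{
    assert(opts != NULL);
    if (algorithm < 0 || algorithm > 5) {
        return -1;
    }
    memset(opts, 0, sizeof(*opts)); /* zeroes the reserved bytes */
    opts->algorithm = algorithm;
    return 0;
}

/* Returns 1 if every reserved byte is zero, 0 otherwise. */
static int example_reserved_is_zero(const example_deflate_compress_opts_t *opts)
{
    size_t i;
    assert(opts != NULL);
    for (i = 0; i < sizeof(opts->reserved); ++i) {
        if (opts->reserved[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Zeroing the whole struct first and then setting `algorithm` has the same effect as the `{1, {0}}` aggregate initializer used for the default options, but also works when the options are filled in at run time.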
-
struct nvcompBatchedDeflateDecompressOpts_t#
- #include <deflate.h>
Deflate decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
int sort_before_hw_decompress#
Whether to sort chunks before hardware decompression for better load balancing. Only used when the backend is the hardware decompression engine.
-
char reserved[56]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
GDeflate#
Functions
- nvcompStatus_t nvcompBatchedGdeflateCompressGetRequiredAlignments(
- nvcompBatchedGdeflateCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
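When several buffers are carved out of one large allocation, each sub-buffer offset has to be rounded up to the alignment reported in `alignment_requirements`. A minimal sketch of that rounding follows; `example_align_up` is a hypothetical helper, not an nvCOMP function, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an nvCOMP function).
   Rounds offset up to the next multiple of alignment.
   alignment must be a nonzero power of two. */
static size_t example_align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

Passing a larger power of two than the reported minimum (for example 16 or 32) is one way to follow the performance note above, since any address aligned to the larger value is also aligned to the smaller one.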
- nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGdeflateCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
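The `num_chunks` argument and the per-chunk sizes are typically derived on the host by splitting one input buffer into fixed-size pieces. The sketch below does that arithmetic for the recommended 65536-byte chunk size. The macro and function names are hypothetical, and the code assumes `total_bytes` is small enough that the rounding addition does not overflow `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Recommended chunk size from the note above; the macro name is hypothetical. */
#define EXAMPLE_CHUNK_BYTES ((size_t)65536)

/* Number of chunks needed to cover total_bytes (ceiling division). */
static size_t example_num_chunks(size_t total_bytes)
{
    return (total_bytes + EXAMPLE_CHUNK_BYTES - 1) / EXAMPLE_CHUNK_BYTES;
}

/* Size in bytes of chunk i; only the last chunk may be shorter. */
static size_t example_chunk_bytes(size_t total_bytes, size_t i)
{
    size_t offset;
    size_t remaining;
    assert(i < example_num_chunks(total_bytes));
    offset = i * EXAMPLE_CHUNK_BYTES;
    remaining = total_bytes - offset;
    return remaining < EXAMPLE_CHUNK_BYTES ? remaining : EXAMPLE_CHUNK_BYTES;
}
```

With this split, `max_uncompressed_chunk_bytes` is at most 65536 and `total_bytes` itself can serve as `max_total_uncompressed_bytes`.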
- nvcompStatus_t nvcompBatchedGdeflateCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGdeflateCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGdeflateCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory required to be given nvcompBatchedGdeflateCompressAsync for each chunk.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The GDeflate compression options to use.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedGdeflateCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Note
For best performance, a chunk size of 65536 bytes is recommended.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 65536 bytes. For best performance, a chunk size of 65536 bytes is recommended.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedGdeflateCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The GDeflate compression options to use.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateDecompressGetRequiredAlignments(
- nvcompBatchedGdeflateDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGdeflateDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
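If the per-chunk uncompressed sizes are already known on the host, `max_total_uncompressed_bytes` is simply their sum. The sketch below computes that sum with an overflow guard. `example_total_bytes` is a hypothetical helper, not an nvCOMP function, and it assumes the sizes live in host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an nvCOMP function).
   Sums n host-side chunk sizes into *total_out.
   Returns 0 on success, -1 if the sum would overflow size_t. */
static int example_total_bytes(const size_t *sizes, size_t n, size_t *total_out)
{
    size_t total = 0;
    size_t i;
    assert(total_out != NULL && (sizes != NULL || n == 0));
    for (i = 0; i < n; ++i) {
        if (sizes[i] > SIZE_MAX - total) {
            return -1; /* adding sizes[i] would wrap around */
        }
        total += sizes[i];
    }
    *total_out = total;
    return 0;
}
```

On failure `*total_out` is left untouched, so a caller can initialize it to a sentinel and rely on the return code alone.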
- nvcompStatus_t nvcompBatchedGdeflateDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedGdeflateDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
This is needed when we do not know the expected output size.
Warning
If the stream is corrupt, the calculated sizes will be invalid.
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGdeflateDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedGdeflateDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
In the case where a chunk of compressed data is not a valid GDeflate stream, the calculated size of the uncompressed chunk will be invalid and `nvcompErrorCannotDecompress` will be flagged for that chunk.
Providing a corrupt buffer for decompression will result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated, but can be NULL if desired, in which case the actual sizes are not reported.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGdeflateDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bound errors. Error reporting cannot be guaranteed in this scenario as only a limited validation is performed to maintain performance. Can be NULL if desired, in which case error status is not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
-
static const nvcompBatchedGdeflateCompressOpts_t nvcompBatchedGdeflateCompressDefaultOpts = {1, {0}}#
Default GDeflate compression options.
-
static const nvcompBatchedGdeflateDecompressOpts_t nvcompBatchedGdeflateDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, {0}}#
Default GDeflate decompression options.
-
static const size_t nvcompGdeflateCompressionMaxAllowedChunkSize = 1u << 31#
The maximum supported uncompressed chunk size in bytes for the GDeflate compressor.
Note
Although chunk sizes up to 2GB are theoretically possible, compression with large chunks may be very slow or use large amounts of temporary memory, so caution is advised when using chunk sizes above 64KB.
-
static const size_t nvcompGdeflateRequiredCompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompGdeflateRequiredDecompressionAlignment = 4#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
struct nvcompBatchedGdeflateCompressOpts_t#
- #include <gdeflate.h>
GDeflate compression options for the low-level API.
Public Members
-
int algorithm#
GDeflate algorithm options.
0: highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)
1: high-throughput, low compression ratio (default)
2: medium-throughput, medium compression ratio; beats zlib level 1 on compression ratio
3: placeholder for future compression levels; currently falls back to MEDIUM_COMPRESSION
4: lower-throughput, higher compression ratio; beats zlib level 6 on compression ratio
5: lowest-throughput, highest compression ratio
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedGdeflateDecompressOpts_t#
- #include <gdeflate.h>
GDeflate decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
ZSTD#
Functions
- nvcompStatus_t nvcompBatchedZstdCompressGetRequiredAlignments(
- nvcompBatchedZstdCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedZstdCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedZstdCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedZstdCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory required to be given nvcompBatchedZstdCompressAsync for each chunk.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The Zstd compression options to use. Currently empty.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedZstdCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Note
For best performance, a chunk size of 65536 bytes is recommended.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Chunk sizes must not exceed 16 MB. For best performance, a chunk size of 64 KB is recommended.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace; can be NULL if no temporary memory is needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same `compress_opts`.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedZstdCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The Zstd compression options to use. Currently empty.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bound errors. Error reporting cannot be guaranteed in this scenario as only a limited validation is performed to maintain performance.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
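Before launching Zstd compression, the host can derive `max_uncompressed_chunk_bytes` from its own copy of the chunk sizes and reject batches that break the 16 MB per-chunk limit stated above. The sketch below assumes "16 MB" means 16 * 1024 * 1024 bytes, which the text does not spell out. The macro and function names are hypothetical and not part of nvCOMP.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed interpretation of the documented 16 MB limit; hypothetical name. */
#define EXAMPLE_ZSTD_MAX_CHUNK_BYTES ((size_t)16 * 1024 * 1024)

/* Hypothetical helper (not an nvCOMP function).
   Writes the largest of n host-side chunk sizes to *max_out.
   Returns 0 on success, -1 if any chunk exceeds the assumed limit. */
static int example_max_chunk_bytes(const size_t *sizes, size_t n, size_t *max_out)
{
    size_t max = 0;
    size_t i;
    assert(max_out != NULL && (sizes != NULL || n == 0));
    for (i = 0; i < n; ++i) {
        if (sizes[i] > EXAMPLE_ZSTD_MAX_CHUNK_BYTES) {
            return -1; /* chunk too large for the compressor */
        }
        if (sizes[i] > max) {
            max = sizes[i];
        }
    }
    *max_out = max;
    return 0;
}
```

The value written to `*max_out` can then be passed as `max_uncompressed_chunk_bytes` to the temp-size, max-output-size, and compression calls so that all three see the same bound.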
- nvcompStatus_t nvcompBatchedZstdDecompressGetRequiredAlignments(
- nvcompBatchedZstdDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedZstdDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedZstdDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedZstdDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedZstdDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt buffer for decompression will result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space; may be NULL if temporary space is not needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedZstdDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to `nvcompErrorCannotDecompress`.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
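The return value only reports whether the launch succeeded; per-chunk results arrive in `device_statuses`. A minimal host-side sketch in plain C, assuming the stream has been synchronized and the statuses have already been copied into a host array. `first_failed_chunk` is a hypothetical helper, not part of nvCOMP, and the success value is passed in so the sketch does not depend on the numeric values of `nvcompStatus_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP): scan a host-side copy of the
 * per-chunk status array and return the index of the first chunk whose
 * status differs from `success`. Returns `num_chunks` when every chunk
 * succeeded. Statuses are taken as plain ints so the sketch compiles
 * without the nvCOMP headers. */
size_t first_failed_chunk(const int *host_statuses,
                          size_t num_chunks,
                          int success)
{
    assert(host_statuses != NULL || num_chunks == 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_statuses[i] != success) {
            return i;
        }
    }
    return num_chunks;
}
```

In real code the `success` argument would be `nvcompSuccess`, and a result smaller than `num_chunks` identifies a chunk whose output should not be trusted.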
Variables
-
static const nvcompBatchedZstdCompressOpts_t nvcompBatchedZstdCompressDefaultOpts = {{0}}#
Default Zstd compression options.
-
static const nvcompBatchedZstdDecompressOpts_t nvcompBatchedZstdDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, {0}}#
Default Zstd decompression options.
-
static const size_t nvcompZstdCompressionMaxAllowedChunkSize = (1UL << 31) - 1#
The maximum supported uncompressed chunk size in bytes for the Zstd compressor.
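A batch can be checked against this limit on the host before any compression call is issued. The sketch below is plain C and mirrors the documented value in a local macro; `ZSTD_MAX_CHUNK_BYTES` and `zstd_chunk_sizes_ok` are illustrative names, and real code would compare against `nvcompZstdCompressionMaxAllowedChunkSize` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the documented limit, (1UL << 31) - 1 bytes.
 * Illustrative only; not an nvCOMP symbol. */
#define ZSTD_MAX_CHUNK_BYTES ((UINT64_C(1) << 31) - 1)

/* Returns 1 if every host-side chunk size is within the limit, 0 otherwise. */
int zstd_chunk_sizes_ok(const size_t *chunk_bytes, size_t num_chunks)
{
    assert(chunk_bytes != NULL || num_chunks == 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if ((uint64_t)chunk_bytes[i] > ZSTD_MAX_CHUNK_BYTES) {
            return 0;
        }
    }
    return 1;
}
```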
-
static const size_t nvcompZstdRequiredCompressionAlignment = 4#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompZstdRequiredDecompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
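Whether a buffer address satisfies an alignment requirement can be checked with integer arithmetic on the pointer value. A minimal sketch in plain C; `is_aligned` and `ZSTD_DECOMP_ALIGNMENT` are illustrative names that mirror the documented value of 8, and real code would use `nvcompZstdRequiredDecompressionAlignment` or the values reported by `nvcompBatchedZstdDecompressGetRequiredAlignments`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the documented decompression alignment (8 bytes).
 * Illustrative only; not an nvCOMP symbol. */
enum { ZSTD_DECOMP_ALIGNMENT = 8 };

/* Returns 1 if the address held in `ptr` is a multiple of `alignment`. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}
```

The same check applies to typed buffers: per the note above, an `int` buffer needs at least 4-byte alignment regardless of the library's minimum.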
-
struct nvcompBatchedZstdCompressOpts_t#
- #include <zstd.h>
Zstd compression options for the low-level API.
Public Members
-
char reserved[64]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedZstdDecompressOpts_t#
- #include <zstd.h>
Zstd decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
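The zeroing requirement is easiest to meet by clearing the whole structure before setting named fields. The sketch below uses local stand-in types so that it compiles without the nvCOMP headers; `backend_t`, `zstd_decompress_opts_t`, `make_opts`, and `reserved_is_zeroed` are all hypothetical, and the enumerator values are placeholders rather than the library's. In real code, copying `nvcompBatchedZstdDecompressDefaultOpts` and overriding `backend` achieves the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for nvcompDecompressBackend_t and
 * nvcompBatchedZstdDecompressOpts_t (illustrative, not the real types). */
typedef enum { BACKEND_DEFAULT, BACKEND_HARDWARE, BACKEND_CUDA } backend_t;

typedef struct {
    backend_t backend;
    char reserved[60];
} zstd_decompress_opts_t;

/* Build options for a chosen backend with every reserved byte zeroed. */
zstd_decompress_opts_t make_opts(backend_t backend)
{
    zstd_decompress_opts_t opts;
    memset(&opts, 0, sizeof opts); /* clears reserved bytes and padding */
    opts.backend = backend;
    return opts;
}

/* Returns 1 if all reserved bytes are zero, 0 otherwise. */
int reserved_is_zeroed(const zstd_decompress_opts_t *opts)
{
    assert(opts != NULL);
    for (size_t i = 0; i < sizeof opts->reserved; ++i) {
        if (opts->reserved[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```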
GZIP#
Enums
Functions
- nvcompStatus_t nvcompBatchedGzipCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGzipCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
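The size arguments of this query follow from how the input is split into chunks. A minimal sketch in plain C, assuming a single contiguous input split into fixed-size chunks of the recommended 65536 bytes; `chunk_count`, `chunk_size_at`, and `GZIP_RECOMMENDED_CHUNK_BYTES` are illustrative names, not nvCOMP symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk size recommended in the note above (illustrative local constant). */
enum { GZIP_RECOMMENDED_CHUNK_BYTES = 65536 };

/* Number of chunks needed to cover `total_bytes` with chunks of at most
 * `chunk_bytes` bytes (ceiling division without overflow). */
size_t chunk_count(size_t total_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes != 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Size of chunk `index` under that split: every chunk is full except
 * possibly the last one. */
size_t chunk_size_at(size_t total_bytes, size_t chunk_bytes, size_t index)
{
    assert(index < chunk_count(total_bytes, chunk_bytes));
    size_t remaining = total_bytes - index * chunk_bytes;
    return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

Under this split, `num_chunks` is the result of `chunk_count`, `max_uncompressed_chunk_bytes` is the smaller of the chunk size and the total size, and the total input size is a valid value for `max_total_uncompressed_bytes`.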
- nvcompStatus_t nvcompBatchedGzipCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGzipCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGzipCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most `max_uncompressed_chunk_bytes` could compress to. That is, the minimum amount of output memory that must be provided to `nvcompBatchedGzipCompressAsync` for each chunk.
Note
For best performance, a chunk size of 65536 bytes is recommended.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The Gzip compression options to use.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedGzipCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Note
For best performance, a chunk size of 65536 bytes is recommended.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedGzipCompressGetMaxOutputChunkSize`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The Gzip compression options to use.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipDecompressGetRequiredAlignments(
- nvcompBatchedGzipDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedGzipDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedGzipDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
This is needed when we do not know the expected output size.
Warning
If the stream is corrupt, the calculated sizes will be invalid.
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedGzipDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedGzipDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
Providing a corrupt buffer for decompression will result in undefined behavior irrespective of the decompression backend used.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated. When `NVCOMP_DECOMPRESS_BACKEND_HARDWARE` is specified in `decompress_opts.backend`, this parameter is required. For `NVCOMP_DECOMPRESS_BACKEND_CUDA`, it is optional and may be set to NULL if reporting the actual sizes is not necessary.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedGzipDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. If provided, this argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bound errors. Error reporting cannot be guaranteed in this scenario because only limited validation is performed to maintain performance. May be NULL, in which case error statuses are not reported.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
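The rule for when `device_uncompressed_chunk_bytes` may be NULL depends on the selected backend, so it can be checked before the launch. A minimal sketch in plain C with a stand-in enum; `gzip_backend_t` and `actual_sizes_arg_ok` are hypothetical names and the enumerator values are placeholders. The text above does not state the rule for the default backend, so this sketch conservatively treats it like the hardware backend, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for nvcompDecompressBackend_t (illustrative, not the real type). */
typedef enum {
    GZIP_BACKEND_DEFAULT,
    GZIP_BACKEND_HARDWARE,
    GZIP_BACKEND_CUDA
} gzip_backend_t;

/* Returns 1 if `device_uncompressed_chunk_bytes` (possibly NULL) is an
 * acceptable argument for `backend`, 0 otherwise. */
int actual_sizes_arg_ok(gzip_backend_t backend,
                        const size_t *device_uncompressed_chunk_bytes)
{
    if (device_uncompressed_chunk_bytes != NULL) {
        return 1; /* a preallocated array is always acceptable */
    }
    /* NULL is documented as optional only for the CUDA backend. DEFAULT is
     * rejected here because it may select the hardware engine (assumption). */
    return backend == GZIP_BACKEND_CUDA;
}
```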
Variables
-
static const nvcompBatchedGzipCompressOpts_t nvcompBatchedGzipCompressDefaultOpts = {{0}}#
Default Gzip compression options.
-
static const nvcompBatchedGzipDecompressOpts_t nvcompBatchedGzipDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, NVCOMP_GZIP_DECOMPRESS_ALGORITHM_NAIVE, 0, {0}}#
Default Gzip decompression options.
-
static const size_t nvcompGzipRequiredDecompressionAlignment = 1#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
struct nvcompBatchedGzipCompressOpts_t#
- #include <gzip.h>
Gzip compression options for the low-level API.
Public Members
-
char reserved[64]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedGzipDecompressOpts_t#
- #include <gzip.h>
Gzip decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
nvcompBatchedGzipDecompressAlgorithm_t algorithm#
Decompression CUDA algorithm to use.
-
int sort_before_hw_decompress#
Whether to sort chunks before hardware decompression for better load balancing. Only used when the backend is the hardware decompression engine.
-
char reserved[52]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
ANS#
Functions
- nvcompStatus_t nvcompBatchedANSCompressGetRequiredAlignments(
- nvcompBatchedANSCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedANSCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedANSCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedANSCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most `max_uncompressed_chunk_bytes` could compress to. That is, the minimum amount of output memory that must be provided to `nvcompBatchedANSCompressAsync()` for each chunk.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] Compression options.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedANSCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_uncompressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] The temporary GPU workspace; may be NULL if temporary memory is not needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same `compress_opts`.
temp_bytes – [in] The size of the temporary GPU memory pointed to by `device_temp_ptr`.
device_compressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedANSCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSCompressGetRequiredAlignments` when called with the same `compress_opts`.
device_compressed_chunk_bytes – [out] Array with size `num_chunks`, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] Compression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
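One common way to satisfy both the size and the alignment conditions on the output buffers is to carve them out of a single allocation with an aligned stride. A minimal sketch of the offset arithmetic in plain C, assuming the base address of the allocation already meets the `output` alignment; `round_up`, `output_offset`, and `output_total_bytes` are illustrative helpers, not nvCOMP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round `value` up to the next multiple of `alignment`. */
size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0);
    size_t rem = value % alignment;
    return rem == 0 ? value : value + (alignment - rem);
}

/* Byte offset of the output buffer for chunk `index` inside one large
 * allocation whose slots each hold `max_compressed_chunk_bytes`, padded
 * so that every slot starts on an `alignment` boundary. */
size_t output_offset(size_t index,
                     size_t max_compressed_chunk_bytes,
                     size_t alignment)
{
    return index * round_up(max_compressed_chunk_bytes, alignment);
}

/* Total bytes to allocate for `num_chunks` padded slots. */
size_t output_total_bytes(size_t num_chunks,
                          size_t max_compressed_chunk_bytes,
                          size_t alignment)
{
    return num_chunks * round_up(max_compressed_chunk_bytes, alignment);
}
```

Here `max_compressed_chunk_bytes` would come from `nvcompBatchedANSCompressGetMaxOutputChunkSize` and `alignment` from the `output` member reported by `nvcompBatchedANSCompressGetRequiredAlignments`; the pointer for chunk `i` is the base address plus `output_offset(i, ...)`.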
- nvcompStatus_t nvcompBatchedANSDecompressGetRequiredAlignments(
- nvcompBatchedANSDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedANSDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedANSDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedANSGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
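Because a size of 0 marks a chunk whose size could not be retrieved, the results can be validated while computing the total output size. A minimal host-side sketch in plain C, assuming the sizes have been copied from the device after the stream was synchronized; `sum_uncompressed_sizes` is a hypothetical helper. It treats every zero as an error, so a batch that legitimately contains empty chunks would need separate handling.

```c
#include <assert.h>
#include <stddef.h>

/* Sum host-side copies of the per-chunk uncompressed sizes.
 * Returns 1 and stores the sum in *total_bytes if every size is non-zero.
 * Returns 0 at the first zero size and stores its index in *failed_index;
 * *total_bytes is left unchanged in that case. */
int sum_uncompressed_sizes(const size_t *host_sizes,
                           size_t num_chunks,
                           size_t *total_bytes,
                           size_t *failed_index)
{
    assert(total_bytes != NULL && failed_index != NULL);
    assert(host_sizes != NULL || num_chunks == 0);
    size_t total = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_sizes[i] == 0) {
            *failed_index = i;
            return 0;
        }
        total += host_sizes[i];
    }
    *total_bytes = total;
    return 1;
}
```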
- nvcompStatus_t nvcompBatchedANSDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedANSDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
This function is used to decompress compressed buffers produced by `nvcompBatchedANSCompressAsync`.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] The temporary GPU space; may be NULL if temporary space is not needed. Must be aligned to the value in the `temp` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSDecompressGetRequiredAlignments`.
temp_bytes – [in] The size of the temporary GPU space.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedANSDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. If the decompression is not successful, for example due to corrupted input or out-of-bound errors, the status will be set to `nvcompErrorCannotDecompress`.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
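The return value above only reports whether the launch succeeded; per-chunk results arrive in device_statuses. A minimal host-side sketch of checking them is shown here, assuming the status array has already been copied to host memory after the stream was synchronized. The helper name `first_failed_chunk` is hypothetical, and plain `int` values stand in for `nvcompStatus_t` so that the fragment compiles without the nvCOMP headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvCOMP): scan a host-side copy of the
 * per-chunk status array and return the index of the first chunk whose
 * status differs from success_value, or num_chunks if all chunks succeeded.
 * In real code the element type would be nvcompStatus_t and success_value
 * would be nvcompSuccess. */
static size_t first_failed_chunk(const int *host_statuses,
                                 size_t num_chunks,
                                 int success_value)
{
    assert(host_statuses != NULL || num_chunks == 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_statuses[i] != success_value) {
            return i; /* first chunk that did not decompress cleanly */
        }
    }
    return num_chunks; /* every chunk reported success */
}
```

A caller would typically treat any return value smaller than num_chunks as a failed batch and inspect that chunk's status code.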
Variables
-
static const nvcompBatchedANSCompressOpts_t nvcompBatchedANSCompressDefaultOpts = {nvcomp_rANS, NVCOMP_TYPE_CHAR, {0}}#
Default ANS compression options.
-
static const nvcompBatchedANSDecompressOpts_t nvcompBatchedANSDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, {0}}#
Default ANS decompression options.
-
static const size_t nvcompANSCompressionMaxAllowedChunkSize = 1 << 24#
The maximum supported uncompressed chunk size in bytes for the ANS compressor.
-
static const size_t nvcompANSRequiredCompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompANSRequiredDecompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
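The alignment constants above can be checked on the host before a buffer is handed to the library. The following is a small sketch under the assumption that alignments are powers of two (as the value 8 is); `is_aligned` and `align_up` are hypothetical helper names, not nvCOMP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ptr is aligned to `alignment` bytes, 0 otherwise.
 * `alignment` must be a nonzero power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Round `offset` up to the next multiple of `alignment`.
 * `alignment` must be a nonzero power of two. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, `align_up(13, 8)` yields 16, which is the next offset inside a larger allocation at which an 8-byte-aligned chunk could start.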
-
struct nvcompBatchedANSCompressOpts_t#
- #include <ans.h>
ANS compression options for the low-level API.
Public Members
-
nvcompANSType_t type#
ANS algorithm to use.
-
nvcompType_t data_type#
ANS data type to use.
NVCOMP_TYPE_(U)CHAR: 1-byte, generic data type
NVCOMP_TYPE_FLOAT16: 2-byte floating-point data type. Applicable to all half-precision data formats.
-
char reserved[56]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedANSDecompressOpts_t#
- #include <ans.h>
ANS decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
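Because the reserved bytes must be zeroed, a common pattern is to clear the whole options struct before setting any field. The sketch below illustrates this with a hypothetical stand-in type, `sketch_decompress_opts_t`, that mirrors the documented members (a backend selector followed by 60 reserved bytes); real code would use `nvcompBatchedANSDecompressOpts_t` from `<ans.h>` or start from the provided default options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in mirroring the documented members. It is NOT the
 * nvCOMP type; it only lets this sketch compile without the library. */
typedef struct {
    int backend;       /* stand-in for nvcompDecompressBackend_t */
    char reserved[60]; /* must be zeroed */
} sketch_decompress_opts_t;

/* Build options with every byte cleared, then set the backend. */
static sketch_decompress_opts_t make_opts(int backend)
{
    sketch_decompress_opts_t opts;
    memset(&opts, 0, sizeof(opts)); /* zeroes reserved bytes and any padding */
    opts.backend = backend;
    return opts;
}

/* Return 1 if all reserved bytes are zero, 0 otherwise. */
static int reserved_is_zeroed(const sketch_decompress_opts_t *opts)
{
    assert(opts != NULL);
    for (size_t i = 0; i < sizeof(opts->reserved); ++i) {
        if (opts->reserved[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Clearing with `memset` rather than assigning fields one by one keeps the code correct if fields are later carved out of the reserved area.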
Bitcomp#
Functions
- nvcompStatus_t nvcompBatchedBitcompCompressGetRequiredAlignments(
- nvcompBatchedBitcompCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedBitcompCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedBitcompCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompCompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedBitcompCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedBitcompCompressAsync for each chunk.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] Compression options.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
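Once the maximum compressed chunk size is known, one way to provide output buffers is to carve them out of a single large allocation. The sketch below computes per-chunk byte offsets for such a layout; `layout_output_buffers` is a hypothetical helper, and it assumes that the alignment is a power of two, that the base pointer of the allocation is itself suitably aligned, and that the total does not overflow `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill offsets[i] with the byte offset of chunk i's
 * output buffer inside one large allocation. Every chunk receives
 * max_compressed_chunk_bytes rounded up to `alignment`, so each buffer
 * starts on an aligned offset. Returns the total number of bytes to allocate. */
static size_t layout_output_buffers(size_t *offsets,
                                    size_t num_chunks,
                                    size_t max_compressed_chunk_bytes,
                                    size_t alignment)
{
    assert(offsets != NULL || num_chunks == 0);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Per-chunk stride: the maximum output size padded to the alignment. */
    const size_t stride =
        (max_compressed_chunk_bytes + alignment - 1) & ~(alignment - 1);

    for (size_t i = 0; i < num_chunks; ++i) {
        offsets[i] = i * stride;
    }
    return num_chunks * stride;
}
```

The resulting offsets would be added to the base device pointer to build the array of output pointers passed to the compression call.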
- nvcompStatus_t nvcompBatchedBitcompCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedBitcompCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompCompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by compress_opts.data_type.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch. This parameter is currently unused. Set it to either the actual value or zero.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] This argument is not used.
temp_bytes – [in] This argument is not used.
device_compressed_chunk_ptrs – [out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedBitcompCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompCompressGetRequiredAlignments` when called with the same compress_opts.
device_compressed_chunk_bytes – [out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] Compression options. They must be valid.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
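The requirement that each chunk size be a multiple of the data type size can be validated on the host before the sizes are uploaded. This is a minimal sketch with a hypothetical helper, `first_bad_chunk_size`; the caller is assumed to know the element size in bytes that corresponds to the chosen compress_opts.data_type (for example 4 for a 32-bit integer type).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first chunk whose size is not
 * a multiple of type_bytes, or num_chunks if every size is valid. */
static size_t first_bad_chunk_size(const size_t *chunk_bytes,
                                   size_t num_chunks,
                                   size_t type_bytes)
{
    assert(chunk_bytes != NULL || num_chunks == 0);
    assert(type_bytes != 0);
    for (size_t i = 0; i < num_chunks; ++i) {
        if (chunk_bytes[i] % type_bytes != 0) {
            return i; /* this chunk would violate the size requirement */
        }
    }
    return num_chunks;
}
```

Running such a check up front is cheap compared with debugging the undefined behaviour that the warning above describes.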
- nvcompStatus_t nvcompBatchedBitcompDecompressGetRequiredAlignments(
- nvcompBatchedBitcompDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedBitcompDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedBitcompDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks. Unused in Bitcomp.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedBitcompGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] This argument is not used.
device_uncompressed_chunk_bytes – [out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
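Since a size of 0 is written for any chunk whose size could not be retrieved, the returned sizes are worth inspecting before output buffers are allocated. The sketch below assumes the size array has been copied to the host after stream synchronization; `sum_uncompressed_sizes` is a hypothetical helper that totals the sizes and counts zero entries, which may indicate chunks that failed to parse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the sum of all per-chunk uncompressed sizes
 * and store in *num_zero how many entries were 0. Overflow of the sum is
 * not checked in this sketch. */
static size_t sum_uncompressed_sizes(const size_t *host_sizes,
                                     size_t num_chunks,
                                     size_t *num_zero)
{
    assert(host_sizes != NULL || num_chunks == 0);
    assert(num_zero != NULL);

    size_t total = 0;
    *num_zero = 0;
    for (size_t i = 0; i < num_chunks; ++i) {
        if (host_sizes[i] == 0) {
            ++*num_zero; /* possibly a chunk whose size lookup failed */
        }
        total += host_sizes[i];
    }
    return total;
}
```

The total gives the amount of output memory needed for the batch, before any per-chunk alignment padding is added.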
- nvcompStatus_t nvcompBatchedBitcompDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedBitcompDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
This function is used to decompress compressed buffers produced by nvcompBatchedBitcompCompressAsync. It can also decompress buffers compressed with the native Bitcomp API.
Note
The function is not completely asynchronous, as it needs to look at the compressed data in order to create the proper bitcomp handle. The stream is synchronized, the data is examined, then the asynchronous decompression is launched.
A fully asynchronous, faster version of batched Bitcomp decompression is available and can be launched via the HLIF manager.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt buffer for decompression will result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] This argument is not used.
device_uncompressed_buffer_bytes – [in] Array with size num_chunks of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in device_statuses corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size num_chunks to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] Temporary scratch memory.
temp_bytes – [in] Size of temporary scratch memory.
device_uncompressed_chunk_ptrs – [out] Array with size num_chunks of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in device_uncompressed_buffer_bytes, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedBitcompDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bounds errors. Error reporting cannot be guaranteed in this scenario because only limited validation is performed to maintain performance.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
-
static const nvcompBatchedBitcompCompressOpts_t nvcompBatchedBitcompCompressDefaultOpts = {0, NVCOMP_TYPE_UCHAR, {0}}#
Default Bitcomp compression options.
-
static const nvcompBatchedBitcompDecompressOpts_t nvcompBatchedBitcompDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, {0}}#
Default Bitcomp decompression options.
-
static const size_t nvcompBitcompCompressionMaxAllowedChunkSize = 1 << 24#
The maximum supported uncompressed chunk size in bytes for the Bitcomp compressor.
-
static const size_t nvcompBitcompRequiredCompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompBitcompRequiredDecompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
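Inputs larger than the maximum allowed chunk size have to be split into several chunks before compression. The following sketch shows the arithmetic for a fixed chunk size, where only the last chunk may be shorter; `chunk_count` and `chunk_size_at` are hypothetical helper names, and the chunk size passed in would be at most the documented limit of 1 << 24 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to cover total_bytes when each chunk holds at
 * most chunk_bytes (ceiling division without overflow). */
static size_t chunk_count(size_t total_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes != 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Size in bytes of the chunk at `index`; every chunk is full except
 * possibly the last one. `index` must be less than chunk_count(...). */
static size_t chunk_size_at(size_t total_bytes, size_t chunk_bytes, size_t index)
{
    assert(chunk_bytes != 0);
    assert(index < chunk_count(total_bytes, chunk_bytes));
    const size_t remaining = total_bytes - index * chunk_bytes;
    return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

With a 40,000,000-byte input and the 1 << 24 limit, this gives three chunks, the last of which holds the 6,445,568 bytes left over after two full chunks.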
-
struct nvcompBatchedBitcompCompressOpts_t#
- #include <bitcomp.h>
Bitcomp compression options for the low-level API.
Public Members
-
int algorithm#
Bitcomp algorithm options.
0 : Default algorithm, usually gives the best compression ratios
1 : “Sparse” algorithm, works well on sparse data (with lots of zeroes) and is usually faster than the default algorithm.
-
nvcompType_t data_type#
One of nvcomp’s possible data types.
-
char reserved[56]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
-
struct nvcompBatchedBitcompDecompressOpts_t#
- #include <bitcomp.h>
Bitcomp decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
Cascaded#
Functions
- nvcompStatus_t nvcompBatchedCascadedCompressGetRequiredAlignments(
- nvcompBatchedCascadedCompressOpts_t compress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for compression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
compress_opts – [in] Compression options.
alignment_requirements – [out] The minimum buffer alignment requirements for compression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedCompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedCascadedCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for compression asynchronously.
- Parameters:
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedCompressGetTempSizeSync(
- const void *const *const device_uncompressed_chunk_ptrs,
- const size_t *const device_uncompressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedCascadedCompressOpts_t compress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for compression synchronously.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedCompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] The number of chunks of memory in the batch.
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk in the batch.
compress_opts – [in] Compression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during compression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] Upper bound on the total uncompressed size of all chunks.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedCompressGetMaxOutputChunkSize(
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedCascadedCompressOpts_t compress_opts,
- size_t *max_compressed_chunk_bytes,
Get the maximum size that a chunk of size at most max_uncompressed_chunk_bytes could compress to. That is, the minimum amount of output memory that must be given to nvcompBatchedCascadedCompressAsync for each chunk.
- Parameters:
max_uncompressed_chunk_bytes – [in] The maximum size of a chunk before compression.
compress_opts – [in] The Cascaded compression options to use.
max_compressed_chunk_bytes – [out] The maximum possible compressed size of the chunk.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedCompressAsync(
- const void *const *device_uncompressed_chunk_ptrs,
- const size_t *device_uncompressed_chunk_bytes,
- size_t max_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *device_temp_ptr,
- size_t temp_bytes,
- void *const *device_compressed_chunk_ptrs,
- size_t *device_compressed_chunk_bytes,
- nvcompBatchedCascadedCompressOpts_t compress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous compression.
Note
The current implementation does not support uncompressed size larger than 4,294,967,295 bytes (max uint32_t).
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_uncompressed_chunk_ptrs – [in] Array with size num_chunks of pointers to the uncompressed data chunks. Both the pointers and the uncompressed data should reside in device-accessible memory. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedCompressGetRequiredAlignments` when called with the same compress_opts.
device_uncompressed_chunk_bytes – [in] Array with size num_chunks of sizes of the uncompressed chunks in bytes. The sizes should reside in device-accessible memory. Each chunk size must be a multiple of the size of the data type specified by compress_opts.type, else this may crash or produce invalid output.
max_uncompressed_chunk_bytes – [in] The size of the largest uncompressed chunk. This parameter is currently unused. Set it to either the actual value or zero.
num_chunks – [in] Number of chunks of data to compress.
device_temp_ptr – [in] This argument is not used.
temp_bytes – [in] This argument is not used.
device_compressed_chunk_ptrs – [out] Array with size num_chunks of pointers to the output compressed buffers. Both the pointers and the compressed buffers should reside in device-accessible memory. Each compressed buffer should be preallocated with the size given by `nvcompBatchedCascadedCompressGetMaxOutputChunkSize`. Each compressed buffer must be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedCompressGetRequiredAlignments` when called with the same compress_opts.
device_compressed_chunk_bytes – [out] Array with size num_chunks, to be filled with the compressed sizes of each chunk. The buffer should be preallocated in device-accessible memory.
compress_opts – [in] The cascaded format options. The format must be valid.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
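The note above limits the uncompressed size to 4,294,967,295 bytes. A host-side guard for that limit might look like the sketch below; `within_cascaded_limit` is a hypothetical helper, and because the text here does not say whether the limit applies per chunk or to the whole batch, the function simply tests whichever size the caller passes in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if `bytes` does not exceed the documented
 * limit of 4,294,967,295 (the maximum uint32_t value), 0 otherwise.
 * A 64-bit parameter is used so that larger sizes can be represented. */
static int within_cascaded_limit(uint64_t bytes)
{
    return bytes <= (uint64_t)UINT32_MAX;
}
```

A conservative caller could apply the check both to each chunk size and to the batch total until the exact scope of the limit is confirmed.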
- nvcompStatus_t nvcompBatchedCascadedDecompressGetRequiredAlignments(
- nvcompBatchedCascadedDecompressOpts_t decompress_opts,
- nvcompAlignmentRequirements_t *alignment_requirements,
Get the minimum buffer alignment requirements for decompression.
Note
Providing buffers with alignments above the minimum requirements (e.g., 16- or 32-byte alignment) may help improve performance.
- Parameters:
decompress_opts – [in] Decompression options.
alignment_requirements – [out] The minimum buffer alignment requirements for decompression.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSizeAsync(
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- nvcompBatchedCascadedDecompressOpts_t decompress_opts,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
Get the amount of temporary memory required on the GPU for decompression asynchronously.
- Parameters:
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
decompress_opts – [in] Decompression options.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedDecompressGetTempSizeSync(
- const void *const *const device_compressed_chunk_ptrs,
- const size_t *const device_compressed_chunk_bytes,
- size_t num_chunks,
- size_t max_uncompressed_chunk_bytes,
- size_t *temp_bytes,
- size_t max_total_uncompressed_bytes,
- nvcompBatchedCascadedDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Get the amount of temporary memory required on the GPU for decompression synchronously.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
num_chunks – [in] Number of chunks of data to be decompressed.
max_uncompressed_chunk_bytes – [in] The size of the largest chunk in bytes when uncompressed.
temp_bytes – [out] The amount of GPU memory that will be temporarily required during decompression. The value is returned on the host side.
max_total_uncompressed_bytes – [in] The total decompressed size of all the chunks.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the data can be parsed successfully, the status will be set to `nvcompSuccess`, and an error code otherwise.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedGetDecompressSizeAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- cudaStream_t stream,
Asynchronously compute the number of bytes of uncompressed data for each compressed chunk.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size num_chunks of pointers in device-accessible memory to compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size num_chunks of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_chunk_bytes – [out] Array with size num_chunks to be filled with the sizes, in bytes, of each uncompressed data chunk. If there is an error when retrieving the size of a chunk, the uncompressed size of that chunk will be set to 0. This argument needs to be preallocated in device-accessible memory.
num_chunks – [in] Number of data chunks to compute sizes of.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successful, and an error code otherwise.
- nvcompStatus_t nvcompBatchedCascadedDecompressAsync(
- const void *const *device_compressed_chunk_ptrs,
- const size_t *device_compressed_chunk_bytes,
- const size_t *device_uncompressed_buffer_bytes,
- size_t *device_uncompressed_chunk_bytes,
- size_t num_chunks,
- void *const device_temp_ptr,
- size_t temp_bytes,
- void *const *device_uncompressed_chunk_ptrs,
- nvcompBatchedCascadedDecompressOpts_t decompress_opts,
- nvcompStatus_t *device_statuses,
- cudaStream_t stream,
Perform batched asynchronous decompression.
This function is used to decompress compressed buffers produced by nvcompBatchedCascadedCompressAsync.
Warning
Violating any of the conditions listed in the parameter descriptions below may result in undefined behaviour.
Providing a corrupt buffer for decompression will result in undefined behavior.
- Parameters:
device_compressed_chunk_ptrs – [in] Array with size `num_chunks` of pointers in device-accessible memory to device-accessible compressed buffers. Each chunk must be aligned to the value in the `input` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedDecompressGetRequiredAlignments`.
device_compressed_chunk_bytes – [in] Array with size `num_chunks` of sizes of the compressed buffers in bytes. The sizes should reside in device-accessible memory.
device_uncompressed_buffer_bytes – [in] Array with size `num_chunks` of sizes, in bytes, of the output buffers to be filled with uncompressed data for each chunk. The sizes should reside in device-accessible memory. If a size is not large enough to hold all decompressed data, the decompressor will set the status in `device_statuses` corresponding to the overflow chunk to `nvcompErrorCannotDecompress`.
device_uncompressed_chunk_bytes – [out] Array with size `num_chunks` to be filled with the actual number of bytes decompressed for every chunk. This argument needs to be preallocated.
num_chunks – [in] Number of chunks of data to decompress.
device_temp_ptr – [in] This argument is not used.
temp_bytes – [in] This argument is not used.
device_uncompressed_chunk_ptrs – [out] Array with size `num_chunks` of pointers in device-accessible memory to decompressed data. Each uncompressed buffer needs to be preallocated in device-accessible memory, have the size specified by the corresponding entry in `device_uncompressed_buffer_bytes`, and be aligned to the value in the `output` member of the nvcompAlignmentRequirements_t object output by `nvcompBatchedCascadedDecompressGetRequiredAlignments`.
decompress_opts – [in] Decompression options.
device_statuses – [out] Array with size `num_chunks` of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the decompression is successful, the status will be set to `nvcompSuccess`. Passing corrupt, invalid, or insufficient data leads to undefined behavior or out-of-bounds errors. Error reporting cannot be guaranteed in this scenario because only limited validation is performed to maintain performance.
stream – [in] The CUDA stream to operate on.
- Returns:
nvcompSuccess if successfully launched, and an error code otherwise.
Variables
-
static const nvcompBatchedCascadedCompressOpts_t nvcompBatchedCascadedCompressDefaultOpts = {4096, NVCOMP_TYPE_INT, 2, 1, 1, {0}}#
Default Cascaded compression options.
-
static const nvcompBatchedCascadedDecompressOpts_t nvcompBatchedCascadedDecompressDefaultOpts = {NVCOMP_DECOMPRESS_BACKEND_DEFAULT, {0}}#
Default Cascaded decompression options.
-
static const size_t nvcompCascadedCompressionMaxAllowedChunkSize = 1 << 24#
The maximum supported uncompressed chunk size in bytes for the Cascaded compressor.
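Since each uncompressed chunk is capped at this size, a larger input has to be split into several chunks before it is passed to the batched API. The sketch below shows one way to compute the chunk count and per-chunk sizes on the host. It is illustrative only: `EXAMPLE_MAX_CHUNK_BYTES` mirrors the documented value so the code compiles without the nvCOMP headers, and both helper functions are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the documented value of nvcompCascadedCompressionMaxAllowedChunkSize
   (1 << 24). Real code would use the constant from the nvCOMP header. */
#define EXAMPLE_MAX_CHUNK_BYTES ((size_t)1 << 24)

/* Hypothetical helper: number of chunks needed to cover total_bytes when every
   chunk holds at most chunk_bytes. Written without "total + chunk - 1" so it
   cannot overflow for very large inputs. */
static size_t example_num_chunks(size_t total_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes > 0 && chunk_bytes <= EXAMPLE_MAX_CHUNK_BYTES);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Hypothetical helper: size in bytes of chunk i. Every chunk is full except
   possibly the last one, which holds the remainder. */
static size_t example_chunk_size(size_t total_bytes, size_t chunk_bytes, size_t i)
{
    assert(chunk_bytes > 0 && chunk_bytes <= EXAMPLE_MAX_CHUNK_BYTES);
    size_t offset = i * chunk_bytes;
    assert(offset < total_bytes);
    size_t remaining = total_bytes - offset;
    return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

For example, a 100000-byte input split with a 65536-byte chunk size yields two chunks, the second holding the remaining 34464 bytes.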
-
static const size_t nvcompCascadedRequiredCompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to compression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
-
static const size_t nvcompCascadedRequiredDecompressionAlignment = 8#
The most restrictive of the minimum alignment requirements for void-type CUDA memory buffers used for input, output, or temporary memory, passed to decompression functions.
Note
In all cases, typed memory buffers must still be aligned to their type’s size, e.g., 4 bytes for `int`.
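When several chunk buffers are carved out of one larger allocation, each sub-buffer's starting offset has to respect these alignment values. The following sketch shows a pointer check and an offset round-up. It assumes the alignment is a power of two (true for the documented value of 8); `EXAMPLE_CASCADED_ALIGNMENT`, `example_is_aligned`, and `example_align_up` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the documented value 8 of the Cascaded alignment constants.
   Real code would use the constants from the nvCOMP header. */
#define EXAMPLE_CASCADED_ALIGNMENT ((size_t)8)

/* Hypothetical helper: returns 1 if ptr is a multiple of alignment, else 0.
   alignment must be a nonzero power of two. */
static int example_is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: rounds bytes up to the next multiple of alignment,
   e.g. to place the next chunk buffer inside a shared allocation.
   alignment must be a nonzero power of two. */
static size_t example_align_up(size_t bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

With an alignment of 8, a 13-byte chunk occupies a 16-byte slot, so the next chunk starts 16 bytes after it.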
-
struct nvcompBatchedCascadedCompressOpts_t#
- #include <cascaded.h>
Cascaded compression options for the low-level API.
Public Members
-
size_t internal_chunk_bytes#
The size of each internal chunk of data to decompress independently with Cascaded compression.
The value should be in the range of [512, 16384] depending on the datatype of the input and the shared memory size of the GPU being used. This is not the size of chunks passed into the API. Recommended size is 4096.
Note
This field is not currently used; the default of 4096 is always applied.
-
nvcompType_t type#
The datatype used to define the bit-width for compression.
-
int num_RLEs#
The number of Run Length Encodings to perform.
-
int num_deltas#
The number of Delta Encodings to perform.
-
int use_bp#
Whether or not to bitpack the final layers.
-
char reserved[40]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
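The `num_RLEs`, `num_deltas`, and `use_bp` fields select how many run-length and delta encoding passes are applied and whether the result is bit-packed. As a rough intuition for why combining them helps, the host-side sketch below applies one pass of each to a small `int32_t` array and reports how many bits per value the final layer would need. This is a conceptual illustration under stated assumptions, not nvCOMP's implementation, layer ordering, or compressed format; all function names are hypothetical, and values are assumed small enough that the subtraction does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Run-length encode in[0..n): writes one value and one run length per run,
   returns the number of runs. vals and runs must hold at least n entries. */
static size_t example_rle(const int32_t *in, size_t n, int32_t *vals, uint32_t *runs)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        if (m > 0 && vals[m - 1] == in[i]) {
            ++runs[m - 1];          /* extend the current run */
        } else {
            vals[m] = in[i];        /* start a new run */
            runs[m] = 1;
            ++m;
        }
    }
    return m;
}

/* Delta encode: out[i-1] = in[i] - in[i-1]. The first value in[0] is assumed
   to be stored separately as a base. Returns the number of deltas written. */
static size_t example_delta(const int32_t *in, size_t n, int32_t *out)
{
    for (size_t i = 1; i < n; ++i) {
        out[i - 1] = in[i] - in[i - 1];
    }
    return n > 0 ? n - 1 : 0;
}

/* Bits per value needed if each value is stored as an offset from the minimum. */
static unsigned example_bitpack_width(const int32_t *v, size_t n)
{
    assert(n > 0);
    int32_t lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; ++i) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    uint32_t range = (uint32_t)hi - (uint32_t)lo;
    unsigned bits = 0;
    while (range != 0) {
        ++bits;
        range >>= 1;
    }
    return bits;
}

/* One RLE pass, one delta pass on the run values, then the bit width of the
   deltas. The three scratch arrays must each hold at least n entries. */
static unsigned example_cascade_width(const int32_t *in, size_t n, int32_t *vals,
                                      uint32_t *runs, int32_t *deltas)
{
    size_t m = example_rle(in, n, vals, runs);
    size_t d = example_delta(vals, m, deltas);
    return d > 0 ? example_bitpack_width(deltas, d) : 0;
}
```

For the input {100, 100, 100, 103, 103, 105, 110, 110}, the run values are {100, 103, 105, 110} and their deltas are {3, 2, 5}, which fit in 2 bits each once stored relative to the minimum delta, compared with 32 bits for each raw `int32_t`.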
-
struct nvcompBatchedCascadedDecompressOpts_t#
- #include <cascaded.h>
Cascaded decompression options for the low-level API.
Public Members
-
nvcompDecompressBackend_t backend#
Decompression backend to use.
-
char reserved[60]#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.