High-level C++ Quick Start Guide

nvCOMP provides a high-level C++ interface that simplifies use of the library. It reports errors by throwing exceptions, and it manages state and temporary memory allocations inside nvcompManager objects.

The high-level interface provides the following features:
  • Compression settings are stored in the nvcompManager object

  • Users can decompress nvCOMP-compressed buffers without knowing how the buffer was compressed

  • The nvcompManager can automatically split a single, uncompressed, contiguous buffer into chunks to allow the algorithms to exploit available parallelism

  • Users can opt to store and verify checksums for the uncompressed and compressed buffers

To use nvCOMP’s C++ interface, include nvcomp.hpp and the header of each compressor you use. For example, the LZ4 compression scheme used in high_level_quickstart_example.cpp requires:

#include "nvcomp/lz4.hpp"
#include "nvcomp.hpp"

All nvCOMP APIs are declared within the nvcomp namespace. For ease of use, we suggest adding the following using-directive in an appropriate scope:

using namespace nvcomp;

Below we introduce the interface and summarize the declarations of the relevant member functions of the nvcompManager class hierarchy. For complete worked examples, see high_level_quickstart_example.cpp.

Manager Construction

The user has two options for constructing an nvcompManager. In either case, the user can specify a CUDA stream for all of the manager's GPU operations. If no stream is specified, the default stream is used. The rest of this section assumes the default bitstream_kind = BitstreamKind::NVCOMP_NATIVE. The other options are described under Bitstream kind.

1) Construction from an nvCOMP-compressed buffer

The user can construct a manager from a compressed buffer. This is the recommended way to construct a manager for decompression. It is less error-prone because the settings come from the compressed buffer rather than being specified by hand.

To use the create_manager factory, include nvcomp/nvcompManagerFactory.hpp.

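#include "nvcomp/nvcompManagerFactory.hpp"

cudaStream_t stream;
// CUDA_CHECK is an error-checking helper macro, not part of the nvCOMP API.
CUDA_CHECK(cudaStreamCreate(&stream));

// comp_buffer: device pointer to data previously compressed by an nvCOMP manager.
std::shared_ptr<nvcompManagerBase> decomp_nvcomp_manager = create_manager(comp_buffer, stream);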

A complete worked example using this approach is provided in the decomp_compressed_with_manager_factory_example within high_level_quickstart_example.cpp.

2) Direct construction

In direct construction, the user specifies the parameters of the particular compressor to use for compression or decompression. If a manually constructed manager is used for decompression, care must be taken that its configuration matches the configuration used to compress the buffers.

Chunk size is a common parameter. It determines the size of the chunks into which the manager splits the input internally. Some compressors achieve a higher compression ratio with a larger chunk size. For example, with LZ4, a larger chunk size gives the algorithm a larger lookback window in which to find matches.
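The effect of chunk size on the lookback window can be made concrete with a deliberately simplified proxy. The host-only C++ sketch below counts positions whose next 4 bytes already occurred earlier in the same chunk. Four bytes is LZ4's minimum match length. This is not nvCOMP's or LZ4's actual match finder, and the helper name count_matchable_positions is hypothetical. The sketch assumes each chunk is processed independently, so no match can reach back across a chunk boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <vector>

// Toy proxy for LZ-style matching (hypothetical helper, not part of nvCOMP):
// counts positions whose next 4 bytes already appeared earlier in the SAME chunk.
// Chunks are scanned independently, so a match never crosses a chunk boundary.
size_t count_matchable_positions(const std::vector<uint8_t>& data, size_t chunk_size) {
  assert(chunk_size > 0);
  size_t matches = 0;
  for (size_t start = 0; start < data.size(); start += chunk_size) {
    const size_t end = std::min(start + chunk_size, data.size());
    std::set<uint32_t> seen;  // 4-byte sequences seen so far in this chunk only
    for (size_t i = start; i + 4 <= end; ++i) {
      uint32_t key;
      std::memcpy(&key, &data[i], sizeof(key));
      if (!seen.insert(key).second) {
        ++matches;  // an earlier copy exists inside this chunk's lookback window
      }
    }
  }
  return matches;
}
```

Take data that repeats with a period of 200 bytes. A 128-byte chunk never contains a repeated 4-byte sequence. A 1024-byte chunk exposes a repeat at every position after the first period.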

Checksum support

When constructing a manager, the user may also specify whether to store and/or verify checksums for the uncompressed and compressed buffers. The high-level interface (HLIF) computes these checksums on the GPU using a modified CRC32 algorithm. These checksums are intended for error detection, not security. Enabling checksums may also incur a sizeable performance penalty, depending on the compression algorithm.
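The exact modified CRC32 variant used by the HLIF is not described here. As a point of reference only, the host-side C++ sketch below implements the standard reflected CRC-32 (IEEE 802.3, polynomial 0xEDB88320) and shows the compute-store-verify pattern. The function names are hypothetical. Their values will not match nvCOMP's checksums.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard reflected CRC-32 (IEEE 802.3), bit-at-a-time for clarity.
// This is NOT nvCOMP's modified CRC32; it only illustrates the technique.
uint32_t crc32_ieee(const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the bit shifted out was 1.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical verify step: recompute the checksum and compare it with the stored one.
bool verify_crc32(const uint8_t* data, size_t len, uint32_t stored) {
  return crc32_ieee(data, len) == stored;
}
```

Any single flipped bit changes a CRC, which is why CRCs suit error detection. An adversary can still forge matching data, which is why they offer no security.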

The fully worked examples comp_decomp_with_single_manager_with_checksums and decomp_compressed_with_manager_factory_with_checksums within high_level_quickstart_example.cpp demonstrate how to use the HLIF checksums.

cudaStream_t stream;
CUDA_CHECK(cudaStreamCreate(&stream));

// Direct construction of an LZ4 manager that splits the input into 64 KiB chunks.
const int chunk_size = 1 << 16;
nvcompType_t data_type = NVCOMP_TYPE_CHAR;

LZ4Manager nvcomp_manager{chunk_size, data_type, stream};

Compression

Compression consists of two steps: Configuration then Compression.

Step 1: Configuration

The configuration step computes the maximum size of the compressed buffer and performs internal setup for the compression.

/**
 * @brief Configure the compression of a single buffer.
 *
 * This routine computes the size of the required result buffer. The result config also
 * contains the nvcompStatus* that allows error checking.
 *
 * @param uncomp_buffer_size The uncompressed input data size (in bytes).
 *
 * \return CompressionConfig for the size provided.
 */
virtual CompressionConfig configure_compression(
  const size_t uncomp_buffer_size) = 0;

Step 2: Compression

The compress function takes the CompressionConfig returned by configure_compression, a const input buffer, and an output buffer. Allocate the output buffer using the maximum compressed size contained in that CompressionConfig.

/**
 * @brief Perform compression asynchronously for a single buffer.
 *
 * @param uncomp_buffer The uncompressed input data.
 * (a pointer to contiguous device memory)
 *
 * @param comp_buffer The location to output the compressed data to.
 * (a pointer to contiguous device memory)
 * Size requirement is provided in CompressionConfig.
 *
 * @param comp_config Generated for the current uncomp_buffer with configure_compression.
 *
 * @param comp_size The location to output size in bytes after compression.
 * (a pointer to a single size_t variable on device)
 * Optional when bitstream kind is NVCOMP_NATIVE.
 */
virtual void compress(
  const uint8_t* uncomp_buffer,
  uint8_t* comp_buffer,
  const CompressionConfig& comp_config,
  size_t* comp_size = nullptr) = 0;
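The configure-then-compress protocol can be illustrated without a GPU. The host-only C++ sketch below uses a hypothetical byte-wise run-length encoder, not an nvCOMP algorithm. ToyCompressionConfig is an assumed stand-in for CompressionConfig. The sketch follows these steps:

- toy_configure_compression computes a worst-case output size from the input size alone.
- The caller allocates that much memory.
- toy_compress writes the actual size through comp_size, similar to the comp_size output parameter of compress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CompressionConfig (not nvCOMP's type).
struct ToyCompressionConfig {
  size_t uncompressed_size;
  size_t max_compressed_buffer_size;
};

// Step 1: size the output from the input size alone. A byte-wise RLE emits
// (run length, value) pairs, so the worst case is 2 output bytes per input byte.
ToyCompressionConfig toy_configure_compression(size_t uncomp_buffer_size) {
  return {uncomp_buffer_size, 2 * uncomp_buffer_size};
}

// Step 2: compress into a caller-allocated buffer and report the actual size.
void toy_compress(const uint8_t* uncomp_buffer, uint8_t* comp_buffer,
                  const ToyCompressionConfig& config, size_t* comp_size) {
  assert(comp_size != nullptr);
  size_t out = 0;
  for (size_t i = 0; i < config.uncompressed_size;) {
    const uint8_t value = uncomp_buffer[i];
    size_t run = 1;
    // Runs are capped at 255 so the length fits in one byte.
    while (i + run < config.uncompressed_size && uncomp_buffer[i + run] == value && run < 255) {
      ++run;
    }
    comp_buffer[out++] = static_cast<uint8_t>(run);
    comp_buffer[out++] = value;
    i += run;
  }
  *comp_size = out;  // never exceeds config.max_compressed_buffer_size
}
```

In real code, the output buffer is device memory sized from the returned CompressionConfig. The reported compressed size may be far smaller than that bound.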

Decompression

Decompression consists of two steps: Configuration then Decompression.

Step 1: Configuration

To configure the decompression, the user has two options.

A) Configure using a compressed buffer

If the user does not have the CompressionConfig that was used to compress the buffer, decompression must be configured from the compressed buffer itself. This overload synchronizes the stream provided when the manager was constructed. The synchronization is needed because the information required to configure decompression is stored with the compressed data in device memory.

/**
 * @brief Configure the decompression for a single buffer using a compressed buffer.
 *
 * Synchronizes the user stream.
 * - If bitstream kind is NVCOMP_NATIVE, it will parse the header in comp_buffer.
 * - If bitstream kind is RAW, it may be required (e.g. for LZ4) to parse the whole comp_buffer,
 *   which could be significantly slower than the other options.
 * - If bitstream kind is WITH_UNCOMPRESSED_SIZE, it will read the size from the beginning of the comp_buffer.
 *
 * @param comp_buffer The compressed input data.
 * (a pointer to contiguous device memory)
 *
 * @param comp_size Size of the compressed input data. This is required only for RAW format.
 * (a pointer to device variable with compressed size)
 *
 * \return DecompressionConfig for the comp_buffer provided.
 */
virtual DecompressionConfig configure_decompression(
  const uint8_t* comp_buffer, const size_t* comp_size=nullptr) = 0;
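The doc comment above says RAW streams may need a full parse to recover the uncompressed size, while WITH_UNCOMPRESSED_SIZE reads it from the start of the buffer. The host-only C++ sketch below illustrates that cost difference on a hypothetical RLE stream of (run length, value) byte pairs. The enum ToyBitstreamKind and the 8-byte little-endian size prefix are assumptions, not nvCOMP's formats. NVCOMP_NATIVE, which parses nvCOMP's own header, is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of two bitstream kinds (not nvCOMP's enum).
enum class ToyBitstreamKind { WithUncompressedSize, Raw };

// Recover the uncompressed size of a toy RLE stream of (run length, value) byte pairs.
uint64_t toy_uncompressed_size(ToyBitstreamKind kind, const uint8_t* comp, size_t comp_size) {
  uint64_t size = 0;
  if (kind == ToyBitstreamKind::WithUncompressedSize) {
    // O(1): the size is stored up front as an 8-byte little-endian prefix (assumed layout).
    assert(comp_size >= 8);
    for (int i = 0; i < 8; ++i) size |= static_cast<uint64_t>(comp[i]) << (8 * i);
    return size;
  }
  // Raw: there is no header, so every pair must be visited: O(comp_size).
  for (size_t i = 0; i + 1 < comp_size; i += 2) size += comp[i];
  return size;
}
```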

B) Configure using a compression config

Sometimes, the user will retain the CompressionConfig object that was used to compress the buffer. In this case, the DecompressionConfig can be constructed from the CompressionConfig. Since the CompressionConfig resides in host memory, this configuration can happen without synchronizing the stream.

/**
 * @brief Configure the decompression for a single buffer using a CompressionConfig object.
 *
 * Does not synchronize the user stream.
 *
 * @param comp_config The config used to compress a buffer.
 *
 * \return DecompressionConfig based on compression config provided.
 */
virtual DecompressionConfig configure_decompression(
  const CompressionConfig& comp_config) = 0;

Step 2: Decompression

Decompression writes its output to a decomp_buffer that must be allocated by the user. The required size of this buffer is provided by the DecompressionConfig returned from the configuration step.

/**
 * @brief Perform decompression asynchronously of a single buffer.
 *
 * @param decomp_buffer The location to output the decompressed data to.
 * (a pointer to contiguous device memory)
 * Size requirement is provided in DecompressionConfig.
 *
 * @param comp_buffer The compressed input data.
 * (a pointer to contiguous device memory)
 *
 * @param decomp_config The result of configure_decompression for this comp_buffer.
 * Contains nvcompStatus* in host/device-accessible memory to allow error checking.
 *
 * @param comp_size The size of compressed input data passed.
 * (a pointer to a single size_t variable on device)
 * Optional when bitstream kind is NVCOMP_NATIVE.
 */
virtual void decompress(
  uint8_t* decomp_buffer,
  const uint8_t* comp_buffer,
  const DecompressionConfig& decomp_config,
  size_t* comp_size = nullptr) = 0;
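Because decompress writes into a buffer the caller allocated, the decompressed size from the configuration step must be known before the call. The host-only C++ sketch below shows this for a hypothetical byte-wise RLE stream of (run length, value) pairs, not an nvCOMP format. The decomp_size parameter is an assumed stand-in for the size reported by DecompressionConfig.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side decoder for (run length, value) byte pairs.
// decomp_size plays the role of the size reported by the configuration step;
// the caller must have allocated at least that many bytes at decomp_buffer.
bool toy_decompress(uint8_t* decomp_buffer, size_t decomp_size,
                    const uint8_t* comp_buffer, size_t comp_size) {
  assert(decomp_buffer != nullptr || decomp_size == 0);
  size_t out = 0;
  for (size_t i = 0; i + 1 < comp_size; i += 2) {
    const size_t run = comp_buffer[i];
    if (out + run > decomp_size) return false;  // would overflow the caller's buffer
    std::memset(decomp_buffer + out, comp_buffer[i + 1], run);
    out += run;
  }
  return out == decomp_size;  // fail if the stream decodes to a different size
}
```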

Batched API

Managers also support batched compression and decompression of multiple buffers. Depending on the function, you pass either a std::vector or a C-style array. Functions that accept C-style arrays also require the batch size.

virtual std::vector<CompressionConfig> configure_compression(
  const std::vector<size_t>& uncomp_buffer_sizes) = 0;

virtual std::vector<DecompressionConfig> configure_decompression(
  const uint8_t* const * comp_buffers, size_t batch_size, const size_t* comp_sizes = nullptr) = 0;

A complete worked example of batched compression and decompression can be found in multi_comp_decomp_batched within high_level_quickstart_example.cpp.
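When an overload takes C-style arrays, a std::vector can supply both the array (via data()) and the batch size (via size()). The C++ sketch below shows this with a hypothetical stand-in, toy_configure_decompression. Its parameters have the same shape as the C-style configure_decompression overload declared above. In real use the pointers must refer to device memory. Here they point to host vectors purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in with the same parameter shape as the C-style batched overload.
// It only totals the compressed sizes so the example has something to check.
size_t toy_configure_decompression(const uint8_t* const* comp_buffers, size_t batch_size,
                                   const size_t* comp_sizes) {
  size_t total = 0;
  for (size_t i = 0; i < batch_size; ++i) {
    assert(comp_buffers[i] != nullptr);  // data() of an empty vector may be null
    if (comp_sizes != nullptr) total += comp_sizes[i];
  }
  return total;
}

// Build parallel pointer and size arrays from owning vectors, then pass data() and size().
size_t call_with_vectors(const std::vector<std::vector<uint8_t>>& buffers) {
  std::vector<const uint8_t*> ptrs;
  std::vector<size_t> sizes;
  for (const auto& b : buffers) {
    ptrs.push_back(b.data());
    sizes.push_back(b.size());
  }
  return toy_configure_decompression(ptrs.data(), ptrs.size(), sizes.data());
}
```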

Bitstream kind

BitstreamKind::RAW

If you want the behavior of the low-level C API but do not want to manage temporary buffers yourself, pass BitstreamKind::RAW to the manager constructor. In this case, the chunk_size and checksum_policy arguments are ignored. The manager does not split the input data into chunks, so you must do this yourself to maintain good performance. Some algorithms may fail if the chunks are too large. No nvCOMP header is added, so the output is interoperable with the low-level C API. Because there is no nvCOMP header, you must pass comp_sizes to most manager functions so that the sizes of the compressed chunks can be stored and read.

A complete worked example using this approach is provided in multi_comp_decomp_raw within high_level_quickstart_example.cpp.
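With BitstreamKind::RAW the caller is responsible for chunking. Below is a minimal C++ sketch of splitting one contiguous buffer into fixed-size chunks. The names split_into_chunks and ChunkList are hypothetical, and the 64 KiB chunk size in the check is only an example. The resulting pointer and size arrays have the shape that C-style batched functions take. In real code, base would be a device pointer, and a comp_sizes array of the same length would record each chunk's compressed size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: per-chunk pointers and sizes for one contiguous buffer.
struct ChunkList {
  std::vector<const uint8_t*> ptrs;
  std::vector<size_t> sizes;
};

ChunkList split_into_chunks(const uint8_t* base, size_t total_size, size_t chunk_size) {
  assert(chunk_size > 0);
  ChunkList chunks;
  for (size_t offset = 0; offset < total_size; offset += chunk_size) {
    chunks.ptrs.push_back(base + offset);  // pointer arithmetic only; no data is copied
    chunks.sizes.push_back(std::min(chunk_size, total_size - offset));  // last chunk may be short
  }
  return chunks;
}
```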

BitstreamKind::WITH_UNCOMPRESSED_SIZE

Some algorithms, such as LZ4, do not store the size of the uncompressed input in the headers attached to the compressed data. This value is required for decompression. Obtaining it may require a virtual (dry-run) decompression pass, which hurts performance. In such situations, we recommend a manager with the default bitstream kind, BitstreamKind::NVCOMP_NATIVE.

However, if you want behavior closer to the low-level C API, you can pass BitstreamKind::WITH_UNCOMPRESSED_SIZE to the manager constructor. The resulting manager is similar to one created with BitstreamKind::RAW. The difference is that it adds a small header containing only the uncompressed size of the data, which can speed up decompression. Because this header is non-standard, the output is no longer interoperable with the low-level C API.
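Conceptually, a WITH_UNCOMPRESSED_SIZE stream is a raw stream with its uncompressed size prepended. The host-only C++ sketch below shows that idea with an assumed layout: an 8-byte little-endian size followed by the raw payload. nvCOMP's actual header layout is not specified in this guide, and add_size_prefix is a hypothetical name. The sketch also makes the interoperability limit concrete. A consumer that expects a raw stream starts reading at byte 0 and misinterprets the size bytes as compressed data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical framing: [8-byte little-endian uncompressed size][raw compressed payload].
std::vector<uint8_t> add_size_prefix(uint64_t uncomp_size, const std::vector<uint8_t>& raw_payload) {
  std::vector<uint8_t> framed;
  framed.reserve(8 + raw_payload.size());
  for (int i = 0; i < 8; ++i) {
    framed.push_back(static_cast<uint8_t>(uncomp_size >> (8 * i)));  // least significant byte first
  }
  framed.insert(framed.end(), raw_payload.begin(), raw_payload.end());
  assert(framed.size() == 8 + raw_payload.size());
  return framed;
}
```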

HLIF Compression / Decompression Examples - LZ4

high_level_quickstart_example.cpp provides worked examples of:
  • Constructing the manager from arguments

  • Constructing the manager from a compressed buffer

  • Streamed compression and decompression of multiple buffers