Release notes#
nvcomp 4.2.0#
New features#
Added support for Blackwell HW Decompress Engine for Snappy, Gzip, and Deflate
Deflate and Gdeflate compression now supports chunk sizes larger than 64KB
Bug Fixes#
Fixed issue in ZSTD compression that resulted in “unspecified launch error” when presented with very small buffers
The HLIF previously did not raise exceptions in all failure cases
Known issues#
Cascaded, GDeflate, zStandard, Deflate, Gzip and Bitcomp decompressors can only operate on valid input data (data that was compressed using the same compressor). Other decompressors can sometimes detect errors in the compressed stream
Cascaded, zStandard and Bitcomp batched decompression C APIs cannot currently accept nullptr for actual_decompressed_bytes or device_statuses values. Deflate and Gzip cannot accept nullptr for device_statuses values
The Bitcomp low-level batched decompression function is not fully asynchronous
Gzip low-level interface only provides decompression
The device API only supports the LZ4/ANS format
Zstd decompression fails when decompressing buffers compressed with compression level 18 and higher using the zstd library version 1.5.6. To work around the problem temporarily, you can provide 1.5x the scratch required by nvcompBatchedZstdDecompressGetTempSize to nvcompBatchedZstdDecompressAsync. Please file an nvBug.
nvCOMP C++ APIs on Linux can only be used with GCC >=9.x compilers
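The Zstd workaround above amounts to enlarging the reported scratch size by half before allocating. A minimal sketch of that arithmetic follows. The helper name `padded_zstd_temp_bytes` is hypothetical and not part of nvCOMP, and the value passed in is assumed to be the size reported by nvcompBatchedZstdDecompressGetTempSize.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper (not an nvCOMP function): returns 1.5x the reported
// temp size, rounded up, using integer arithmetic only.
inline std::size_t padded_zstd_temp_bytes(std::size_t reported_bytes) {
    // ceil(reported_bytes / 2) without floating point.
    const std::size_t extra = reported_bytes / 2 + reported_bytes % 2;
    // Guard against overflow for very large sizes.
    assert(reported_bytes <= std::numeric_limits<std::size_t>::max() - extra);
    return reported_bytes + extra;
}
```

The padded value would then be used both for the scratch allocation and as the temp size passed to nvcompBatchedZstdDecompressAsync.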
nvcomp 4.1.1#
Bug Fixes#
nvCOMP ZSTD compression exhibited failures / data corruption in the unlikely case where a ZSTD block contained only zero literals. Fixed by adding RLE literal support as required by the format.
Fixed bug in Deflate and Gzip uncompressed data size computation when non-compressed blocks (btype=00) were present in the deflate stream (compressed data).
nvcomp 4.1.0#
New features#
Fine-grained LLIF buffer alignment querying through nvcompBatched<alg>CompressGetRequiredAlignments and nvcompBatched<alg>DecompressRequiredAlignments
Enabled level 0 compression (Huffman only) for Deflate
Custom-allocator support in the Python interface through the set_*_allocator family of functions
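Once the alignment requirements have been queried, callers typically need to verify that their buffers satisfy them. The sketch below shows one way to do that check on the host. Both helpers are hypothetical and not part of nvCOMP, and the alignment values are assumed to come from the queries named above and to be powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the pointer value is a multiple of `alignment`.
// Assumes `alignment` is a nonzero power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical helper: rounds a byte offset up to the next multiple of
// `alignment`, e.g. when carving several buffers out of one allocation.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

The check only inspects the pointer value, so it works for device pointers without dereferencing them.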
Bug Fixes#
Fixed a memory leak in the Python interface
Fixed a bug in the Snappy decompressor that caused off-by-one token counts
Made GDeflate compression more RFC-1951-compliant by always producing headers with at most 286 literal-length codelengths in dynamic Huffman mode
Performance Optimizations#
Significant speedup in Bitcomp decompression using nvcomp HLIF – 7-8x for very small files (speedup observed on H100, A100 and L40 GPUs), and 1.3-1.5x for larger files on some GPUs (speedup observed on L40).
Known issues#
Cascaded, GDeflate, zStandard, Deflate, Gzip and Bitcomp decompressors can only operate on valid input data (data that was compressed using the same compressor). Other decompressors can sometimes detect errors in the compressed stream
Cascaded, zStandard and Bitcomp batched decompression C APIs cannot currently accept nullptr for actual_decompressed_bytes or device_statuses values. Deflate and Gzip cannot accept nullptr for device_statuses values
The Bitcomp low-level batched decompression function is not fully asynchronous
Gzip low-level interface only provides decompression
The device API only supports the LZ4/ANS format
nvcomp 4.0.1#
New Features#
Removed hard dependency of nvCOMP on the CUDA driver (libcuda.so on Linux and nvcuda64.dll on Windows) being present on the system
Python API now throws exceptions upon encountering CUDA Driver API problems
Added support for large internal element counts (INT_MAX+) in Deflate/GDeflate’s Optimal Parse
Bug Fixes#
Fixed a bug in Deflate/Gzip which caused occasional data corruption during decompression
Known issues#
Cascaded, GDeflate, zStandard, Deflate, Gzip and Bitcomp decompressors can only operate on valid input data (data that was compressed using the same compressor). Other decompressors can sometimes detect errors in the compressed stream
Cascaded, zStandard and Bitcomp batched decompression C APIs cannot currently accept nullptr for actual_decompressed_bytes or device_statuses values. Deflate and Gzip cannot accept nullptr for device_statuses values
The Bitcomp low-level batched decompression function is not fully asynchronous
Gzip low-level interface only provides decompression
The device API only supports the LZ4/ANS format
nvcomp 4.0.0#
New features#
Python API
Replaced spdlog by culiblogger and fmt for logging
Changed deflate/gdeflate compression modes, now support 0-5
Level 0: Huffman only, no LZ. Currently unsupported on Deflate.
Level 1: Default, same as 3.0
Level 2: Achieves compression ratios that exceed zlib level 1. Up to 27% better ratio than level 1
Level 3: Placeholder, equivalent to level 2
Level 4: achieves similar compression ratio to zlib level 6
Level 5: achieves similar compression ratio to zlib level 9
HLIF can now work on batches, not only on a single buffer
HLIF can now compress data without chunking and without nvcomp header (with option to store just uncompressed size)
Merged libnvcomp* shared library files into a single libnvcomp file
Shared library files now have the major version in the name
Added LZ4 device-side API
Added “float16” mode to ANS for better ratios/performance with float16/bfloat16 data
Changed all low-level API function parameters to be named and documented more consistently
Updated many internal functions to use cuda::std::atomic values in place of volatile
ZSTD compression can now handle chunks up to (2GB - 1)
Bug Fixes#
Fixed a bug in the deflate decompressor which caused accuracy errors when copying uncompressed chunks
Fixed an HLIF encoding error when input data size is smaller than chunk size
Fixed a crash in cascaded compression for at least 2 delta passes and at least 1 RLE pass on highly compressible data
Fixed a runtime bug in LZ4 with multi-byte (e.g. int) data types
Fixed a runtime bug in Zstd which originated from a race condition during decoding
Fixed GPU buffer over-addressing problem in Bitcomp, LZ4, Snappy, and Zstd
Added HLIF constructors and functions without redundant device_id parameter
Fixed a case where the ANS HLIF was assuming device 0 for checking feature support
Fixed some cases where errors were logged to stdout, regardless of logging options
Performance Optimizations#
ZSTD Decompression up to 2x faster on T4, ~20% faster on H100 and others
Optimized Deflate/GDeflate Optimal Parse, up to ~10% faster on H100
Known issues#
Cascaded, GDeflate, zStandard, Deflate, Gzip and Bitcomp decompressors can only operate on valid input data (data that was compressed using the same compressor). Other decompressors can sometimes detect errors in the compressed stream
Cascaded, zStandard and Bitcomp batched decompression C APIs cannot currently accept nullptr for actual_decompressed_bytes or device_statuses values. Deflate and Gzip cannot accept nullptr for device_statuses values
The Bitcomp low-level batched decompression function is not fully asynchronous
Gzip low-level interface only provides decompression
The device API only supports the LZ4/ANS format
Deflate and GZip might corrupt data during decompression. For the time being, while using the low-level interface (LLIF), an external checksum or CRC verification is recommended, whereas the high-level interface (HLIF) can internally compute and verify checksums with the ComputeAndVerify checksum option.
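For the external CRC verification recommended above for the LLIF, one option is to compute a CRC-32 of each buffer on the host before compression and compare it after decompression. The sketch below uses the standard reflected CRC-32 polynomial (0xEDB88320) in a simple bitwise form. It is an illustration only, not nvCOMP's own checksum implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
// XOR of 0xFFFFFFFF). Slow but dependency-free; a table-driven or library
// implementation would be preferable for large buffers.
inline std::uint32_t crc32_ieee(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Apply the polynomial only when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical check: compares the CRC recorded before compression with the
// CRC of the decompressed bytes copied back to the host.
inline bool round_trip_matches(std::uint32_t crc_before,
                               const unsigned char* decompressed,
                               std::size_t size) {
    return crc32_ieee(decompressed, size) == crc_before;
}
```

A mismatch indicates that the decompressed buffer differs from the original and should be discarded or decompressed again.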
nvcomp 3.0.6#
Bug Fixes#
Fixed a bug (introduced in 3.0.0) that resulted in ZSTD decompression errors.
nvcomp 3.0.5#
Bug Fixes#
Fixed a bug that caused compute-sanitizer memcheck failures in Snappy decompression.
nvcomp 3.0.4#
Bug Fixes#
Fixed a bug (introduced in 3.0.0) that caused incorrect snappy decompression in some cases.
Fixed a bug (introduced in 3.0.0) that caused incompatibility with CPU decompressors for ZSTD
nvcomp 3.0.3 (2023-10-06)#
Bug Fixes#
Fixed a bug (introduced in 3.0.0) that caused incorrect snappy decompression in some cases.
nvcomp 3.0.2 (2023-08-28)#
Bug Fixes#
Fixed a bug (introduced in 3.0.0) that caused incorrect snappy decompression in some cases.
nvcomp 3.0.1 (2023-08-08)#
Bug Fixes#
Remove unnecessary nvml dependency added in 3.0.0
nvcomp 3.0.0 (2023-07-03)#
New features#
Added nvcomp*RequiredAlignment constant variables for each compressor
Low-level batched functions now return nvcompErrorAlignment if device buffers aren’t sufficiently aligned
Added HLIF for ZSTD, Deflate. Updated HLIF design such that HLIF now dispatches to LLIF.
Introduced device-side API. Currently limited to the ANS format
Added support for logging using NVCOMP_LOG_LEVEL (0-5) and NVCOMP_LOG_FILE environment variables.
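The logging variables can be set in the shell or from the host program. The sketch below sets them from C++ with POSIX `setenv`. It assumes, without confirmation from these notes, that nvCOMP reads the variables when it first initializes logging, so they would need to be set before any nvCOMP call. The function name and the log file name in the usage note are hypothetical, and the meaning of each level is not described here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// POSIX-only sketch: sets NVCOMP_LOG_LEVEL and NVCOMP_LOG_FILE for the
// current process. Returns false if the level is outside the documented
// 0-5 range or if setenv fails.
inline bool configure_nvcomp_logging(int level, const std::string& log_file) {
    if (level < 0 || level > 5) {
        return false;
    }
    const std::string level_text = std::to_string(level);
    if (setenv("NVCOMP_LOG_LEVEL", level_text.c_str(), 1) != 0) {
        return false;
    }
    if (setenv("NVCOMP_LOG_FILE", log_file.c_str(), 1) != 0) {
        return false;
    }
    // Sanity check: the level must now be visible to getenv.
    assert(std::getenv("NVCOMP_LOG_LEVEL") != nullptr);
    return true;
}
```

For example, `configure_nvcomp_logging(5, "nvcomp.log")` would request the highest numeric level and direct output to a file named `nvcomp.log`.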
Performance Optimizations#
Optimize zSTD decompression. Up to 2.2x faster on H100 and 1.5x faster on A100
Optimize LZ4 decompression. Up to 1.4x faster on H100 and 1.4x faster on A100.
Optimize Snappy decompression. Up to 1.3x faster on H100 and 1.9x faster on A100.
Optimize Bitcomp decompression (standard algo). Up to 2x faster and more consistent across datasets
Improve ZSTD compression ratio by up to 5% on 64 KB chunks, 30% on 512 KB chunks to closely match CPU L1 Compression.
nvcomp 2.6.1 (2023-02-03)#
Bug fixes#
Fixed a bug that caused non-deterministic decompression accuracy failures in ZSTD
Added support for Ada (sm89) GPUs
Fixed inconsistent compression stream format on some datasets when using GDeflate high-compression algorithm.
nvcomp 2.6.0 (2023-01-16)#
New features#
Added new nvcompBatched*CompressGetTempSizeEx API to allow less pessimistic scratch allocation requirement in many cases.
Further reduced zstd compression scratch requirement. For very large batches, in conjunction with the new extended API, the scratch allocation is now ~1.5x the total uncompressed size of the batch.
nvcomp 2.5.1 (2023-01-09)#
Bug fixes#
Improved GDeflate decompression throughput by up to 2x, fixing perf regression in 2.5.0
Fixed issue where some uses of CUB and Thrust in nvCOMP weren’t namespaced
Fixed bug, introduced in 2.5.0, in ZSTD decompression of large frames produced by the CPU compressor
nvcomp 2.5.0 (2022-12-16)#
New features#
Added Standard CRC32 support and its LLAPI.
Added Gzip batched decompression LL APIs, including APIs for getting the decompressed size.
Added independent bitcomp.h header to access full feature set of bitcomp compressor
Added doc directory in nvcomp package containing the documentation files
Increased zStandard maximum compression chunk size from 64 KB to 16 MB
Improved zStandard decompression throughput by up to 2x on small batches and 40% on large batches
Added nvcomp*CompressionMaxAllowedChunkSize constant variables for each compressor
Updated GDeflate stream format to make it compatible with the GDeflate compression standard in NVIDIA RTX IO and Microsoft DirectStorage 1.1.
Updated GDeflate to support 64 KB dictionary window which allows a higher compression ratio.
Updated GDeflate CPU implementation to use the open source libdeflate repo: https://github.com/NVIDIA/libdeflate
Added initial support for SM90
Bug fixes#
Fixed memcheck failure in Snappy compression
Fixed deflate compression issue related to very small chunk sizes
Fixed handling of zero-byte chunks in ANS, Bitcomp, Cascaded, Deflate, and Gdeflate compressors
Fixed bug in Bitcomp where the maximum compressed size was slightly underestimated.
nvcomp 2.4.1 (2022-10-06)#
New features#
The Deflate batched decompression API can now accept nullptr for actual_decompressed_bytes.
Bug fixes#
Fixed incorrect behavior, failure, or crash when using duplicates feature (-x <count>) of the low-level “chunked” benchmarks.
Updated deflate_cpu_compression example to use the correct APIs.
The Deflate batched decompression API can work on uncompressed data chunks larger than 64KB.
Fixed correctness / stability issue in compute capability 6.1
nvcomp 2.4.0 (2022-09-23)#
New features#
Added support for ZSTD compression to LL API
Early Access Linux SBSA binaries.
Bug fixes#
Fixed issue where cascaded compressor bitpack wasn’t considering unsigned data type, causing suboptimal compression ratio
Fixed cmake problem where we stated wrong version compatibility
Performance Optimizations#
Optimized GDeflate high-compression mode. Up to 2x faster.
Optimized ZSTD decompression. Up to 1.2x faster.
Optimized Deflate decompression. Up to 1.5x faster.
Optimized ANS compression. Strong scaling allows for up to 7x higher compression and decompression throughput for files on the order of a few MB in size. Decompression throughput is improved by at least 20% on all tested files.
nvcomp 2.3.3 (2022-07-20)#
Bug Fixes#
Add missing nvcompBatchedDeflateDecompressGetTempSizeEx API
Fixed minor correctness issue in deflate compression.
Fixed cmake problem that caused an unnecessary implied cudart_static dependency
Performance Optimizations#
Optimized nvcompBatchedDeflateGetDecompressSizeAsync. Now 2-3x faster on A100.
nvcomp 2.3.2 (2022-06-24)#
Bug Fixes#
Fixed various bugs in ZSTD decompression implementation
Fixed an issue where Deflate-compressed output could not be correctly decompressed by zlib::inflate().
nvcomp 2.3.1 (2022-06-15)#
Bug Fixes#
Fixed various bugs in ZSTD decompression implementation
Fixed various bugs in ANS compression implementation
Fix hang in GDeflate high-compression mode for large files
Fix bug in library build that required dynamic link to cudart.
Interface Changes#
Added new API, nvcompBatched<Format>DecompressGetTempSizeEx(). This provides an optional capability for providing the total decompressed size to the API, which for some formats can dramatically reduce the required temp size.
nvcomp 2.3.0 (2022-04-29)#
New features#
Support ZSTD decompression in the LLIF
Deflate support (RFC 1951)
Modified-CRC32 checksum support added to HLIF. Includes optional verification of HLIF-compressed buffers intended for error detection
Bug fixes#
Added Pascal GPU architecture support for all compressors
Performance Optimizations#
Performance optimizations in ANS compression / decompression, leading to ~100% speedup in compression and ~50% speedup in decompression
Developed algorithmic improvements to GDeflate’s high-compression mode. This is now 30-40x faster on average while producing the same output as the previous version
Infrastructure#
Improvements to the benchmarking interface for LLIF – common argument APIs
nvcomp 2.2.0 (2022-02-07)#
New features#
Entropy-only mode for GDeflate
New high-level interface
Windows support
Support for GPU-accelerated ANS
Interface Changes#
High-level interface#
High-level interface is now standardized across compressor formats.
This interface provides a single nvcompManagerBase object that can do compression and decompression. Users can now decompress nvcomp-compressed files without knowing how they were compressed. The interface also can manage scratch space and splitting the input buffer into independent chunks for parallel processing.
API Consolidation#
nvCOMP now supports only the low-level batch API and the new high-level interface
nvcomp 2.1.0 (2021-10-28)#
New features#
New release of low-level batched API for Cascaded and Bitcomp methods.
New high-throughput and high-compression-ratio GPU compressors in GDeflate
Interface Changes#
Update batched/low-level compression interfaces to take an options parameter, to allow configuring future compression algorithms.
Update batched/low-level decompression interfaces to output the decompressed size (or 0 if an error occurs).
Add bounds checking to batched/low-level decompression routines, such that if an invalid compressed data stream is provided, 0 will be written for the output size, rather than generating an illegal memory access.
Fix LZ4 to support chunk sizes < 32 KB.
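Because the batched decompression routines now report a size of 0 for chunks that fail, callers can detect bad chunks by scanning the output sizes after copying them to the host. The sketch below illustrates that scan. The helper is hypothetical and not part of nvCOMP, and it assumes the caller knows the expected size of each chunk so that genuinely empty chunks are not reported as failures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of chunks whose reported
// decompressed size is 0 although a nonzero size was expected.
inline std::vector<std::size_t> failed_chunk_indices(
    const std::vector<std::size_t>& actual_sizes,
    const std::vector<std::size_t>& expected_sizes) {
    assert(actual_sizes.size() == expected_sizes.size());
    std::vector<std::size_t> failed;
    for (std::size_t i = 0; i < actual_sizes.size(); ++i) {
        // A zero size for a chunk expected to hold data signals an error.
        if (actual_sizes[i] == 0 && expected_sizes[i] != 0) {
            failed.push_back(i);
        }
    }
    return failed;
}
```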
Performance Optimizations#
Improve performance of Snappy compression by ~10% in some configurations.
Add an optimization to the LZ4 compressor based on specification of input data as char, short, or int, rather than just treating the input as raw bytes.
Optimization to reduce the LZ hash table size when compressing smaller chunks.
Improved compression performance in GDeflate with the high-throughput option
Improved decompression performance in GDeflate (10-75% depending on the dataset)
Bug Fixes#
Fix LZ4 CPU compression example.
Fix temp allocation size bug in benchmark_template_chunked.
Infrastructure#
Update CMakeLists to compile nvcomp with -fPIC enabled.
Add a new script for benchmarking compression algorithms.
Add unit tests for the Snappy decompressor that test decompression on legally formatted files that won’t be generated by the nvcomp compressor due to configuration.
Update CMakeLists to suppress warnings about missing nvcomp external dependencies when the user didn’t indicate they wanted to include them.
Update CMakeLists to allow install into include folder that the user does not have ownership of.
nvcomp 2.0.2 (2021-06-30)#
Add example lz4_cpu_decompression to compress on the GPU with nvCOMP and decompress on the CPU with liblz4.
Add CMake option for building a static library.
Fix bug in LZ4 compression kernel to comply with LZ4 end of block restrictions.
Fix temp allocation size bug in benchmark_lz4_chunked.
nvcomp 2.0.1 (2021-06-08)#
Improve CMake setup for using nvCOMP as a submodule. This includes marking dependencies as PRIVATE, and adding options for building examples, tests, and benchmarks (e.g., -DBUILD_EXAMPLES=ON, -DBUILD_TESTS=ON, and -DBUILD_BENCHMARKS=ON).
Fix double free error in benchmark_snappy_synth.
Fix copy direction in Cascaded compression when the output size is on the GPU.
Improve testing coverage.
Mark the generic decompression interfaces defined in include/nvcomp.h as deprecated.
nvcomp 2.0.0 (2021-04-28)#
Replace previous C and C++ APIs.
Added Snappy compression (batched interface).
Added support for using Bitcomp and GDeflate external compressors.
Added /examples folder demonstrating use cases that interface with CPU implementations of LZ4 and GDeflate, as well as GPU Direct Storage.
Improve support for Windows in benchmark implementations.
Made usage of std::uniform_int_distribution<> in the benchmarks conform to the C++14 standard.
Fix issue in Cascaded compression when using the default configuration (‘auto’), for small inputs.
nvcomp 1.2.3 (2021-04-07)#
Fix bug in LZ4 compression kernel for the Pascal architecture.
nvcomp 1.2.2 (2021-02-08)#
Fix linking errors in Clang++.
Fix error being incorrectly returned by Cascaded compression when output memory was initialized to all -1’s.
Fix C++17 style static assert.
Fix prematurely freeing memory in Cascaded compression.
Fix input format and usage messaging for benchmarks.
nvcomp 1.2.1 (2020-12-21)#
Fix compile error and unit tests for cascaded selector.
nvcomp 1.2.0 (2020-12-19)#
Add the Cascaded Selector and Cascaded Auto set of interfaces for automatically configuring cascaded compression.
Generally improve error handling and messaging.
Update CMake configuration to support CCache.
nvcomp 1.1.1 (2020-12-02)#
Add all-gather benchmark.
Add sm80 target if CUDA version is 11 or greater.
nvcomp 1.1.0 (2020-10-05)#
Add batch C interface for LZ4, allowing compressing/decompressing multiple inputs at once.
Significantly improve performance of LZ4 compression.
nvcomp 1.0.2 (2020-08-12)#
Fix metadata freeing for LZ4, to avoid possible mismatch of new[] and delete.
nvcomp 1.0.1 (2020-08-07)#
Fixed naming of nvcompLZ4CompressX functions in include/lz4.h, to have the nvcomp prefix.
Changed CascadedMetadata::Header struct initialization to work around internal compiler error.
nvcomp 1.0.0 (2020-07-31)#
Initial public release.