- Fixed another bug (introduced in 3.0.0) that caused incorrect snappy decompression in some cases.
- Fixed a bug (introduced in 3.0.0) that caused incorrect snappy decompression in some cases.
- Remove unnecessary nvml dependency added in 3.0.0
- Added `nvcomp*RequiredAlignment` constant variables for each compressor
- Low-level batched functions now return `nvcompErrorAlignment` if device buffers aren't sufficiently aligned
- Added HLIF for ZSTD, Deflate. Updated HLIF design such that HLIF now dispatches to LLIF.
- Introduced device-side API. Currently limited to the ANS format
- Added support for logging using `NVCOMP_LOG_LEVEL` (0-5) and `NVCOMP_LOG_FILE` environment variables.
- Optimize ZSTD decompression. Up to 2.2x faster on H100 and 1.5x faster on A100.
- Optimize LZ4 decompression. Up to 1.4x faster on H100 and 1.4x faster on A100.
- Optimize Snappy decompression. Up to 1.3x faster on H100 and 1.9x faster on A100.
- Optimize Bitcomp decompression (standard algo). Up to 2x faster and more consistent across datasets.
- Improve ZSTD compression ratio by up to 5% on 64 KB chunks, 30% on 512 KB chunks to closely match CPU L1 Compression.
- Fixed a bug that caused non-deterministic decompression accuracy failures in ZSTD
- Added support for Ada (sm89) GPUs
- Fixed inconsistent compression stream format on some datasets when using GDeflate high-compression algorithm.
- Added new nvcompBatched*CompressGetTempSizeEx API to allow less pessimistic scratch allocation requirement in many cases.
- Further reduced zstd compression scratch requirement. For very large batches, in conjunction with the new extended API, the scratch allocation is now ~1.5x the total uncompressed size of the batch.
- Improved GDeflate decompression throughput by up to 2x, fixing perf regression in 2.5.0
- Fixed issue where some uses of CUB and Thrust in nvCOMP weren't namespaced
- Fixed bug, introduced in 2.5.0, in ZSTD decompression of large frames produced by the CPU compressor
- Added Standard CRC32 support and its LLAPI.
- Added Gzip batched decompression LL APIs, including APIs for getting the decompressed size.
- Added independent bitcomp.h header to access full feature set of bitcomp compressor
- Added doc directory in nvcomp package containing the documentation files
- Increased zStandard maximum compression chunk size from 64 KB to 16 MB
- Improved zStandard decompression throughput by up to 2x on small batches and 40% on large batches
- Added `nvcomp*CompressionMaxAllowedChunkSize` constant variables for each compressor
- Updated GDeflate stream format to make it compatible with the GDeflate compression standard in NVIDIA RTX IO and Microsoft DirectStorage 1.1.
- Updated GDeflate to support 64 KB dictionary window which allows a higher compression ratio.
- Updated GDeflate CPU implementation to use the open source libdeflate repo: https://github.com/NVIDIA/libdeflate
- Added initial support for SM90
- Fixed memcheck failure in Snappy compression
- Fixed deflate compression issue related to very small chunk sizes
- Fixed handling of zero-byte chunks in ANS, Bitcomp, Cascaded, Deflate, and Gdeflate compressors
- Fixed bug in Bitcomp where the maximum compressed size was slightly underestimated.
- The Deflate batched decompression API can now accept nullptr for actual_decompressed_bytes.
- Fixed incorrect behavior, failure, or crash when using duplicates feature (`-x <count>`) of the low-level "chunked" benchmarks.
- Updated deflate_cpu_compression example to use the correct APIs.
- The Deflate batched decompression API can now work on uncompressed data chunks larger than 64 KB.
- Fixed correctness / stability issue in compute capability 6.1
- Added support for ZSTD compression to LL API
- Early Access Linux SBSA binaries.
- Fixed issue where cascaded compressor bitpack wasn't considering unsigned data type, causing suboptimal compression ratio
- Fixed cmake problem where we stated wrong version compatibility
- Optimized GDeflate high-compression mode. Up to 2x faster.
- Optimized ZSTD decompression. Up to 1.2x faster.
- Optimized Deflate decompression. Up to 1.5x faster.
- Optimized ANS compression. Strong scaling allows for up to 7x higher compression and decompression throughput for files on the order of a few MB in size. Decompression throughput is improved by at least 20% on all tested files.
- Add missing nvcompBatchedDeflateDecompressGetTempSizeEx API
- Fixed minor correctness issue in deflate compression.
- Fixed cmake problem that caused an unnecessary implied cudart_static dependency
- Optimized nvcompBatchedDeflateGetDecompressSizeAsync. Now 2-3x faster on A100.
- Fixed various bugs in ZSTD decompression implementation
- Fixed an issue where Deflate-compressed data could not be correctly decompressed by zlib::inflate().
- Fixed various bugs in ZSTD decompression implementation
- Fixed various bugs in ANS compression implementation
- Fix hang in GDeflate high-compression mode for large files
- Fix bug in library build that required dynamic link to cudart.
- Added new API, nvcompBatched<Format>DecompressGetTempSizeEx(). This provides an optional capability for providing the total decompressed size to the API, which for some formats can dramatically reduce the required temp size.
- Support ZSTD decompression in the LLIF
- Deflate support (RFC 1951)
- Modified-CRC32 checksum support added to HLIF. Includes optional verification of HLIF-compressed buffers intended for error detection
- Added Pascal GPU architecture support for all compressors
- Performance optimizations in ANS compression / decompression, leading to ~100% speedup in compression and ~50% speedup in decompression
- Developed algorithmic improvements to GDeflate's high-compression mode. This is now 30-40x faster on average while producing the same output as the previous version
- Improvements to the benchmarking interface for LLIF -- common argument APIs
- Entropy-only mode for GDeflate
- New high-level interface
- Windows support
- Support for GPU-accelerated ANS
- High level interface is now standardized across compressor formats.
- This interface provides a single nvcompManagerBase object that can do compression and decompression. Users can now decompress nvcomp-compressed files without knowing how they were compressed. The interface also can manage scratch space and splitting the input buffer into independent chunks for parallel processing.
- nvCOMP now supports only the low-level batch API and the new high level interface
- New release of low-level batched API for Cascaded and Bitcomp methods.
- New high-throughput and high-compression-ratio GPU compressors in GDeflate
- Update batched/low-level compression interfaces to take an options parameter, to allow configuring future compression algorithms.
- Update batched/low-level decompression interfaces to output the decompressed size (or 0 if an error occurs).
- Add bounds checking to batched/low-level decompression routines, such that if an invalid compressed data stream is provided, 0 will be written for the output size, rather than generating an illegal memory access.
- Fix LZ4 to support chunk sizes < 32 KB.
- Improve performance of Snappy compression by ~10% in some configurations.
- Add an optimization to the LZ4 compressor based on specification of input data as char, short, or int, rather than just treating the input as raw bytes.
- Optimization to reduce the LZ hash table size when compressing smaller chunks.
- Improved compression performance in GDeflate with the high-throughput option
- Improved decompression performance in GDeflate (10-75% depending on the dataset)
- Fix LZ4 CPU compression example.
- Fix temp allocation size bug in `benchmark_template_chunked`.
- Update CMakeLists to compile nvcomp with -fPIC enabled.
- Add a new script for benchmarking compression algorithms.
- Add unit tests for the Snappy decompressor that test decompression on legally formatted files that won't be generated by the nvcomp compressor due to configuration.
- Update CMakeLists to suppress warnings about missing nvcomp external dependencies when the user didn't indicate they wanted to include them.
- Update CMakeLists to allow install into include folder that the user does not have ownership of.
- Add example `lz4_cpu_decompression` to compress on the GPU with nvCOMP and decompress on the CPU with `liblz4`.
- Add CMake option for building a static library.
- Fix bug in LZ4 compression kernel to comply with LZ4 end of block restrictions.
- Fix temp allocation size bug in `benchmark_lz4_chunked`.
- Improve CMake setup for using nvCOMP as a submodule. This includes marking dependencies as PRIVATE, and adding options for building examples, tests, and benchmarks (e.g., `-DBUILD_EXAMPLES=ON`, `-DBUILD_TESTS=ON`, and `-DBUILD_BENCHMARKS=ON`).
- Fix double free error in `benchmark_snappy_synth`.
- Fix copy direction in Cascaded compression when the output size is on the GPU.
- Improve testing coverage.
- Mark the generic decompression interfaces defined in `include/nvcomp.h` as deprecated.
- Replace previous C and C++ APIs.
- Added Snappy compression (batched interface).
- Added support for using Bitcomp and GDeflate external compressors.
- Added `/examples` folder demonstrating use cases that interface with CPU implementations of LZ4 and GDeflate, as well as GPU Direct Storage.
- Improve support for Windows in benchmark implementations.
- Made usage of `std::uniform_int_distribution<>` in the benchmarks conform to the C++14 standard.
- Fix issue in Cascaded compression when using the default configuration ('auto'), for small inputs.
- Fix bug in LZ4 compression kernel for the Pascal architecture.
- Fix linking errors in Clang++.
- Fix error being incorrectly returned by Cascaded compression when output memory was initialized to all `-1`s.
- Fix C++17 style static assert.
- Fix prematurely freeing memory in Cascaded compression.
- Fix input format and usage messaging for benchmarks.
- Fix compile error and unit tests for cascaded selector.
- Add the Cascaded Selector and Cascaded Auto set of interfaces for automatically configuring cascaded compression.
- Generally improve error handling and messaging.
- Update CMake configuration to support CCache.
- Add all-gather benchmark.
- Add sm80 target if CUDA version is 11 or greater.
- Add batch C interface for LZ4, allowing compressing/decompressing multiple inputs at once.
- Significantly improve performance of LZ4 compression.
- Fix metadata freeing for LZ4, to avoid possible mismatch of `new[]` and `delete`.
- Fixed naming of nvcompLZ4CompressX functions in `include/lz4.h`, to have the `nvcomp` prefix.
- Changed CascadedMetadata::Header struct initialization to work around internal compiler error.
- Initial public release.