New Year's release, zlib-ng's 10-year celebration
My first commit in this repo was on October 8th, 2014, although I remember starting during the summer vacation. That first attempt made the zlib cleanup more of a mess than I wanted, so I restarted from scratch in October, once I had a better overview of the code and of what I wanted to do to clean it up. At that point zlib-ng was not very likely to go anywhere, but despite the odds, several people found it over time and opened PRs with their own improvements, and a few of them became long-time contributors. A few years ago zlib-ng finally became more than an experimental fork. It has since gained traction, and several distros have started replacing stock zlib with zlib-ng in compat mode.
Over the past year we have been lucky enough to receive donations, which allowed us to invest in a couple of Rpi5 systems for testing. We hope to be able to acquire more architectures for development and testing: Risc-V would be interesting, for example, and we still lack a dedicated testing machine capable of AVX512.
Release 2.2.3
This time we have two code fixes for potentially unsafe access, although we have not had any bug reports about these.
It also contains several optimizations. Of particular note, inflate has been optimized for various instruction sets, and the generic C code has also seen improvements. There are further improvements for arches where unaligned accesses are not possible (lacking instructions to handle unaligned access), as well as improvements on big endian.
Example benchmarks:
x86-64 AVX2: Inflate ~17.8% faster, Deflate unchanged. -4.6KB library size.
Aarch64: Inflate ~2.3% faster, Deflate unchanged. -5.5KB library size.
We also took some time to do a comprehensive cleanup of the now-misleading UNALIGNED_OK option and of all the "unaligned" functions. We have noticed that some distros have been disabling these, fearing that they use potentially unsafe unaligned pointers, but we already fixed that in 2.1.0-beta1. Since then, these "unaligned" settings/functions have referred to using unaligned accesses in safe ways, for example by using unaligned intrinsics or memcpy to fix alignment, and selecting whichever safe method is optimal for the arch. Disabling the option therefore disabled several safe optimizations instead.
Because this was obviously misleading certain distros into disabling these optimizations, we have cleaned it up, removed a lot of unnecessary preprocessor checks, and made detection of optimal methods happen during compile instead of configure. As a bonus, this cleaned up a lot of code and also let us not compile in many extra variants of compare256/longest_match, saving about 8-10KB of library size.
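As a rough illustration of what a "safe" unaligned access can look like in C, the sketch below reads a 32-bit value through memcpy instead of casting a possibly misaligned pointer. This is not zlib-ng's actual code: the helper names `load_u32_unaligned` and `match_length` are made up for this example, and the comparison loop is only a simplified stand-in for what functions such as compare256 do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a zlib-ng function): read a 32-bit value from an
 * address that may not be 4-byte aligned. memcpy has no alignment
 * requirement, and compilers typically lower a small fixed-size copy like
 * this to a single load on targets that support unaligned access. */
static inline uint32_t load_u32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: count how many leading bytes of two buffers match,
 * comparing 4 bytes at a time through the safe loader above and finishing
 * the tail one byte at a time. */
static inline size_t match_length(const uint8_t *a, const uint8_t *b, size_t len) {
    size_t i = 0;
    while (i + 4 <= len && load_u32_unaligned(a + i) == load_u32_unaligned(b + i))
        i += 4;
    while (i < len && a[i] == b[i])
        i++;
    return i;
}
```

Because the access goes through memcpy, the same source is well defined on arches that cannot do unaligned loads; the compiler is then free to pick the best instruction sequence for each target.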
- PS: s390x is currently potentially unsafe: CI reports a failure on the MSAN test, which is pending investigation by IBM. See #1845.
- PPS: The 32-bit ARM Windows release DLLs failed to compile automatically because GitHub Actions upgraded their build images, so unfortunately there are no binaries for that currently. This does not affect self-built binaries. See #1839.
Changes
Fixes for potentially unsafe access
Optimizations / Cleanups
- Allow the compiler to inline chunkcopy_safe more readily #1781
- Misc inflate cleanup #1797
- Reorder variables in inflate functions to reduce padding holes #1803
- Improve chunkset_avx2 performance #1778
- Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases #1802
- Make an AVX512 inflate fast with low cost masked writes #1805
- Enable AVX2 functions to be built with BMI2 instructions #1816
- Improve pipelining for AVX512 chunking #1821
- Risc-V: adler32_rvv: Fix two overflow problems #1826
- Remove UNALIGNED_OK checks #1828 #1834 #1835 #1830
- Use GCC's may_alias attribute for unaligned memory access #1548
Big Endian
- Make big endians first class citizens again #1831
- Fix "RLE" compression with big endian architectures #1832
Buildsys fixes / minor fixes
- Fix build on aarch64 android. #1783
- Allow overriding CMAKE_CXX_* variables and fix overriding of CMAKE_C_* #1787
- Use target include instead of raw include #1784
- Replace non-ascii characters to fix MSVC warning #1791
- Force Visual C++ to treat source files as UTF-8. #1789
- Explicitly set CMake policy 0169 to silence warning #1792
- configure: Fix linker flags for Haiku. #1799
- configure: add --mandir to override $mandir on command line. #1800
- Force use of latest Windows SDK with 32-bit ARM support #1811
- Fix casting warning/error in test_compress_bound.cc #1814
- Remove unused HAVE_CHUNKMEMSET_1 define #1815
- Fix native detection of ARM CRC instruction #1818
- Address deprecated cmake version warning. #1812
- Add a fallback to ALIGNED_ macro for other compilers #1820
- Add in-tree build artifacts to .gitignore #1823
- Fix typos #1825